Methodology - Hubway Station Availability

1. How are the numbers of displaced and lost rides estimated?

These numbers are estimated using a ranking-based choice model: a statistical model of how an arriving customer chooses a pick-up station according to a ranked list of preferences. After fitting the model to historical data (weekdays between 2016/06/01 and 2016/08/31), we use it to predict how a rider would likely behave when his/her preferred stations have run out of bikes. This allows us to estimate, retrospectively, how many rides may have been displaced or lost during the 3-month period due to station stock-outs.
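To illustrate the retrospective estimate, the sketch below assumes that the fitted model supplies, for riders with a given first-choice station, a probability for each ranking and an estimated number of arriving riders per time interval, and that the historical data record which stations were empty in each interval. Expected displaced and lost rides then follow from applying the choice rule to every ranking. The function names, the interval structure (a station treated as empty for a whole interval), and all numbers are hypothetical, and the model-fitting step itself is not shown.

```python
from typing import Dict, List, Set, Tuple

Ranking = Tuple[str, ...]


def expected_displaced_and_lost(
    arrivals: float,
    ranking_probs: Dict[Ranking, float],
    empty: Set[str],
) -> Tuple[float, float]:
    """Expected displaced and lost pick-ups in one interval.

    `arrivals` is the (hypothetical) estimated number of riders arriving,
    `ranking_probs` maps each ranking to its probability (summing to 1), and
    `empty` holds stations assumed to have no bikes for the whole interval.
    """
    displaced = lost = 0.0
    for ranking, prob in ranking_probs.items():
        if ranking[0] not in empty:
            continue  # first choice has a bike: the pick-up is served there
        if any(station not in empty for station in ranking[1:]):
            displaced += arrivals * prob  # rider falls back to a later choice
        else:
            lost += arrivals * prob  # every ranked station is empty
    return displaced, lost


def total_over_period(
    intervals: List[Tuple[float, Set[str]]],
    ranking_probs: Dict[Ranking, float],
) -> Tuple[float, float]:
    """Sum expected displaced and lost pick-ups over (arrivals, empty) intervals."""
    total_displaced = total_lost = 0.0
    for arrivals, empty in intervals:
        d, l = expected_displaced_and_lost(arrivals, ranking_probs, empty)
        total_displaced += d
        total_lost += l
    return total_displaced, total_lost


# Hypothetical inputs for riders whose first choice is station "A".
ranking_probs = {("A",): 0.5, ("A", "B"): 0.3, ("A", "B", "C"): 0.2}
intervals = [
    (10.0, {"A"}),      # A empty; B and C have bikes
    (4.0, {"A", "B"}),  # A and B empty; only C has bikes
]
print(total_over_period(intervals, ranking_probs))
```

With these made-up inputs the totals come to about 5.8 displaced and 8.2 lost rides. Note that riders who rank only station "A" are lost whenever it is empty, which is why an isolated station can lose more pick-ups than a well-surrounded one.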

2. What behavioral assumptions are made in the model?

We assume that each rider's decision, namely which station to pick up a bike from, or whether to give up when none of the acceptable stations has a bike, is completely determined by his/her own ranking of stations. This ranking may reflect convenience of access, proximity to the intended destination, and other exogenous factors such as weather.

For example, an MIT graduate student commuting to the main campus from his dorm on Albany St. may rank two nearby stations in order of walking distance: [ MIT Pacific St., MIT Vassar St. ]. A BU student living in Kenmore Square and heading to Cambridge might have a different ranking in mind: [ Kenmore Sq., Beacon St. / Mass. Ave. ]. If the first-choice station has no bike available, the student either picks up from the next station on the list that has a bike (the pick-up is displaced) or gives up on the service if no station on the list has one (the pick-up is lost).
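The choice rule described above can be sketched for a single rider as follows. This is a minimal illustration, not the authors' implementation; the function name `choose_station` and the representation of stocked-out stations as a set of names are assumptions.

```python
from typing import Optional, Sequence, Set, Tuple


def choose_station(ranking: Sequence[str],
                   empty: Set[str]) -> Tuple[str, Optional[str]]:
    """Apply the ranking-based choice rule to one arriving rider.

    Returns ("served", station) if the first choice has a bike,
    ("displaced", station) if a later choice on the list is used,
    and ("lost", None) if every ranked station is empty.
    """
    for position, station in enumerate(ranking):
        if station not in empty:
            # The first station with a bike wins; position 0 means no displacement.
            return ("served" if position == 0 else "displaced"), station
    return "lost", None


ranking = ["MIT Pacific St.", "MIT Vassar St."]
print(choose_station(ranking, empty=set()))
# ('served', 'MIT Pacific St.')
print(choose_station(ranking, empty={"MIT Pacific St."}))
# ('displaced', 'MIT Vassar St.')
print(choose_station(ranking, empty={"MIT Pacific St.", "MIT Vassar St."}))
# ('lost', None)
```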

In the model, we assume that each rider considers at most two alternative stations, each within 1 km of his/her first choice. We also assume that these behaviors are homogeneous throughout the study period. Under this setup, our algorithm infers the probability distribution over all possible station rankings that best explains the data.
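A minimal sketch of how the candidate rankings allowed by these assumptions might be enumerated: the first choice, followed by zero, one, or two distinct alternatives within 1 km. Here distance is measured as great-circle (haversine) distance, which is an assumption, since the source does not say how the 1 km is computed. The station names, coordinates, and function names are hypothetical, and fitting the probabilities over these rankings is not shown.

```python
import math
from itertools import permutations
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0
LatLon = Tuple[float, float]


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def candidate_rankings(first: str, coords: Dict[str, LatLon],
                       radius_km: float = 1.0,
                       max_alternatives: int = 2) -> List[Tuple[str, ...]]:
    """All rankings that start at `first` and list up to `max_alternatives`
    distinct fallback stations within `radius_km` of it, in every order."""
    neighbors = sorted(
        s for s in coords
        if s != first and haversine_km(coords[first], coords[s]) <= radius_km
    )
    rankings = []
    for k in range(max_alternatives + 1):
        for alternatives in permutations(neighbors, k):
            rankings.append((first,) + alternatives)
    return rankings


# Hypothetical stations; coordinates are illustrative, not real Hubway data.
coords = {
    "A": (42.3600, -71.1000),
    "B": (42.3650, -71.1000),  # about 0.56 km north of A
    "C": (42.3600, -71.1100),  # about 0.82 km west of A
    "D": (42.3800, -71.1000),  # about 2.2 km north of A, outside the radius
}
print(candidate_rankings("A", coords))
# [('A',), ('A', 'B'), ('A', 'C'), ('A', 'B', 'C'), ('A', 'C', 'B')]
```

Station D is excluded because it lies beyond 1 km. One practical effect of capping the list at two alternatives is that a first choice with n nearby stations has only 1 + n + n(n - 1) candidate rankings, which keeps the distribution to be estimated small.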

3. Can a rider’s drop-off (bike-returning) behaviors be similarly modeled?

Absolutely. Our statistical model can be adapted to drop-offs as well. However, we focus on pick-ups in this visualization because drop-off behavior is less complex: since most bikes on loan are eventually returned, there is no drop-off equivalent of a lost pick-up.

4. What insights can be drawn from this visualization?

We find several interesting angles that may be worth exploring further:

Interplay between stock-out rates, pick-up displacements and losses: High stock-out rates don't always entail high pick-up losses. If a station is surrounded by many alternatives with available bikes, its pick-ups are more likely to be displaced than lost (example: Franklin St. / Arch St. in the evening). Conversely, an isolated station can have more lost pick-ups despite a lower stock-out rate (example: ID Building West in the evening).
Network impact of high-availability stations: Choosing a small set of stations to be restocked quickly during peak hours may help absorb displaced pick-ups from nearby stocked-out stations, thereby reducing overall pick-up losses. For example, our analysis reveals that South Station, a highly available station with very low stock-out rates, absorbs a significant number of displaced rides from its surrounding stations in the evening (5pm–7pm). This may explain why those surrounding stations have low pick-up losses despite their high stock-out rates.
Change of displacement patterns over time: Displacement patterns may reflect shifts in broader commuting behavior. For example, in the morning (7am–9am), most pick-ups displaced from Lafayette Sq. at Mass. Ave. end up at MIT Pacific St., possibly because many MIT students and staff are traveling toward the main campus. In the evening (4pm–6pm), however, most displacements end up at Central Sq. station instead, which may reflect outbound commutes from campus.

Try exploring the tool yourself and tell us what you find!

5. More questions?

Contact Chong Yang Goh and Chiwei Yan.