
Bay Area Bike Share Trip Clustering

This visualization shows a clustering of bikeshare trips based on what lies near the start and end stations of each trip, and on the relationship between the two. For instance, a high score on Jobs (10 min) indicates that there are more jobs within 10 minutes of the start station than of the end station. The clustering attempts to find patterns in the trips by grouping similar trips together. The hypothesis is that bikeshare serves several different types of trips. The pink bars show the percentage of trips in each cluster that were made on weekends and by casual users (users with a one- or three-day pass, as opposed to an annual membership), respectively.

The jobs and population within 10 minutes variables are the ratios, between the start and end stations, of the number of jobs and the number of residents, respectively, within a 10-minute walk. The 30- and 60-minute variables include transit as well as walking. The bike stations within 30 minutes variable uses bicycling as the mode (as that is how other bike stations would be reached) and 30 minutes as the time cutoff, because trips longer than 30 minutes incur overage charges.
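The ratio variables described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Station` class, its field names, and the station values are hypothetical, and the scores shown in the graphs may have been scaled or transformed before clustering (see the blog post for the actual method).

```python
from dataclasses import dataclass


@dataclass
class Station:
    """Hypothetical container for precomputed accessibility counts."""
    jobs_10min: float        # jobs reachable within a 10-minute walk
    population_10min: float  # residents reachable within a 10-minute walk


def accessibility_ratios(start: Station, end: Station) -> dict:
    """Return start/end accessibility ratios for a single trip.

    A value above 1 means the start station has more of that
    opportunity nearby than the end station does.
    """
    return {
        "jobs_10min": start.jobs_10min / end.jobs_10min,
        "population_10min": start.population_10min / end.population_10min,
    }


# Made-up example stations, not real Bay Area Bike Share data.
residential = Station(jobs_10min=2000, population_10min=10000)
downtown = Station(jobs_10min=40000, population_10min=5000)

# A trip from the residential station to the downtown station.
ratios = accessibility_ratios(residential, downtown)
print(ratios)
```

For this made-up trip the jobs ratio is well below 1 and the population ratio is above 1, which is the pattern one would expect for a trip from a housing-rich area to a job-rich one.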

Clusters 1 and 2 are clearly related, as are clusters 3 and 4. Cluster 2 is a mirror image of cluster 1. This is easy to explain: the two clusters represent the two legs of round trips. Since the scores represent ratios between the start and end stations, a return trip has the reverse of the outbound trip's accessibility ratios. The numbers of trips in clusters 1 and 2 are also very similar, reinforcing the hypothesis that the clusters represent round trips. The argument is not as strong for clusters 3 and 4, but they do appear to be related in a similar way.
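The mirror-image effect follows directly from the arithmetic of ratios: swapping start and end turns a ratio r into 1/r. A minimal sketch, assuming for illustration that the ratio is viewed on a log scale (an assumption, not something this page states), shows that the return trip then gets exactly the opposite score. The function name and the job counts are hypothetical.

```python
import math


def log_ratio(start_value: float, end_value: float) -> float:
    """Log of the start/end accessibility ratio (illustrative transform)."""
    return math.log(start_value / end_value)


# Made-up job counts near a residential and a downtown station.
jobs_home, jobs_work = 2000, 40000

outbound = log_ratio(jobs_home, jobs_work)  # home -> work
inbound = log_ratio(jobs_work, jobs_home)   # work -> home (return trip)

# The two legs of the round trip have equal and opposite scores.
print(outbound, inbound)
```

If every outbound trip had a matching return trip, a clustering on such scores would tend to produce pairs of clusters with similar sizes and reversed profiles, which is the pattern described for clusters 1 and 2.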

The trips in clusters 1 and 2 appear to be commute trips; they have much higher accessibility to jobs at one end than at the other. The trips in clusters 3 and 4 represent other types of trips. Higher percentages of the trips in these clusters (especially cluster 4) are made on weekends and by casual users. These trips appear to be driven more by housing, so perhaps they represent shopping trips or homebound trips from transit stations. More than likely, clusters 3 and 4 contain a mix of trip types, since they do not mirror each other as neatly as clusters 1 and 2 do. Adding further accessibility ratios, such as accessibility to transit stations or retail opportunities, would likely allow these clusters to split further.
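The weekend and casual-user percentages compared here (the pink bars) are simple per-cluster shares. The sketch below shows one way such shares could be computed; the trip records, the tuple layout, and the `"casual"`/`"annual"` labels are all hypothetical and are not the real data or field names.

```python
from collections import defaultdict
from datetime import date

# Hypothetical trip records: (cluster id, trip date, user type).
trips = [
    (1, date(2014, 3, 3), "annual"),  # Monday
    (1, date(2014, 3, 4), "annual"),  # Tuesday
    (4, date(2014, 3, 1), "casual"),  # Saturday
    (4, date(2014, 3, 2), "annual"),  # Sunday
    (4, date(2014, 3, 3), "casual"),  # Monday
    (4, date(2014, 3, 4), "annual"),  # Tuesday
]


def cluster_shares(trips):
    """Return {cluster: (percent weekend trips, percent casual trips)}."""
    totals = defaultdict(int)
    weekend = defaultdict(int)
    casual = defaultdict(int)
    for cluster, day, user_type in trips:
        totals[cluster] += 1
        if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            weekend[cluster] += 1
        if user_type == "casual":
            casual[cluster] += 1
    return {
        c: (100 * weekend[c] / totals[c], 100 * casual[c] / totals[c])
        for c in totals
    }


shares = cluster_shares(trips)
print(shares)
```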

This analysis suggests that there are distinct types of trips and that they are taken by different types of users. This information could be useful in the planning and operation of bikeshare systems. For example, knowing that subscribers make more trips from areas of low job accessibility to areas of high job accessibility could suggest how a particular station configuration will affect the balance of casual users and subscribers. This information could also be used when building models to predict how the system will be used. This visualization is descriptive, not predictive: it describes how the system is used. However, the trends shown could be integrated into a predictive model using other tools.

Details on the statistics and techniques used can be found in the blog post.

CC BY-NC-SA 4.0 Matthew Wigginton Conway, 2014. Created with data from Bay Area Bike Share, SFMTA, AC Transit, BART, Caltrain, SamTrans, VTA, the US Census Bureau, and OpenStreetMap using R, OpenTripPlanner and D3. Source code is available on GitHub.
