Python for Transit: Speed by Bus Segment in a map from GTFS

Dive deeper into the gtfs_functions Python package

Santiago Toso
Analytics Vidhya

Update March 2023!!

This package has been updated in March 2023. This article reflects the usage of the package’s latest version.

Introduction

In this article, we will see how to get scheduled mean speed by bus segment from a GTFS using the Python package gtfs_functions. You can find the repository and official documentation on GitHub.

If you are looking for an extensive explanation of the package, I recommend you first read this introduction. Here, we are going to dive directly into the specific use case of getting scheduled speeds by segment on a map.

Friendly reminder: please help me with a clap (or many!) when you finish reading if you find this article helpful.

Package installation and GTFS parsing

To install the package and parse the GTFS, run the code below. For this article, I downloaded the GTFS from SFMTA (San Francisco, CA).

# In your terminal run
pip install gtfs_functions

# Or in a notebook (or similar)
!pip install gtfs_functions

# Import package
from gtfs_functions import Feed, map_gdf

feed = Feed("SFMTA.zip", time_windows=[0, 6, 9, 15, 19, 22, 24])

routes = feed.routes
trips = feed.trips
stops = feed.stops
stop_times = feed.stop_times
shapes = feed.shapes

Cut the shapes into segments

Sometimes, looking at the variables at the stop or line level is not the best solution, and we need to go down to the segment level. We want to know what is going on between stop A and stop B and how it differs from what is going on between stop C and stop D.

To aggregate information at the segment level, we first need to cut the long shape of each route into segments that go from stop to stop.

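That is exactly what the segments attribute of the Feed object gives us (earlier versions of the package exposed this as the function cut_gtfs, which took three arguments from the parsed GTFS):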

segments_gdf = feed.segments

The output shows:

[Figure: GeoDataFrame returned by feed.segments.]

Its columns are:

  • route_id of the segment
  • direction_id of the segment as it comes in the GTFS
  • stop_sequence of the starting stop of the segment as it comes from the GTFS
  • start_stop_name as it comes from the GTFS
  • end_stop_name as it comes from the GTFS
  • start_stop_id as it comes from the GTFS
  • end_stop_id as it comes from the GTFS
  • segment_id as a concatenation of the start_stop_id and end_stop_id
  • shape_id for that segment as it comes from the GTFS
  • geometry as a LineString
  • distance_m that represents the length of the segment in meters. This will be useful to calculate the speeds later.
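To make the idea of cutting a shape from stop to stop more concrete, here is a minimal pure-Python sketch. It is not the package's implementation. It assumes the coordinates are already projected to meters and that the stops are listed in travel order. The helper names project, point_at, and cut_shape, as well as the sample shape and stops, are made up for this illustration.

```python
from math import hypot


def project(point, line):
    """Distance along `line` of the point on it that is closest to `point`."""
    best_dist, best_along, walked = float("inf"), 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        length = hypot(dx, dy)
        if length == 0:
            continue
        # Position of the closest point on this piece, clamped to [0, 1]
        t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / length**2
        t = max(0.0, min(1.0, t))
        dist = hypot(point[0] - (x1 + t * dx), point[1] - (y1 + t * dy))
        if dist < best_dist:
            best_dist, best_along = dist, walked + t * length
        walked += length
    return best_along


def point_at(line, along):
    """Coordinates found `along` meters from the start of `line`."""
    walked = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        length = hypot(x2 - x1, y2 - y1)
        if length > 0 and walked + length >= along:
            t = (along - walked) / length
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        walked += length
    return line[-1]


def cut_shape(line, stops):
    """Cut `line` into one segment per pair of consecutive stops."""
    # Cumulative distance of every shape vertex from the start of the shape
    vertex_along = [0.0]
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        vertex_along.append(vertex_along[-1] + hypot(x2 - x1, y2 - y1))

    stop_along = [project(xy, line) for _, xy in stops]
    segments = []
    for (a_id, _), (b_id, _), start, end in zip(
        stops, stops[1:], stop_along, stop_along[1:]
    ):
        # Keep the shape vertices that fall strictly between the two stops
        inner = [v for v, d in zip(line, vertex_along) if start < d < end]
        segments.append({
            "segment_id": f"{a_id}-{b_id}",
            "geometry": [point_at(line, start), *inner, point_at(line, end)],
            "distance_m": end - start,
        })
    return segments


# Hypothetical L-shaped route with three stops slightly off the shape
shape = [(0, 0), (100, 0), (100, 100)]
stops = [("A", (0, 5)), ("B", (60, -3)), ("C", (104, 50))]
segments = cut_shape(shape, stops)
for segment in segments:
    print(segment["segment_id"], round(segment["distance_m"], 1))
```

Each stop is snapped to the closest point on the shape, and the piece of shape between two consecutive snapped points becomes one segment with its own length.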

Having the segments is not the goal in itself, but an intermediate step we have to take to finally aggregate variables at the segment level. Let’s see how to do that in the next sections.

Calculate segments speeds

The GTFS gives us time information for each trip and stop. Now that we also have the distance in meters for each segment, it is straightforward to calculate the speed between two stops for each trip. We can also calculate the number of trips that take place in that segment for each time window. All this information can be used to calculate the weighted average speed per route, segment, direction, and time of day.
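As a rough illustration of that calculation, the sketch below computes the speed of each trip on a segment, assigns it to a time window, and averages by route, segment, and window. It is a simplified sketch and not the package's code: the record layout, the helper names window_label and segment_speeds, the sample values, and the choice of a plain mean of per-trip speeds are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict

# Same hour cutoffs passed as time_windows when creating the Feed
TIME_WINDOWS = [0, 6, 9, 15, 19, 22, 24]


def window_label(seconds, cutoffs=TIME_WINDOWS):
    """Label such as '6:00-9:00' for a time given in seconds after midnight."""
    hour = (seconds / 3600) % 24  # GTFS times may run past 24:00:00
    i = bisect_right(cutoffs, hour) - 1
    return f"{cutoffs[i]}:00-{cutoffs[i + 1]}:00"


def segment_speeds(runs):
    """Mean speed and trip count per (route, segment, time window)."""
    grouped = defaultdict(list)
    for run in runs:
        travel_s = run["arrival_s"] - run["departure_s"]
        if travel_s <= 0:
            continue  # skip runs where both stops share the same scheduled time
        speed_kmh = run["distance_m"] / travel_s * 3.6  # m/s to km/h
        key = (run["route_id"], run["segment_id"], window_label(run["departure_s"]))
        grouped[key].append(speed_kmh)
    return {
        key: {"speed_kmh": sum(values) / len(values), "ntrips": len(values)}
        for key, values in grouped.items()
    }


# Hypothetical runs of route "R1" over a 250 m segment "A-B"
runs = [
    {"route_id": "R1", "segment_id": "A-B", "distance_m": 250,
     "departure_s": 7 * 3600, "arrival_s": 7 * 3600 + 60},
    {"route_id": "R1", "segment_id": "A-B", "distance_m": 250,
     "departure_s": 8 * 3600, "arrival_s": 8 * 3600 + 50},
    {"route_id": "R1", "segment_id": "A-B", "distance_m": 250,
     "departure_s": 16 * 3600, "arrival_s": 16 * 3600 + 90},
]
result = segment_speeds(runs)
for key, values in result.items():
    print(key, round(values["speed_kmh"], 1), values["ntrips"])
```

The trip count is kept next to each mean because it is the natural weight when speeds from several routes are later combined on the same segment.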

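That is exactly what the avg_speeds attribute of the Feed object computes (earlier versions of the package exposed this as the function speeds_from_gtfs). It relies on four inputs:

  • routes, parsed from the GTFS
  • stop_times, parsed from the GTFS
  • the segments calculated in the previous section
  • the time window cutoffs passed as time_windows when creating the Feed

speeds = feed.avg_speeds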

The output for one specific segment, direction, and time of day shows:

The output has some self-explanatory columns so I will explain only the ones related to speeds:

  • speed_kmh: average speed in kilometers per hour for that route, segment, direction, and time of day
  • max_kmh: maximum average daily speed in kilometers per hour for that route, segment, and direction.
  • speed_mph: average speed in miles per hour for that route, segment, direction, and time of day
  • max_mph: maximum average daily speed in miles per hour for that route, segment, and direction.

Note that in the example above, the chosen segment 3114–3144 appears four times: once for each of the three routes that serve that segment and a fourth time for the route “All lines”. This route is created by the function and it aggregates the weighted average speed in that segment, taking into account all the routes that stop at its starting and ending stops.

Also, notice that the aggregated value for “All lines” takes into account all three routes, ignoring the direction the lines had in the GTFS. This makes sense since the segment always starts and ends at the same stops, even if the assigned direction is different in the GTFS.
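The “All lines” aggregation can be sketched in a few lines of plain Python. This is only an illustration under assumptions: it weights each route's speed by its number of trips (ntrips), which is a guess at the weighting rather than something confirmed by the package, and the function name all_lines, the route names, and the values are invented for the example.

```python
from collections import defaultdict


def all_lines(rows):
    """Trip-weighted mean speed per (segment, window), ignoring route and direction."""
    # (segment_id, window) -> [sum of speed * trips, total trips]
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = (row["segment_id"], row["window"])
        totals[key][0] += row["speed_kmh"] * row["ntrips"]
        totals[key][1] += row["ntrips"]
    return [
        {
            "route_name": "All lines",
            "segment_id": segment_id,
            "window": window,
            "speed_kmh": weighted_sum / trips,
            "ntrips": trips,
        }
        for (segment_id, window), (weighted_sum, trips) in totals.items()
    ]


# Hypothetical rows: three routes on the same segment, one with another direction_id
rows = [
    {"route_name": "R1", "direction_id": 0, "segment_id": "A-B",
     "window": "6:00-9:00", "speed_kmh": 10.0, "ntrips": 4},
    {"route_name": "R2", "direction_id": 0, "segment_id": "A-B",
     "window": "6:00-9:00", "speed_kmh": 12.0, "ntrips": 6},
    {"route_name": "R3", "direction_id": 1, "segment_id": "A-B",
     "window": "6:00-9:00", "speed_kmh": 15.0, "ntrips": 10},
]
aggregated = all_lines(rows)
print(aggregated)
```

Because direction_id is left out of the grouping key, the three rows collapse into a single “All lines” row for the segment and time window.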

Show results on a map

If you are looking to visualize data at the segment level for all lines, I recommend you go with something more powerful than the map_gdf() that we saw in previous articles, such as kepler.gl (AKA my favorite data viz library). For example, to check the scheduled speeds per segment:

You will need to manually style the colors and filters but you will have complete control over the visual. Or you can always learn to do it programmatically (which I haven’t yet).
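One simple way to get segment data into kepler.gl is as GeoJSON, a format it can load directly. The sketch below builds a GeoJSON FeatureCollection from plain dictionaries using only the standard library. The function name to_geojson, the "coords" key, and the sample values are assumptions for this example. With a GeoDataFrame, geopandas' to_file("speeds.geojson", driver="GeoJSON") should produce an equivalent file.

```python
import json


def to_geojson(rows):
    """Build a GeoJSON FeatureCollection with one LineString feature per row."""
    features = []
    for row in rows:
        # Everything except the coordinates becomes a feature property
        properties = {k: v for k, v in row.items() if k != "coords"}
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                # GeoJSON expects [longitude, latitude] pairs
                "coordinates": [list(xy) for xy in row["coords"]],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical segment row with made-up coordinates and speed
rows = [
    {"segment_id": "A-B", "window": "6:00-9:00", "speed_kmh": 13.1,
     "coords": [(-122.42, 37.77), (-122.41, 37.78)]},
]
geojson_text = json.dumps(to_geojson(rows))
print(geojson_text)

# To save it for kepler.gl, write the text to a file, for example:
# with open("speeds.geojson", "w") as f:
#     f.write(geojson_text)
```

Keeping speed_kmh and window as feature properties means they are available for color styling and filtering once the file is loaded.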

Did you find this article helpful? Please let me know by leaving a few claps!!

Acknowledgments & References

Even if this is not a corporate package, some members of Via’s Data Science NYC team collaborated on the last update of the package. A special shout out to Mattijs De Paepe who considerably improved the segment-cutting function and Tobias Bartsch who implemented pattern calculation.

As for the other packages this one relies on heavily, map_gdf() is just a folium wrapper, so much of the credit goes to folium's creators.
