The future of planning methods? Big data, machine learning and AI

Matt Bhagat-Conway

What’s all the hype?

  • In recent years, computational power and new data sources have become widespread
  • There’s been a lot of hype about using these in planning, as well as in just about every other field
A comic showing someone arriving with a computer, saying they can solve some field's problems with algorithms, and then later admitting that the problems are hard

© xkcd

Big data

  • Big data means different things to different people
  • A lot of folks talk about three aspects of big data
    • Volume: it’s big
    • Velocity: it accumulates quickly
    • Variety: it is not uniform
  • In social science and planning, it more often means something along the lines of “bigger than we’re used to”
    • “Too big to open in Excel”

Novel data sources

  • I prefer to think of most things people call “big data” in planning as “novel data sources”
  • These might include
    • Sensor data
    • Smartphone data
    • Connected vehicle data
    • Social media data
    • Crowdsourced data

Advantages of novel data sources

  • Often very timely (e.g. from last month, or even last week)
  • Extensive—often orders of magnitude more observations than we could get otherwise

Limitations of novel data sources

  • These novel data sources often make no guarantees about representativeness
    • For instance, a common source of bicycle planning data is Strava
    • But who uses Strava?
  • Novel data sources often have a lot of observations, but relatively little detail about each
  • It is hard to do quality control on novel data sources
  • The data are often costly compared to e.g. Census data, but fairly inexpensive compared to collecting your own data
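
The representativeness concern can be illustrated with a small simulation. This is a minimal sketch with made-up numbers: the population, the share of "enthusiast" cyclists, and the app-usage rates are all hypothetical assumptions, not properties of Strava or any real dataset. It compares a large sample in which enthusiasts are over-represented with a small simple random sample.

```python
import random

random.seed(42)

# Hypothetical population: 10% "enthusiast" cyclists who ride long distances,
# 90% "everyday" cyclists who ride short distances. All numbers are made up.
population = []
for _ in range(100_000):
    enthusiast = random.random() < 0.10
    mean_km = 40.0 if enthusiast else 5.0
    trip_km = max(0.0, random.gauss(mean_km, 2.0))
    population.append((enthusiast, trip_km))

true_mean = sum(km for _, km in population) / len(population)

# App-style sample: enthusiasts are assumed far more likely to record trips.
app_sample = [
    km for enthusiast, km in population
    if random.random() < (0.60 if enthusiast else 0.02)
]
app_mean = sum(app_sample) / len(app_sample)

# Small simple random sample, as in a traditional survey.
survey_sample = [km for _, km in random.sample(population, 500)]
survey_mean = sum(survey_sample) / len(survey_sample)

print(f"True mean trip length:  {true_mean:.1f} km")
print(f"App sample (n={len(app_sample)}): {app_mean:.1f} km")
print(f"Random sample (n=500):  {survey_mean:.1f} km")
```

Under these assumptions the app-style sample has many more observations, but its mean is much further from the population mean than the mean of the small random sample.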

Machine learning

  • We make a lot of restrictive assumptions when we do regression (linearity, lack of interactions, etc.)
  • As data and computation have become more available, machine learning has developed as a set of techniques to relax those assumptions
  • Machine learning is heavily focused on prediction

Advantages of machine learning

  • Many machine learning models don’t make any parametric assumptions about the outcome
    • i.e. no assumption of linearity, automatic interaction terms, etc.
  • It can predict really well
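
As a rough illustration of what dropping the linearity assumption buys, the sketch below fits a straight line and a k-nearest-neighbors model to simulated data with a curved relationship. The data, the choice of k-nearest neighbors, and k = 15 are assumptions made for illustration; many other machine learning models would make the same point.

```python
import math
import random

random.seed(0)

def make_data(n):
    """Simulate a nonlinear relationship: y = sin(x) plus noise (made-up data)."""
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0, 0.2) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k):
    """k-nearest-neighbors regression: average the outcomes of the k closest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict

def mse(model, xs, ys):
    """Mean squared prediction error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(500)
test_x, test_y = make_data(500)  # held-out data, not used for fitting

linear = fit_line(train_x, train_y)
knn = fit_knn(train_x, train_y, k=15)

linear_mse = mse(linear, test_x, test_y)
knn_mse = mse(knn, test_x, test_y)
print(f"Linear regression test MSE:    {linear_mse:.3f}")
print(f"15-nearest-neighbors test MSE: {knn_mse:.3f}")
```

The nearest-neighbors model never assumes a functional form, so it can follow the curve that the straight line misses.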

Disadvantages of machine learning

  • Interpretation is much more difficult than with other models
    • Often these models are a “black box”
    • In planning, interpretation is often what we care about
  • Hard to be sure whether the model is picking up on theoretically justified relationships or on other features of the data
  • Because of how flexible machine learning models are, overfitting is much more of a concern than it is with regression
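
The overfitting concern can be sketched by comparing error on the data used to fit a model with error on held-out data. The example below is an assumption-laden illustration: the data are simulated from a truly linear relationship, and 1-nearest-neighbor regression stands in for "a very flexible model".

```python
import random

random.seed(1)

def make_data(n):
    """Simulate a truly linear relationship, y = 2x + noise (made-up data)."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 3) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_nearest_neighbor(xs, ys):
    """1-nearest-neighbor regression: predict the outcome of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared prediction error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(300)
test_x, test_y = make_data(300)  # held-out data, not used for fitting

line = fit_line(train_x, train_y)
nn = fit_nearest_neighbor(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
nn_train, nn_test = mse(nn, train_x, train_y), mse(nn, test_x, test_y)

print(f"Linear regression: train MSE {line_train:.1f}, test MSE {line_test:.1f}")
print(f"1-nearest-neighbor: train MSE {nn_train:.1f}, test MSE {nn_test:.1f}")
```

The flexible model reproduces its training data perfectly because it memorizes the noise, then does worse than the regression on new data. This is why machine learning models are usually evaluated on data held out from fitting.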

Other advantages to machine learning: data types

Applications of machine learning in planning

  • Best used when prediction really does matter
  • Often used for imputation (filling in missing data)
  • Some newer causal inference techniques rely on machine learning as well
  • Self-driving cars?
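
As one possible sketch of the imputation use case, the example below fills in a missing value with the average from the most similar complete records. The survey records, variable names, and the choice of a nearest-neighbor approach with k = 3 are all hypothetical, chosen only to keep the example short.

```python
# Hypothetical survey records: (household size, vehicles, daily trips).
# None marks a missing trip count. All values are made up for illustration.
records = [
    (1, 0, 2.0),
    (1, 1, 3.0),
    (2, 1, 5.0),
    (2, 2, 6.0),
    (4, 2, 10.0),
    (4, 1, 9.0),
    (5, 2, 12.0),
    (2, 1, None),
    (4, 2, None),
]

def impute_knn(records, k=3):
    """Fill missing trip counts with the mean of the k most similar complete records."""
    complete = [r for r in records if r[2] is not None]
    filled = []
    for size, vehicles, trips in records:
        if trips is None:
            # Similarity is squared Euclidean distance on the two observed predictors.
            nearest = sorted(
                complete,
                key=lambda r: (r[0] - size) ** 2 + (r[1] - vehicles) ** 2,
            )[:k]
            trips = sum(r[2] for r in nearest) / k
        filled.append((size, vehicles, trips))
    return filled

filled = impute_knn(records, k=3)
for row in filled:
    print(row)
```

Here prediction accuracy is the goal in itself: what matters is that the filled-in values are plausible, not that the model is interpretable.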

AI

  • It’s still too early to see how ChatGPT and other AI tools affect planning
  • I think AI is going to be a big help to researchers dealing with text data
    • Data extraction, qualitative coding, etc.

AI on AI

  • I asked ChatGPT how ChatGPT would affect urban planning
  • I didn’t buy most of what it said, but one thing that was interesting was scenario planning
  • AI tools could be used to develop, evaluate, and visualize scenarios

AI and visualization - 2023

The intersection in front of the Durham County Main Library, a tangle of high-speed roadways

Replaced with a pedestrian plaza

This took less than an hour with Stable Diffusion

AI and visualization - today

The intersection in front of the Durham County Main Library, a tangle of high-speed roadways

Replaced with a pedestrian plaza

This took a few minutes with Copilot

Creative Commons License
This work by Matthew Bhagat-Conway is licensed under a Creative Commons Attribution 4.0 International License.