The future of planning methods? Big data, machine learning and AI
What’s all the hype?
- In recent years, computational power and new data sources have become widespread
- There’s been a lot of hype about using this in planning, as well as just about every other field
![A comic showing someone arriving with a computer, saying they can solve some field's problems with algorithms, and then later admitting that the problems are hard]()
© xkcd
Big data
- Big data means different things to different people
- A lot of folks talk about three aspects of big data
  - Volume: it’s big
  - Velocity: it accumulates quickly
  - Variety: it is not uniform
- In social science and planning, it more often means something along the lines of “bigger than we’re used to”
  - “Too big to open in Excel”
Novel data sources
- I prefer to think of most things people call “big data” in planning as “novel data sources”
- These might include
  - Sensor data
  - Smartphone data
  - Connected vehicle data
  - Social media data
  - Crowdsourced data
Advantages of novel data sources
- Often very timely (e.g. from last month, or even last week)
- Extensive—often orders of magnitude more observations than we could get otherwise
Limitations of novel data sources
- These novel data sources often make no guarantees about representativeness
- For instance, a common source of bicycle planning data is Strava
  - But who uses Strava?
- Novel data sources often have a lot of observations, but relatively little detail about each
- It is hard to do quality control on novel data sources
- The data are often costly compared to e.g. Census data, but fairly inexpensive compared to collecting your own data
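
To make the representativeness concern concrete, here is a minimal sketch with made-up numbers. The rider shares and trip counts below are hypothetical, not taken from Strava or any real source. It shows how an average computed from app users alone can land far from the population average when heavy riders are overrepresented among users.

```python
# All numbers are hypothetical, invented purely for illustration.
# Share of each rider type in the full population vs. among app users.
population_share = {"frequent": 0.2, "occasional": 0.8}
app_user_share = {"frequent": 0.9, "occasional": 0.1}

# Assumed average weekly bike trips for each rider type.
weekly_trips = {"frequent": 10.0, "occasional": 1.0}


def weighted_mean(shares, values):
    """Average of `values`, weighted by the group shares in `shares`."""
    return sum(shares[group] * values[group] for group in shares)


true_mean = weighted_mean(population_share, weekly_trips)
app_mean = weighted_mean(app_user_share, weekly_trips)

print(f"Population average: {true_mean:.1f} trips/week")
print(f"App-based estimate: {app_mean:.1f} trips/week")
```

Having many observations does not help here: the gap comes from who is in the data, not from how many records there are.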
Machine learning
- We make a lot of restrictive assumptions when we do regression (linearity, lack of interactions, etc.)
- As data and computation have become more available, machine learning has developed as a set of techniques to relax those assumptions
- Machine learning is heavily focused on prediction
Advantages of machine learning
- Many machine learning models don’t make any parametric assumptions about the outcome
  - i.e. no assumption of linearity, interactions are picked up automatically, etc.
- It can predict really well
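
As a rough illustration of what dropping the linearity assumption buys, the sketch below fits a straight line and a k-nearest-neighbors model to data with a curved relationship. The data are synthetic (y = x squared), k-nearest-neighbors is used here only as a simple stand-in for a flexible machine learning model, and all function names are invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbors regression: average the outcomes of the k closest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared prediction error of `model` on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic training data with a curved (quadratic) relationship.
train_x = [i / 2 for i in range(-6, 7)]          # -3.0, -2.5, ..., 3.0
train_y = [x ** 2 for x in train_x]

# Held-out points that fall between the training points.
test_x = [i / 2 + 0.25 for i in range(-6, 6)]
test_y = [x ** 2 for x in test_x]

linear_mse = mse(fit_line(train_x, train_y), test_x, test_y)
knn_mse = mse(fit_knn(train_x, train_y), test_x, test_y)

print(f"Linear regression test MSE: {linear_mse:.2f}")
print(f"k-nearest-neighbors test MSE: {knn_mse:.2f}")
```

The straight line cannot follow the curve unless we add a squared term ourselves, while the nearest-neighbor model follows it without being told its shape.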
Disadvantages of machine learning
- Interpretation is much more difficult than with other models
- Often these models are a “black box”
- In planning, interpretation is often what we care about
- It is hard to be sure whether the model is picking up on theoretically justified relationships or on other features of the data
- Because of how flexible machine learning models are, overfitting is much more of a concern than it is with regression
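
A small simulation can show what overfitting looks like. This is a sketch under stated assumptions: the data are simulated from a straight-line trend plus random noise, and a one-nearest-neighbor model stands in for a very flexible machine learning model. Function names are invented for this example.

```python
import random

random.seed(0)  # fixed seed so the simulated data are reproducible


def simulate(xs):
    """Hypothetical data: a straight-line trend (slope 2) plus random noise."""
    return [2.0 * x + random.gauss(0, 1) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest(xs, ys):
    """One-nearest-neighbor: predict the outcome of the single closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of `model` on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data on an evenly spaced grid; test data drawn fresh.
train_x = [i * 0.05 for i in range(200)]
train_y = simulate(train_x)
test_x = [random.uniform(0, 10) for _ in range(1000)]
test_y = simulate(test_x)

line = fit_line(train_x, train_y)
nearest = fit_nearest(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
nn_train, nn_test = mse(nearest, train_x, train_y), mse(nearest, test_x, test_y)

print(f"Linear regression: train MSE {line_train:.2f}, test MSE {line_test:.2f}")
print(f"Nearest neighbor:  train MSE {nn_train:.2f}, test MSE {nn_test:.2f}")
```

The nearest-neighbor model reproduces its training data exactly, noise included, so its training error says nothing about how it will do on new data. This is why machine learning results are normally judged on held-out data.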
Other advantages to machine learning: data types
- Data types are more varied—for instance,
Applications of machine learning in planning
- Best used when prediction really does matter
- Often used for imputation (filling in missing data)
- Some newer causal inference techniques rely on machine learning as well
- Self-driving cars?
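
As an illustration of the imputation use case, the sketch below fills in missing vehicle counts from the households most similar in income. The household records are hypothetical, and nearest-neighbor averaging is just one simple approach among many; the function name is invented for this example.

```python
# Hypothetical household records: (income in $1000s, number of vehicles).
# None marks a missing vehicle count.
households = [
    (20, 0), (25, 1), (30, 1), (55, 2), (60, None),
    (65, 2), (90, 3), (95, None), (100, 3),
]


def impute_vehicles(records, k=2):
    """Fill each missing vehicle count with the average count among
    the k complete records closest in income."""
    complete = [(inc, veh) for inc, veh in records if veh is not None]
    filled = []
    for inc, veh in records:
        if veh is None:
            nearest = sorted(complete, key=lambda r: abs(r[0] - inc))[:k]
            veh = sum(v for _, v in nearest) / k
        filled.append((inc, veh))
    return filled


filled = impute_vehicles(households)
for income, vehicles in filled:
    print(f"income ${income}k: {vehicles} vehicles")
```

Here prediction really is the goal: we only need a plausible value for the missing cell, not an interpretable coefficient.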
AI
- It’s still too early to see how ChatGPT and other AI tools will affect planning
- I think AI is going to be a big help to researchers dealing with text data
  - Data extraction, qualitative coding, etc.
AI on AI
- I asked ChatGPT how ChatGPT would affect urban planning
- I didn’t buy most of what it said, but one thing that was interesting was scenario planning
- AI tools could be used to develop, evaluate, and visualize scenarios
AI and visualization - 2023
This took less than an hour with Stable Diffusion
AI and visualization - today
This took a few minutes with Copilot