BorderBot: Using AI to Predict Border Wait Times

Every day, tens of thousands of people cross the US-Mexico border at San Ysidro. Some days the wait is twenty minutes. Some days it's three hours. The difference between those two scenarios changes your entire day, whether you're late for work, miss an appointment, or waste your afternoon sitting in traffic.

CBP (Customs and Border Protection) publishes current wait times, but they're backward-looking. They tell you what the wait is now, not what it will be in an hour. For someone deciding when to cross, the current wait is useful but insufficient. What you really need is a prediction: if I leave at 7 AM, how long will I wait?

BorderBot was my attempt to answer that question using machine learning.

The Data Problem

Before building any model, I needed data. Specifically, I needed historical wait times at multiple crossing points, timestamped and structured.

CBP publishes current wait times on their website. But they don't provide an API, and they don't archive historical data. The current wait times are displayed, updated periodically, and then gone.

So I built a scraper. Every fifteen minutes, a script pulled the current wait times from CBP's website and stored them in a database. Over weeks, this accumulated into a dataset of historical wait times with enough temporal coverage to identify patterns.

This data collection process taught me the first principle of building data products: you often have to create the dataset before you can build the product. Unlike feature products where the data comes from users, data products frequently require bootstrapping the dataset through collection, scraping, or generation.

The scraping was fragile. CBP's website structure changed periodically. The formatting wasn't consistent. Sometimes data was missing. Sometimes the values were clearly wrong (zero-minute waits during peak hours). Data cleaning consumed more time than model building.

The Patterns

With a few months of data, patterns emerged clearly:

Day-of-week patterns. Monday mornings had longer waits (commuters returning from weekend trips to Mexico). Friday afternoons had longer waits (people heading south for the weekend). Wednesday was consistently the lightest day.

Time-of-day patterns. Morning peak (6-9 AM) for northbound commuters. Afternoon peak (3-6 PM) for southbound travelers. Late night was empty.

Holiday effects. US holidays created spikes in southbound traffic. Mexican holidays created spikes in northbound traffic. The overlap of holidays from both countries created extreme congestion.

Seasonal patterns. Summer had higher baseline wait times (tourism). Winter had lower baselines but sharper morning peaks (consistent commuter patterns without tourist noise).

Weather effects. Rain increased wait times, not because of the crossing itself, but because rain slowed traffic on the approach roads, creating bottlenecks before the port of entry.

These patterns were discoverable through basic time-series analysis. The challenge was turning them into predictions that accounted for multiple overlapping factors.

The Model

I used a straightforward approach: historical averaging with time-series adjustment.

For each fifteen-minute slot, the model calculated the average wait time for that slot, weighted by recency and adjusted for day-of-week, holiday status, and seasonal factors. This wasn't deep learning. It wasn't a neural network. It was statistics applied consistently to a well-structured dataset.

The accuracy was surprisingly good. For predictions one to two hours out, the model was within fifteen minutes of actual wait times about 80% of the time. For same-day predictions (morning prediction for afternoon crossing), accuracy dropped but was still more useful than no prediction.

The model's simplicity was a feature, not a limitation. A complex model would have been harder to debug, harder to explain, and harder to maintain. The simple model was transparent: I could explain to users exactly why it predicted a ninety-minute wait (Monday morning, holiday weekend, historically heavy).

What I Learned About AI Products

BorderBot was my first product that used any form of machine learning. The lessons transferred directly to every AI product I've built since.

Data quality trumps model complexity. A simple model on clean, comprehensive data outperforms a complex model on messy, incomplete data. I spent 80% of my time on data collection and cleaning, and 20% on the model. That ratio is correct.

Users don't care about the model. Nobody asked whether BorderBot used linear regression, random forests, or neural networks. They asked whether the predictions were accurate. The model is an implementation detail. The prediction is the product.

Confidence matters as much as accuracy. A prediction of "90 minutes, plus or minus 15 minutes" is more useful than a prediction of "90 minutes" because it sets expectations correctly. Users who know the range can make better decisions than users who have a point estimate.

Real-time data changes expectations. Once you give users predictions, they expect updates. "You said 90 minutes at 7 AM, but now it's 8 AM. What's the update?" The product needed to continuously update predictions as new data arrived, which meant the scraper and the model needed to run continuously, not just at prediction time.

The feedback loop is essential. Users who crossed the border could report actual wait times, which fed back into the model. This feedback loop improved accuracy over time and created a moat: the more users reported, the better the predictions got, which attracted more users, who reported more data.

The Product Decisions

Beyond the technical model, BorderBot required product decisions that shaped its usefulness:

Which crossing points to cover. The Tijuana-San Diego corridor has three main crossings: San Ysidro, Otay Mesa, and Cross Border Xpress. Each has different traffic patterns and user demographics. I started with San Ysidro (highest volume) and added others based on demand.

How to present predictions. A wall of numbers isn't useful. I displayed predictions as a simple timeline: "Now: 45 min | In 1 hour: 60 min | In 2 hours: 30 min." Users could scan the timeline and choose their departure time at a glance.

Push notifications. The highest-value feature was alerting users when predicted wait times dropped below their threshold. "Wait time at San Ysidro is predicted to drop below 30 minutes at 2 PM." This turned the product from a lookup tool into a planning tool.

Historical context. Showing users the historical average for the current time slot helped them calibrate expectations. "Current prediction: 90 minutes. Historical average for Monday 8 AM: 75 minutes." The comparison told users whether today was unusual or typical.

The Scaling Challenge

BorderBot worked well for the Tijuana-San Diego corridor because I lived there and understood the patterns. Scaling it to other border crossings would have required local data collection, local pattern validation, and local user feedback, none of which could be done remotely.

This is the scaling challenge of data products: the data that makes them valuable is local, specific, and hard to collect at a distance. A general-purpose border wait time predictor sounds scalable. In practice, each crossing point is its own micro-market with unique patterns that require dedicated data collection and validation.

I chose depth over breadth, making the Tijuana-San Diego predictions as accurate as possible rather than expanding to crossings I didn't understand. This decision limited growth but maintained quality. For a product that lives or dies on prediction accuracy, quality was the right priority.

The Bridge to AI Products

BorderBot was a small product with a small user base. But it was my proof of concept for a bigger idea: AI products that solve domain-specific problems using domain-specific data.

The same pattern (collect domain data, find patterns, build predictions, serve users through a simple interface) applies to every AI product I've built since. Aviation Infinity uses patterns in exam performance to guide study paths. The domain changes, but the approach stays the same.

The domain changes. The data changes. The model complexity changes. But the fundamental approach (domain expertise plus data plus AI equals a product that generic tools can't replicate) is the same.

BorderBot taught me that approach. Everything since has been scaling it.