Every computer vision project starts the same way: a demo that works perfectly in the lab. Good lighting, clean backgrounds, cooperative subjects. The model hits 98% accuracy. Everyone celebrates.
Then you deploy it in the real world and everything breaks.
We’ve shipped CV systems across retail stores, public buses, and industrial settings. Here’s what we’ve learned about the gap between demo and production — and how to close it.
The Demo-to-Production Gap
A CV model trained on curated data behaves very differently when it encounters:
- Lighting variation: A model that works under fluorescent lights fails at sunset, in shadow, or under mixed lighting. Retail stores have spotlights on some products and dark corners elsewhere. Buses go from bright sunshine to tunnel darkness in seconds.
- Camera quality: Your training data was collected on a high-res camera. The deployment uses 720p CCTV from 2018 with compression artifacts and a dirty lens.
- Edge cases at scale: A 99% accurate model sounds great until you realize that at 1000 events per day, you’re getting 10 wrong answers daily. In passenger counting, that’s 10 fare disputes. In quality inspection, that’s 10 defective products shipped.
- Environmental factors: Dust on lenses, vibration, heat throttling the compute board, network dropouts mid-inference, power cuts that corrupt the model file on disk.
What Actually Matters in Production CV
1. Train on Ugly Data
Your training dataset should look like your worst deployment day, not your best. We deliberately include:
- Blurry frames from camera shake
- Overexposed and underexposed images
- Partially occluded subjects
- Unusual angles and perspectives
- Frames with compression artifacts
If the model hasn’t seen it in training, it will fail on it in production.
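To make the idea concrete, here is a minimal stdlib-only sketch of a degradation pass over a grayscale image (represented as a list of rows of 0-255 ints). It is illustrative, not our production code: a real pipeline would apply blur, exposure shifts, occlusion, and compression artifacts to full frames with an augmentation library such as albumentations or torchvision.

```python
import random

def degrade(image, seed=None):
    """Apply one random 'ugly' corruption to a grayscale image
    (a list of rows of 0-255 ints). Illustrative sketch only."""
    rng = random.Random(seed)
    choice = rng.choice(["overexpose", "underexpose", "quantize", "occlude"])
    if choice == "overexpose":       # push pixels toward white
        return [[min(255, p + 80) for p in row] for row in image]
    if choice == "underexpose":      # push pixels toward black
        return [[max(0, p - 80) for p in row] for row in image]
    if choice == "quantize":         # crude stand-in for compression artifacts
        return [[(p // 32) * 32 for p in row] for row in image]
    # occlude: blank out the left half of every row
    return [[0 if x < len(row) // 2 else p
             for x, p in enumerate(row)] for row in image]
```

Running every training image through a pass like this (alongside the clean copy) is cheap insurance against the model memorizing lab conditions.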
2. Design for Failure
Every component will fail. The question is how gracefully.
- Camera goes offline? The system should detect it, log it, and resume when it returns — not crash and require a manual restart.
- Model inference hangs? A watchdog process should kill and restart it.
- Storage fills up? The system should prune the oldest data automatically, not silently stop recording.
- Network drops? Data should queue locally and sync when connectivity returns.
We’ve built all of these recovery mechanisms into our systems because we’ve encountered every one of these failures in the field.
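As one example of what "gracefully" looks like, here is a sketch of the storage-pruning case: delete the oldest recordings until usage fits under a budget. The function name and layout are illustrative assumptions; a real recorder would also guard against deleting the file currently being written.

```python
from pathlib import Path

def prune_oldest(data_dir, max_bytes):
    """Delete the oldest files in data_dir until the total size
    fits under max_bytes. Minimal sketch of self-healing storage."""
    files = sorted(Path(data_dir).iterdir(), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    for p in files:
        if total <= max_bytes:
            break
        total -= p.stat().st_size
        p.unlink()                   # oldest first
    return total
```

Run on a timer or whenever a disk-usage check trips, this keeps the recorder alive instead of letting a full disk silently halt it.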
3. Build the Annotation Pipeline First
Most teams treat annotation as a one-time task before training. In reality, it’s an ongoing operation.
Every deployment generates new edge cases. You need a pipeline that:
- Flags low-confidence predictions for human review
- Makes it easy for annotators to correct mistakes
- Feeds corrections back into the training set
- Triggers retraining when the correction set is large enough
The annotation pipeline is as important as the model itself.
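The first step of that loop, routing predictions by confidence, can be sketched in a few lines. The tuple shape and the 0.8 default threshold are illustrative assumptions; in practice the threshold is tuned per deployment against annotator capacity.

```python
def route_predictions(predictions, threshold=0.8):
    """Split model outputs into auto-accepted results and a review
    queue for human annotators. `predictions` is a list of
    (frame_id, label, confidence) tuples. Illustrative sketch."""
    accepted, review_queue = [], []
    for frame_id, label, conf in predictions:
        target = accepted if conf >= threshold else review_queue
        target.append((frame_id, label, conf))
    return accepted, review_queue
```

Everything in the review queue becomes annotation work, and every correction becomes a training example.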
4. Measure Everything in Production
Lab metrics (precision, recall, F1) are necessary but not sufficient. In production, you also need:
- Latency per frame — is inference keeping up with the camera’s frame rate?
- Drift detection — is accuracy degrading over time as conditions change?
- Hardware health — CPU temperature, memory usage, disk space, network throughput
- Business metrics — are the downstream decisions based on CV output actually improving outcomes?
If you’re not measuring these, you’re flying blind.
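Drift detection in particular needs no heavy machinery. Here is a hedged sketch of a rolling-accuracy monitor fed by spot-checked frames; the window size and accuracy floor are illustrative defaults, and in a real system the "correct" signal would come from the human-review pipeline above.

```python
from collections import deque

class DriftMonitor:
    """Track a rolling accuracy estimate from spot-checked frames
    and flag when it drops below a floor. Illustrative sketch."""
    def __init__(self, window=100, floor=0.95):
        self.results = deque(maxlen=window)
        self.floor = floor

    def record(self, correct):
        self.results.append(1 if correct else 0)

    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else 1.0

    def drifting(self):
        # Only alarm once the window has enough samples to be meaningful.
        return (len(self.results) == self.results.maxlen
                and self.accuracy() < self.floor)
```

Wire the `drifting()` flag into the same alerting channel as hardware health, and accuracy decay stops being something you discover from customer complaints.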
5. Keep the Human in the Loop
Fully autonomous CV is the goal, but for most real-world applications, you need a human verification step — at least initially.
In our retail analytics system, store managers can flag incorrect counts. In our passenger counting system, operators can review video clips of disputed boarding events. These corrections feed back into the model, creating a virtuous cycle of improvement.
The Boring Parts Matter Most
The glamorous part of CV is the model — the architecture, the training, the accuracy benchmarks. But in production, the model is maybe 20% of the system.
The other 80% is:
- Data pipelines and storage
- Edge device management and monitoring
- Over-the-air (OTA) model updates without downtime
- Error recovery and self-healing
- API design for downstream consumers
- Annotation tooling and retraining workflows
These aren’t exciting topics, but they’re what separates a demo from a product.
Conclusion
If you’re building a CV system, plan for production from day one. The model is the easy part. The hard part is making it work reliably, at scale, in conditions you haven’t anticipated, with graceful failure handling and continuous improvement.
That’s the engineering we do at Thoht Delta. Not just models — complete systems that see.
Building a CV system that needs to work in the real world? Let’s talk.