One of the first decisions in any computer vision project is where the inference will run. Cloud? Edge device? Which edge device? The answer depends on your specific constraints — and getting it wrong is expensive to fix later.
We’ve deployed CV systems on NVIDIA Jetson (Nano and NX), tested Google Coral, evaluated the Raspberry Pi 5 + Hailo AI HAT combination, and run cloud inference pipelines. Here’s a practical comparison based on real deployment experience, not spec sheets.
The Options
Cloud Inference
How it works: Video is streamed (or clips are uploaded) to a cloud server running GPU instances. Models run on powerful hardware, results are returned via API.
When it works well:
- You have reliable, high-bandwidth internet at the deployment site
- Latency of 1-5 seconds per inference is acceptable
- You want to iterate quickly on models without touching deployed hardware
- Your deployment scale is small (a few cameras)
When it fails:
- Connectivity is unreliable (transit, remote locations, some retail stores)
- Real-time processing is required (counting people through a door needs frame-by-frame speed)
- Data volume is high (streaming video from 100+ cameras gets expensive fast)
- Privacy regulations restrict video leaving the premises
Cost model: Pay per inference or per GPU-hour. Cheap to start, expensive at scale.
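The crossover point is easy to estimate for your own numbers. Here is a rough breakeven sketch; every figure in it (the $0.50 per 1,000 inferences, the $700 device, the 4 cameras per device) is an illustrative assumption, not a quote.

```python
# Rough cloud-vs-edge breakeven sketch. All prices below are
# illustrative assumptions -- substitute your own quotes.

def cloud_monthly_cost(cameras, inferences_per_cam_per_day,
                       price_per_1k_inferences=0.50):
    """Recurring cloud cost under a pay-per-inference model."""
    monthly_inferences = cameras * inferences_per_cam_per_day * 30
    return monthly_inferences / 1000 * price_per_1k_inferences

def edge_breakeven_months(cameras, inferences_per_cam_per_day,
                          device_cost=700, cams_per_device=4):
    """Months until one-time edge hardware beats recurring cloud spend."""
    devices = -(-cameras // cams_per_device)  # ceiling division
    capex = devices * device_cost
    monthly_cloud = cloud_monthly_cost(cameras, inferences_per_cam_per_day)
    return capex / monthly_cloud

# Example: 8 cameras, one inference per second over a 12-hour day.
months = edge_breakeven_months(8, 12 * 3600)
print(f"Edge hardware pays for itself in about {months:.1f} months")
```

At continuous frame rates the breakeven arrives in weeks, which is why high-volume deployments move to edge; at a few clips per day, cloud stays cheaper for years.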
NVIDIA Jetson (Nano / NX / Orin)
How it works: A small, power-efficient GPU compute board runs at the deployment site. Models are optimized with TensorRT for maximum performance on the available hardware.
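As a sketch of that optimization step (assuming you have an ONNX export of your model and a Jetson with JetPack installed), TensorRT’s bundled `trtexec` tool builds a serialized engine; the file names here are placeholders:

```shell
# Build a TensorRT engine from an ONNX export, on the Jetson itself --
# engines are tied to the GPU and TensorRT version they were built with.
# FP16 typically cuts memory use and raises throughput on Jetson GPUs.
trtexec --onnx=model.onnx \
        --saveEngine=model_fp16.engine \
        --fp16
```

The build can take several minutes on smaller boards, so it is usually done once at provisioning time rather than on every boot.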
Jetson Nano (~$300-400 production):
- 128 CUDA cores, 4GB RAM
- Good for: 1-2 camera streams, basic detection and tracking
- Limitations: Struggles with complex models or multiple simultaneous streams
Jetson NX (~$500-800 production):
- 384 CUDA cores, 8GB RAM
- Good for: 2-4 camera streams, more complex models, re-identification
- Our primary deployment platform for retail and transit
Jetson Orin (~$1,800+):
- Up to 2048 CUDA cores, 32GB RAM
- Good for: 8+ camera streams, multiple models running simultaneously
- Overkill for most single-location deployments, but ideal for multi-camera hubs
When it works well:
- You need real-time, frame-by-frame processing
- Connectivity is unreliable or absent
- Privacy requirements mean video can’t leave the site
- You’re deploying at scale (the per-unit cost beats cloud at ~20+ devices)
When it struggles:
- You need to change models frequently (requires OTA update infrastructure)
- The environment is extremely harsh (thermal management is critical)
- Power supply is unreliable (needs graceful shutdown handling)
Raspberry Pi 5 + Hailo AI HAT
How it works: A Raspberry Pi 5 paired with the Hailo-8L (13 TOPS) or Hailo-8 (26 TOPS) AI accelerator HAT. The Pi handles general compute and camera input, while the Hailo chip runs neural network inference. Models are compiled through Hailo’s Dataflow Compiler to run on their custom architecture.
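A hedged sketch of that workflow, using the model-zoo front end to the Dataflow Compiler (model names and flags vary by SDK version, so treat this as illustrative rather than a verified invocation):

```shell
# Sketch of the Hailo toolchain flow: parse the network, quantize it
# against calibration images, and compile to a .hef binary that the
# HailoRT runtime loads on the Pi. Paths and flags are illustrative.
hailomz compile yolov8s --hw-arch hailo8l --calib-path ./calib_images/
```

The quantization step is why a representative calibration set matters: accuracy after compilation depends on it, which is a workflow difference from the TensorRT path.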
Raspberry Pi 5 + Hailo-8L (~$95-140 total):
- 13 TOPS AI performance
- Good for: Single camera, basic detection and classification
- The Pi’s camera ecosystem (CSI cameras, libcamera) is mature and well-documented
Raspberry Pi 5 + Hailo-8 (~$180-260 total):
- 26 TOPS AI performance
- Good for: 1-2 camera streams, more complex detection and tracking models
- Competitive with Jetson Nano on many benchmarks at a lower price point
When it works well:
- Budget is tight — the Pi + Hailo combo is significantly cheaper than any Jetson
- Your model architecture is supported by Hailo’s compiler (most standard architectures are: YOLO, SSD, ResNet, EfficientNet, etc.)
- You need a mature Linux ecosystem with broad community support
- Power consumption matters — the full stack draws under 15W
- You want flexible I/O — the Pi’s GPIO, USB, and CSI interfaces are well-suited for integrating sensors, relays, or display outputs alongside the CV pipeline
When it struggles:
- Model compatibility: Hailo’s compiler supports a wide range of standard architectures, but very custom or cutting-edge models may need architecture adjustments. It’s not as flexible as Jetson’s CUDA/TensorRT path, where virtually anything that runs on a GPU can be optimized
- Multi-stream processing: The Pi’s CPU becomes the bottleneck when decoding multiple camera streams simultaneously. Jetson’s hardware video decoder handles this better
- Ecosystem maturity for production: The Hailo SDK is solid but newer than Jetson’s ecosystem. Less community knowledge for debugging production issues. Jetson has years of battle-tested deployment tooling (DeepStream, JetPack, container support)
- Tracking and re-identification: The Hailo chip excels at single-model inference but the Pi’s CPU limits how much post-processing (multi-object tracking, re-ID matching) you can run alongside detection
Cost model: Lowest upfront hardware cost of any serious AI edge platform. The Pi 5 ($60-70) + Hailo-8L HAT ($35-60) gets you into AI inference for under $140 — roughly a third of a Jetson Nano setup.
Our take: The Pi + Hailo combination is the most interesting new option for cost-sensitive deployments. For simple detection use cases (counting, basic classification, presence detection), it’s hard to beat on price-performance. But for multi-camera retail analytics or transit counting where you need robust tracking, re-identification, and multiple model inference, the Jetson NX still earns its higher price.
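To give a sense of the CPU-side post-processing mentioned above, here is a minimal greedy IoU-association step of the kind a tracker runs on every frame, alongside the accelerator’s detection inference. It is a sketch, not production tracking code:

```python
# Minimal greedy IoU association between existing track boxes and new
# detections -- the per-frame CPU work that runs next to the
# accelerator's inference. A sketch, not production tracking code.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(tracks, detections, iou_min=0.3):
    """Greedily pair track boxes with detection boxes by best IoU."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matched_t, matched_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < iou_min:
            break  # remaining pairs overlap too little to match
        if ti not in matched_t and di not in matched_d:
            matched_t.add(ti)
            matched_d.add(di)
            matches.append((ti, di))
    return matches

# A box that moved slightly should match; a distant one should not.
print(associate([(0, 0, 10, 10)], [(1, 1, 11, 11), (50, 50, 60, 60)]))
```

Even this toy version is O(tracks × detections) per frame; real pipelines add motion prediction and re-ID embedding comparisons on top, which is exactly the load that saturates the Pi’s CPU first.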
Google Coral
How it works: A small TPU (Tensor Processing Unit) accelerator, either as a USB dongle or a dev board. Runs TensorFlow Lite models optimized for the Edge TPU.
When it works well:
- Simple classification or detection tasks
- Ultra-low power requirements
- Very cost-sensitive deployments (~$60-95)
When it struggles:
- Complex models don’t fit well on the Edge TPU
- Limited to TensorFlow Lite — no PyTorch, no custom ops
- Performance drops sharply with models that aren’t specifically optimized for it
- Ecosystem is smaller and less mature than either Jetson’s or Hailo’s
- Google’s commitment to the Coral product line has been inconsistent — stock availability and long-term support are concerns
Our Recommendation by Use Case
| Use Case | Best Option | Why |
|---|---|---|
| Retail analytics (1-4 cameras) | Jetson NX | Reliable, handles all our models, proven in deployment |
| Transit passenger counting | Jetson Nano/NX | Must work offline, needs robust tracking |
| Single-camera detection (budget) | Pi 5 + Hailo-8 | Best price-performance for simpler pipelines |
| Prototype / proof of concept | Cloud | Fastest to iterate, no hardware logistics |
| Simple counting (low accuracy OK) | Pi 5 + Hailo-8L | Cheapest serious AI platform |
| Multi-store centralized analytics | Cloud + Edge hybrid | Edge for real-time, cloud for aggregation |
| High-security / privacy-sensitive | Jetson or Pi + Hailo | Video never leaves the site |
| IoT / sensor integration + CV | Pi 5 + Hailo | Pi’s GPIO and ecosystem are unmatched for hardware integration |
Practical Lessons
1. Budget for thermal management. Jetson boards throttle at high temperatures. In hot climates (40-50°C ambient), you need passive heatsinks, thermal pads, and possibly vented enclosures. This adds $25-60 per unit but prevents field failures.
2. SD cards die. If your Jetson runs from an SD card, expect failures within 6-12 months from constant write cycles. Move to eMMC or NVMe storage for production.
3. OTA updates are non-negotiable. You will need to update models and software after deployment. Build the update infrastructure before you ship the first unit, not after.
4. Power handling matters. In transit and some retail environments, power is unpredictable. Your system needs to handle sudden power loss without corrupting the filesystem or losing data.
5. Cloud fallback is smart. Even edge-first architectures benefit from cloud connectivity for model updates, log collection, and dashboard hosting. Design for offline-first, but use the cloud when it’s available.
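Point 4 above is mostly hardware and filesystem plumbing, but the software side has a minimal shape worth sketching: catch the shutdown signal and flush state atomically. The counter state and file path here are hypothetical stand-ins for whatever your pipeline accumulates.

```python
# Minimal graceful-shutdown sketch: flush in-memory counts to disk when
# the OS (or a UPS monitor) signals shutdown. The state contents and
# path are illustrative placeholders.
import json
import os
import signal
import tempfile

counts = {"entries": 0, "exits": 0}
STATE_PATH = os.path.join(tempfile.gettempdir(), "cv_counts.json")

def flush_state(signum=None, frame=None):
    """Write state atomically: temp file + fsync + rename survives a
    mid-write power cut better than truncating the live file in place."""
    tmp = STATE_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(counts, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, STATE_PATH)

# Register for the signals an init system sends at shutdown.
signal.signal(signal.SIGTERM, flush_state)
signal.signal(signal.SIGINT, flush_state)

counts["entries"] += 3
flush_state()  # also flush periodically, not only on shutdown
```

Pairing this with a read-only root filesystem and a writable data partition is a common way to keep the SD-card and power problems from compounding each other.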
Conclusion
There’s no universally “best” hardware for CV deployment. The right choice depends on your connectivity, latency requirements, scale, budget, and environmental constraints.
If you’re unsure, start with a cloud prototype to validate the model, then move to edge hardware for production. That’s the path we’ve followed successfully across multiple projects.
Need help choosing the right architecture for your CV project? We can help.