Senior AI Engineer — Real-Time Video Systems
Location: Uptown Charlotte, North Carolina
We are building a high-performance, GPU-accelerated video intelligence platform that runs in latency-sensitive production environments. This role is for a senior engineer who can independently architect, optimize, and scale real-time computer vision systems.
You will own inference performance, model optimization, and production reliability end-to-end. This is not a research-only role. It is not a “train a model and hand it off” role. You will be responsible for making models fast, stable, and production-ready in live environments.
If you thrive on squeezing maximum throughput from GPUs, designing resilient inference services, and making real systems perform under load, this will be a strong fit.
What You’ll Own
- Architect and optimize GPU-accelerated inference pipelines for high-volume video streams.
- Drive performance tuning initiatives: batching strategy, frame stride, memory allocation, quantization, and hardware-level optimization.
- Implement and refine object detection systems (YOLO-class architectures or equivalent) with temporal filtering and multi-frame logic.
- Reduce false positives through tracking, smoothing, and sequence-aware event logic.
- Own latency, throughput, and VRAM efficiency metrics, and continuously improve them.
- Integrate inference outputs into distributed, event-driven systems and cloud storage layers.
- Design production observability: metrics, logging, alerting, and fault-tolerant execution paths.
- Collaborate on dataset refinement and model iteration while maintaining a production-first mindset.
- Contribute to containerized deployment and scalable runtime infrastructure.
What We’re Looking For
- 5+ years building and shipping production ML/computer vision systems.
- Demonstrated ownership of performance-critical GPU inference pipelines.
- Deep proficiency in Python, PyTorch, and OpenCV.
- Strong hands-on experience with:
  - YOLO-class detection frameworks
  - ONNX and TensorRT optimization
  - CUDA-level performance tuning
  - Model quantization and throughput optimization
- Solid understanding of video processing fundamentals:
  - Frame sampling strategies
  - Temporal filtering and tracking
  - Confidence calibration
  - Multi-stream aggregation
- Experience deploying containerized workloads (Docker) in production.
- Ability to diagnose bottlenecks and implement performance improvements independently.
Ideal Profile
- You have shipped production systems that operate continuously under load.
- You are comfortable profiling GPU memory and compute usage.
- You understand the trade-offs between accuracy, latency, and cost.
- You prefer building resilient systems to running academic experiments.
- You require minimal oversight and are comfortable defining technical direction within your domain.
Core Technology Environment
Python, PyTorch, OpenCV, YOLO-class models, ONNX, TensorRT, CUDA, async I/O frameworks, REST/gRPC APIs, event-driven systems, cloud storage/messaging platforms, Docker, production telemetry tools.