Artificial Intelligence Engineer

Kronkite • Full-time • Charlotte, NC, US • $120k - $195k / year • 3w ago

Senior AI Engineer — Real-Time Video Systems

Location: Uptown Charlotte, North Carolina

We are building a high-performance, GPU-accelerated video intelligence platform that operates in latency-sensitive production environments. This role is for a senior engineer who can architect, optimize, and scale real-time computer vision systems independently.

You will own inference performance, model optimization, and production reliability end-to-end. This is not a research-only role. It is not a “train a model and hand it off” role. You will be responsible for making models fast, stable, and production-ready in live environments.

If you thrive on squeezing maximum throughput from GPUs, designing resilient inference services, and making real systems perform under load, this will be a strong fit.

What You’ll Own

Architect and optimize GPU-accelerated inference pipelines for high-volume video streams.
Drive performance tuning initiatives: batching strategy, frame stride, memory allocation, quantization, and hardware-level optimization.
Implement and refine object detection systems (YOLO-class architectures or equivalent) with temporal filtering and multi-frame logic.
Reduce false positives through tracking, smoothing, and sequence-aware event logic.
Own latency, throughput, and VRAM efficiency metrics — and improve them.
Integrate inference outputs into distributed, event-driven systems and cloud storage layers.
Design production observability: metrics, logging, alerting, and fault-tolerant execution paths.
Collaborate on dataset refinement and model iteration while maintaining a production-first mindset.
Contribute to containerized deployment and scalable runtime infrastructure.

What We’re Looking For

5+ years building and shipping production ML/computer vision systems.
Demonstrated ownership of performance-critical GPU inference pipelines.
Deep proficiency in Python, PyTorch, and OpenCV.
Strong hands-on experience with: YOLO-class detection frameworks; ONNX and TensorRT optimization; CUDA-level performance tuning; model quantization and throughput optimization.
Solid understanding of video processing fundamentals: frame sampling strategies; temporal filtering and tracking; confidence calibration; multi-stream aggregation.
Experience deploying containerized workloads (Docker) in production.
Ability to diagnose bottlenecks and implement performance improvements independently.

Ideal Profile

You have shipped production systems that operate continuously under load.
You are comfortable profiling GPU memory and compute usage.
You understand the trade-offs between accuracy, latency, and cost.
You prefer building resilient systems over running academic experiments.
You require minimal oversight and are comfortable defining technical direction within your domain.

Core Technology Environment

Python, PyTorch, OpenCV, YOLO-class models, ONNX, TensorRT, CUDA, async I/O frameworks, REST/gRPC APIs, event-driven systems, cloud storage/messaging platforms, Docker, production telemetry tools.