Computer Vision Engineer — Pose Estimation & Scene Understanding
Location: San Francisco Bay Area or Los Angeles, California
Work Style: Hybrid, with regular in-person collaboration
About Us
We are a funded, early-stage technology company building a real-time simulation platform at the intersection of multi-agent AI, machine learning, and large-scale real-world data. Our approach combines deep learning, real-time systems, and advanced modeling to simulate complex, dynamic environments from the ground up.
We're starting with foundational simulation and data infrastructure before scaling fidelity and intelligence layers. The team is small, backed by experienced investors, and led by repeat founders with multiple exits. If you want to build something that hasn't been built before, we'd like to talk.
The Role
We're building AI-powered game simulations driven by large-scale real-world data. Our pipeline detects players, estimates 3D pose from monocular video, and maps that data onto in-game 3D positioning and models. We're looking for a CV engineer to own and improve this entire upstream extraction pipeline.
What you'll do
You'll own the pipeline that turns raw basketball broadcast footage into clean, accurate 3D pose and trajectory data. Concretely, that means:
- Fine-tuning and improving our HMR 2.0-based pose estimation.
- Optimizing player detection and tracking.
- Building or training specialized models for court keypoint detection.
- Refining our homography estimation to accurately place 3D pose data onto the court.
- Taking responsibility for the accuracy and reliability of every piece of training data that feeds our downstream diffusion models.
- When extraction quality degrades (occlusions, camera cuts, unusual angles), diagnosing the problem and fixing it, whether that means fine-tuning an existing model, training a new one, or engineering around the issue.
What we're looking for
- Strong experience with human pose estimation: ideally you've fine-tuned models like HMR, CLIFF, or similar mesh recovery architectures.
- Experience with object detection and tracking in video (e.g., YOLO, ByteTrack).
- Solid understanding of camera geometry: homographies, camera calibration, projection between 2D and 3D coordinate systems.
- Ability to identify when an off-the-shelf model isn't cutting it and train a specialized model to fill the gap, for example a court keypoint detector trained on sports broadcast data.
- Comfort working with messy real-world video data: broadcast footage is full of overlays, replays, camera cuts, and occlusions.
- Strong PyTorch skills and experience training and evaluating vision models.
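To make the camera-geometry requirement concrete: the core operation is applying a 3x3 homography to map a detected image point (say, a player's foot position in pixels) onto court coordinates. A minimal sketch, with illustrative names only (nothing here is from our codebase):

```python
def apply_homography(H, x, y):
    """Map an image point (x, y) to court coordinates via 3x3 homography H.

    Homogeneous coordinates: (x, y, 1) is multiplied by H, then the result
    is divided by its third component to return to 2D.
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

# Identity homography leaves points unchanged:
I = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
```

In practice the matrix itself would be estimated from court keypoint correspondences (e.g., with a RANSAC-based solver), which is exactly the part of the pipeline this role owns.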
Nice to have
- Experience with sports video analysis specifically.
- Familiarity with multi-view geometry or structure from motion.
- Experience with temporal pose tracking and smoothing across video sequences.
- Background in dataset curation—knowing what clean training data looks like and how to get there.
Why this role matters
Everything downstream depends on the quality of extracted motion data. Our diffusion models learn from it. Our rendered simulations are driven by it. If the pose data is noisy, occluded, or misaligned, nothing else works. You own the foundation of the entire pipeline.
Additional Information
We're a small, brilliant team working in stealth on an ambitious project. We value in-person collaboration and are building something meaningful and one of a kind.
Compensation will depend on experience and contract structure. Candidates must be authorized to work in their location. This is a part-time, contract-to-hire position.
We are an equal opportunity employer and are committed to fostering a diverse, inclusive workplace.