A well-funded, independent AI research lab is building the next generation of multimodal foundation models—systems that understand and express ideas across text, audio, video, and 3D interactive environments in real time.
The team’s north star is humanistic general intelligence: AI that doesn’t just reason, but perceives, responds, and communicates with emotional depth, expressive nuance, and creative intent. This is a rare opportunity to work on a frontier problem space with deep technical freedom and long-term research horizons.
What You’ll Do
You will lead research and development on state-of-the-art generative video models, shaping the core architectures and algorithms that power the lab’s next major breakthroughs. Responsibilities include:
- Driving research on generative foundation models for video, from ideation through prototyping to production
- Designing and evaluating large-scale generative algorithms and training schemes for high-fidelity video synthesis
- Developing efficient image/video representations, latent spaces, and training objectives
- Identifying critical research directions and contributing to the lab’s long-term video-generation roadmap
- Designing and curating large, high-quality multimodal datasets in collaboration with research, product, and data teams
What You Bring
- PhD in CS, EE, Mathematics, or a related field, or equivalent applied research experience
- 3+ years working in one or more of:
  - Text-to-image / text-to-video generation
  - Video diffusion or autoregressive video modeling
  - Image/video representation learning at scale
  - Large-language-model pre-training / fine-tuning
- Deep expertise in modern generative modeling: diffusion models, flow matching, autoregressive transformers, Mamba-style architectures, VAEs, and GANs
- Strong programming foundations (Python) and practical ML engineering skills
- Ability to operate independently, explore novel ideas aggressively, and communicate research clearly
Preferred Experience
- Publications at CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, SIGGRAPH, or similar venues
- Hands-on experience with large-scale distributed training (FSDP, ZeRO, data/model parallelism)
- Deep understanding of diffusion variants (DDIM, flow matching, rectified flows, etc.)
- Strong software engineering fundamentals and an interest in building real systems, not just prototypes
- Comfort working in a fast, iterative, research-driven startup environment
Total Comp: 500,000-1,000,000