About Black Forest Labs
Black Forest Labs builds generative models for image and video used by millions of creators, developers, and businesses worldwide. Our FLUX models operate at the frontier of visual AI and are trained at scales where data movement becomes a first-order constraint.
We’re headquartered in Freiburg, Germany, with a growing presence in San Francisco, and we focus on research rigor, open science, and building systems that enable real breakthroughs.
We're looking for infrastructure engineers who want to work at peta-to-exabyte scale. You'll build the data systems behind the largest training runs on thousands of GPUs, where fixing one bottleneck lets researchers train the next breakthrough model.
What You’ll Work On
- Scalable data loaders for training runs across thousands of GPUs
- Efficient storage and retrieval systems for petabyte-scale datasets
- Multi-cloud object storage abstraction
- Large-scale data migrations across storage systems and providers
- Debugging and resolving performance bottlenecks in distributed data loading
Technical Focus
- Python, PyTorch DataLoader internals
- Object storage (e.g. S3, Azure Blob, GCS)
- Parquet for metadata
- Video: ffmpeg, PyAV, codec fundamentals
What We’re Looking For
- Built and operated data pipelines at petabyte scale
- Optimized data loading for large-scale distributed training
- Worked with petabyte-scale video and image datasets
- Written processing jobs operating on millions of files
- Debugged distributed system bottlenecks across large fleets of machines
Nice to have:
- Experience with streaming dataset formats (e.g. WebDataset)
- Video codec internals and frame-accurate seeking
- Distributed systems experience
- Slurm and Kubernetes for job orchestration
- Experience with object storage performance tuning across providers
If this sounds like work you’d enjoy, we’d love to hear from you.
Annual Salary (SF): $180,000–$300,000 USD + equity, depending on profile and experience