We are looking for an Engineer to build the training infrastructure, data pipelines, and inference optimization systems for state-of-the-art Diffusion Transformer (DiT) models. This role focuses on scaling the fine-tuning and deployment of models like Qwen, Wan, and LTX-2.
Key Responsibilities
- Training Infrastructure: Design and maintain scalable pipelines for training and fine-tuning Diffusion Transformer models on large-scale GPU clusters.
- Model Optimization: Optimize the inference performance of Wan, LTX-2, and Qwen (Vision) using quantization, pruning, and hardware-aware tuning (e.g., TensorRT, FlashAttention).
- Data Engineering: Develop efficient ingestion and preprocessing pipelines for high-resolution image and video datasets used in generative tasks.
- Capability Expansion: Implement engineering workflows that allow researchers to rapidly fine-tune and expand the capabilities of open-weights diffusion models.
- Production Deployment: Transition experimental fine-tuned models into reliable, low-latency production services.
- Resource Management: Optimize distributed training jobs (FSDP, DeepSpeed) to maximize GPU utilization and minimize costs.
Required Qualifications
- Experience: Minimum of 2 years in Machine Learning Engineering with a focus on generative models.
- Core Tech: Strong proficiency in PyTorch, JAX, and distributed training frameworks.
- Model Expertise: Hands-on experience deploying or fine-tuning Diffusion Transformers (DiT), specifically Qwen (Image), Wan, or LTX-2.
- Architecture: Deep understanding of Transformer-based diffusion backbones and flow matching, rather than legacy CNN/RNN-based architectures.
- Tooling: Proficiency in Python and modern ML ecosystem tools (e.g., Hugging Face, Diffusers, FFmpeg for video processing).
- Compute: Experience debugging and optimizing workloads in multi-node GPU environments.
Preferred Qualifications
- Inference Optimization: Experience with techniques like KV-caching, compile-time optimizations, or kernel fusion for transformers.
- MLOps: Familiarity with experiment tracking (W&B) and model versioning tools in a generative media context.
- Streaming: Experience handling real-time video generation or streaming inference pipelines.
- Open Source: Contributions to libraries like diffusers, or active experimentation with the latest open-source DiT implementations.