Generative AI Researcher – Image & Video Diffusion

Lexlegis AI • Full-time • Mumbai City, Maharashtra, India

We are looking for an engineer to build the training infrastructure, data pipelines, and inference optimization systems for state-of-the-art Diffusion Transformer (DiT) models. This role focuses on scaling the fine-tuning and deployment of models such as Qwen-Image, Wan, and LTX-2.

Key Responsibilities

Training Infrastructure: Design and maintain scalable pipelines for training and fine-tuning Diffusion Transformer models on large-scale GPU clusters.
Model Optimization: Optimize the inference performance of Wan, LTX-2, and Qwen-Image using quantization, pruning, and hardware-aware tuning (e.g., TensorRT, FlashAttention).
Data Engineering: Develop efficient ingestion and preprocessing pipelines for high-resolution image and video datasets used in generative tasks.
Capability Expansion: Implement engineering workflows that allow researchers to rapidly fine-tune and expand the capabilities of open-weights diffusion models.
Production Deployment: Transition experimental fine-tuned models into reliable, low-latency production services.
Resource Management: Optimize distributed training jobs (FSDP, DeepSpeed) to maximize GPU utilization and minimize costs.

Required Qualifications

Experience: At least 2 years of experience in Machine Learning Engineering with a focus on generative models.
Core Tech: Strong proficiency in PyTorch, JAX, and distributed training frameworks.
Model Expertise: Hands-on experience deploying or fine-tuning Diffusion Transformers (DiTs), specifically Qwen-Image, Wan, or LTX-2.
Architecture: Deep understanding of Transformer-based diffusion backbones and flow matching, as opposed to legacy CNN/RNN-based architectures.
Tooling: Proficiency in Python and modern ML ecosystem tools (e.g., Hugging Face, Diffusers, FFmpeg for video processing).
Compute: Experience debugging and optimizing workloads in multi-node GPU environments.

Preferred Qualifications

Inference Optimization: Experience with techniques like KV-caching, compile-time optimizations, or kernel fusion for transformers.
MLOps: Familiarity with experiment tracking (W&B) and model versioning tools in a generative media context.
Streaming: Experience handling real-time video generation or streaming inference pipelines.
Open Source: Contributions to libraries such as Diffusers, or active experimentation with the latest open-source DiT implementations.