The Opportunity
As a Member of Technical Staff on AI Infrastructure, you will build and maintain the foundational systems and distributed infrastructure that power AI model post-training, inference, and data pipelines. You will collaborate with engineering and research teams to ensure the performance, scalability, and reliability of critical AI systems.
What You’ll Do
- Design and implement large-scale, distributed AI infrastructure and services
- Optimize performance for GPU/xPU accelerators and cloud environments
- Build tools for observability, reliability, and scaling of AI workloads
- Partner with cross-functional teams to define AI infrastructure requirements and roadmap
- Contribute to architectural design and system longevity
About You
- Have experience with GenAI infrastructure systems, distributed systems, cloud computing, and high-performance infrastructure
- Are proficient in programming languages like Python, Go, or similar
- Understand scaling challenges specific to AI workloads and accelerators
- Thrive in fast-paced, collaborative engineering environments