About the Role
As a High-Performance Networking Engineer on xAI’s Supercomputing team, you will design and optimize low-latency, high-bandwidth networking solutions using NVIDIA’s RDMA-capable technologies to support some of the world’s largest GPU supercomputing clusters. These clusters run AI training and inference workloads that demand cutting-edge performance and scalability.
Focus
- Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
- Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
- Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
- Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
- Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.
Ideal Experience
- Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
- Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
- Familiarity with NVIDIA’s networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and kernel modules (e.g., nvidia-peermem, formerly nv_peer_mem).
- Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
- Knowledge of Kubernetes networking and integrating RDMA into containerized environments.
- Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).
Tech Stack
- NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)
- RDMA technologies and protocols (e.g., GPUDirect RDMA, RoCEv2)
- Kubernetes
- Rust and C/C++
- MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)
Annual Salary Range
$180,000 - $440,000 USD
Benefits
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.