The DGX Cloud team is looking for a Senior Technical Program Manager (TPM) to guide complex, cross-functional projects that support NVIDIA’s next-generation AI infrastructure. This position involves leading software-related initiatives across cloud platforms, infrastructure services, and distributed systems. The role focuses heavily on cloud-native software delivery, Kubernetes-based platforms, and large-scale AI workloads.
You will be responsible for managing high-impact engineering programs within a dynamic, fast-paced roadmap, aligning priorities across teams, and ensuring timely and high-quality delivery. This role requires strong technical skill, a proactive approach, and the ability to operate effectively across multiple levels of the organization. We are specifically looking for a software TPM with strong Kubernetes experience who can help drive execution across platform software and cloud infrastructure.
What You'll Be Doing:
- Lead the complete implementation of DGX Cloud software initiatives, encompassing planning, management, delivery, and operationalization across NVIDIA’s cloud infrastructure.
- Partner with software, infrastructure, product, and platform engineering teams to align on goals, architecture milestones, deliverables, and schedules.
- Lead initiatives involving Kubernetes-based platforms, cloud-native services, platform APIs, and distributed systems that enable AI training and inference workloads.
- Define and implement scalable program management processes, tools, and guidelines to ensure high execution velocity and program transparency.
- Identify cross-functional dependencies, mitigate risks, and drive resolution of complex technical and programmatic issues across the software stack.
- Establish clear success metrics and reporting mechanisms to track progress and communicate status to senior leadership.
- Foster a culture of collaboration and continuous improvement across engineering, product, and operations teams.
- Develop and implement metrics for assessing program efficiency and identifying areas for improvement; collect and analyze data to support planning and data-driven decisions.
- Report on overall program status, providing insights and recommendations to senior management.
- Drive organizational alignment and efficiency by coordinating with multi-functional leads and streamlining processes across software development lifecycles and release execution.
What We Need To See:
- Postgraduate degree in Computer Science, Artificial Intelligence, or equivalent experience.
- 12+ years of program management experience, including a proven ability to manage global projects across multiple time zones.
- Solid knowledge of cloud-native software systems, Kubernetes, containerized applications, microservices architectures, and infrastructure-as-a-service (IaaS) platforms.
- Practical experience working with Kubernetes is required.
- Proven experience driving large-scale software programs in fast-paced engineering environments.
- Strong understanding of software engineering guidelines, release procedures, system integration, and platform delivery.
- Proven experience creatively resolving technical issues and resource conflicts.
- You should be detail-oriented, with a proven ability to multitask in a dynamic environment with shifting priorities and changing requirements.
- It is essential that you possess direct experience working within a dynamic software development environment.
- Excellent communication and technical presentation skills.
- Significant experience with large-scale Agile tools, reporting, and processes relevant to this role is required.
- Demonstrated skill in leading and facilitating productive engagements with engineering, operations, and product teams.
Ways To Stand Out From The Crowd:
- Strong background in Machine Learning, Deep Learning, and Artificial Intelligence applications.
- Prior experience leading programs for Kubernetes platforms, cloud-native infrastructure, platform services, or developer platforms.
- Experience with software release management, service operationalization, and large-scale platform adoption.
- Familiarity with observability, CI/CD, infrastructure automation, and service reliability practices in cloud environments.
- Consistent track record of driving process improvements and measuring efficiency.
- Familiarity with NVIDIA platforms, products, and ecosystem is a plus.
With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working with us, and our engineering teams are growing fast in some of the most impactful fields of our generation: Deep Learning, Artificial Intelligence, and Autonomous Vehicles. If you're a hardworking individual who enjoys autonomy and shares our passion for technology, we want to hear from you. We are looking for great people like you to help us accelerate the next wave of artificial intelligence.
#LI-Hybrid
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 200,000 USD - 322,000 USD.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until May 26, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.