NVIDIA is looking for a Cloud SRE Architect to work in the Cloud Infrastructure Team of IPP (Infrastructure, Planning and Process). IPP is a global organization within NVIDIA. This group works with various other groups within NVIDIA, such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence and Driverless Cars, to cater to their infrastructure needs. Its cloud services run almost half a million automated jobs per day on thousands of servers, improving the efficiency of thousands of NVIDIA's software engineers worldwide. The cloud hosts various machines and devices with operating systems like Windows, Linux, and Android. It supports hardware platforms including NVIDIA GPUs and Tegra Processors. It delivers unified CI/CD solutions and cloud-based software development. Are you passionate about distributed infrastructure and looking for sophisticated, critical issues to solve? Are you ready to build the next generation of cloud services, design creative solutions, and mine through data to uncover real problems and fix them?
What you'll be doing:
Serve as an SRE Architect on the team behind the GPU Private Cloud, which is used by thousands of NVIDIANs globally for interactive development, centralized CI/CD and QA testing.
Evaluating, identifying and developing software solutions to optimize critical software development workflows across various organizations within NVIDIA.
Architecting, implementing and supporting end-to-end CI/CD systems using open-source and NVIDIA proprietary software.
Onboarding customers (NVIDIA internal development teams) to the private cloud infrastructure, with thorough discovery of each use case and of the solutions available within the cloud.
Identify performance bottlenecks and optimize the speed and cost efficiency of AI development and testing systems.
Leading software development projects, technically directing a team of brilliant engineers and guiding them to deliver efficient and impactful solutions.
Finding problems within software systems and resolving them.
Craft and implement critical metrics using various analytics methods and dashboards
What we need to see:
BS in EE/CS or equivalent experience, with 18+ years of systems software development, including at least 1 year dedicated to developing/exploring AI.
Experience maintaining cloud infrastructure and highly available production environments.
Strong programming and software development skills in Java, Python and shell scripting, along with a good understanding of distributed systems and REST APIs.
Experience in working with SQL/NoSQL database systems such as MySQL, Cassandra, MongoDB or Elasticsearch.
Excellent knowledge and working experience with Docker containers and Virtual Machines.
Good background in cloud technologies such as OpenStack, Docker, Kubernetes, Chef/Puppet, Hadoop/Ceph/SwiftStack, LXC, Git, Perforce, JFrog and Kafka.
Ability to work across organizational boundaries effectively to improve alignment and productivity between teams in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Depth in AI, Machine Learning and Deep Learning algorithms and techniques.
Strong collaborative and interpersonal skills, with a consistent record of guiding and influencing others in dynamic environments.
Experience developing large-scale software systems using modular architecture under real-time performance requirements.
Background in designing high-performance, scalable software systems with a strong focus on hardware cost optimization.
#LI-Hybrid
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 320,000 USD - 488,750 USD.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until April 5, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.