At NVIDIA, we are seeking a highly skilled Senior Operations Engineer to join our world-class NGC Cloud team. In this role, you will help drive the efficiency, reliability, and scalability of the systems that power our global business operations. This is an exceptional opportunity to shape how we automate, streamline, and support critical operational workflows across the organization. You will define how we implement innovative automation and support solutions, enabling teams to operate seamlessly and deliver impact at global scale—all within an encouraging and inclusive environment.
What you'll be doing:
Driving day-to-day interactions with NVIDIA-wide IT subsystems, ensuring smooth operational workflows across infrastructure and applications.
Crafting and maintaining GitLab CI/CD pipelines to automate build, test, and deployment workflows.
Monitoring system health, building/maintaining dashboards, creating alerts, and producing operational reports.
Performing user offboarding, access reviews, and compliance-related tasks across multiple systems.
Driving interactions with various IT subsystems, ensuring API performance and integration stability meet defined SLAs and SLOs.
Coordinating changes and releases between engineering, operations, and security teams.
Enforcing security guidelines, managing vulnerability remediation, and collaborating with security teams on audits and assessments.
Maintaining documentation, SOPs, and process improvements to enhance operational maturity.
What we need to see:
8+ years of hands-on experience building/supporting complex services and BS/MS in Computer Science (or equivalent experience).
Knowledge of Python for automation, data handling, and tool development.
Experience with monitoring tools (such as Prometheus, Grafana, Datadog, CloudWatch, Splunk) and reporting.
Familiarity with ITSM practices, including incident, problem, and change management processes.
Ability to perform secure and compliant offboarding and access-related tasks.
Strong understanding of IT operations and system workflows.
Knowledge of core Java: Collections API, Streams API, concurrency, and I/O.
Knowledge of RDBMS and NoSQL (Cassandra, DynamoDB, Redis) databases.
Excellent communication, documentation, and problem-solving skills, with the ability to collaborate and align across multiple teams.
Ways to stand out from the crowd:
Experience designing or implementing automation pipelines or internal operational tools.
Background in customer support, technical support, or customer-facing engineering roles.
Prior work in a security-conscious or compliance-heavy environment.
Ability to build end-to-end monitoring solutions, dashboards, and automated reporting.
Strong documentation habits and a continuous-improvement approach.
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until December 6, 2025.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.