Position: MLOps Engineer / Vertex AI Engineer
Company: International Professional Services
Location: Remote - Chicago Preferred
Employment Type: Full-Time
Overview
The MLOps Engineer / Vertex AI Engineer will lead the development, deployment, automation, and operational management of enterprise machine learning platforms built on Google Cloud Platform. This role is responsible for creating scalable, secure, and reliable machine learning infrastructure that enables data scientists and AI teams to efficiently move models from experimentation into production. Working closely with data scientists, data engineers, cloud architects, software engineers, and business stakeholders, the successful candidate will establish repeatable MLOps processes that accelerate AI adoption while maintaining governance, reliability, and operational excellence. The role requires expertise in Vertex AI, machine learning lifecycle management, CI/CD automation, containerization, Kubernetes, and cloud-native platform engineering. As AI becomes increasingly important to business strategy, this position will play a critical role in building the infrastructure and operational capabilities necessary to support enterprise-scale AI initiatives.
Purpose
This role provides the opportunity to build and scale the foundational AI infrastructure that enables machine learning, generative AI, and advanced analytics initiatives across the enterprise. The work directly impacts the organization's ability to operationalize AI and create measurable business value from data-driven innovation.
Growth
The position offers hands-on experience with modern AI platforms, enterprise-scale machine learning deployments, emerging generative AI technologies, and cloud-native engineering practices. The successful candidate will help shape the organization's future AI strategy while expanding their expertise across MLOps, platform engineering, automation, governance, and enterprise architecture.
Motivators
This opportunity is ideal for engineers who enjoy building scalable systems, automating complex processes, solving infrastructure challenges, and enabling data science teams to operate at scale. The role combines cutting-edge technology, technical ownership, and the opportunity to influence how AI is deployed and governed across the organization.
Objectives
1. Build and Operationalize an Enterprise MLOps Platform on Google Cloud
Within the first six months, assess the current machine learning environment and establish a scalable MLOps platform leveraging Vertex AI and Google Cloud services. Design and implement standardized deployment frameworks, model lifecycle management processes, and infrastructure patterns that support enterprise AI initiatives. Collaborate with data science, engineering, and business teams to ensure the platform supports both current and future machine learning workloads. Success will be measured through platform adoption, deployment efficiency, operational reliability, and stakeholder satisfaction. AI-assisted platform monitoring and automation may be used to support this objective.
2. Develop Automated ML Deployment and CI/CD Frameworks
Within the first nine months, design and implement automated machine learning deployment pipelines that accelerate model promotion from development through production while maintaining quality, governance, and operational controls. Build reusable CI/CD frameworks that support model versioning, testing, validation, monitoring, and rollback capabilities. Partner with engineering teams to standardize machine learning delivery processes across the organization. Success will be measured through deployment frequency, reduced deployment effort, increased reliability, and shorter model release cycles.
3. Establish Enterprise AI Governance and Operational Excellence
Within the first twelve months, develop governance frameworks that support responsible AI deployment, model monitoring, compliance requirements, auditability, and operational oversight. Implement processes that ensure models remain accurate, secure, explainable, and aligned with organizational policies throughout their lifecycle. Collaborate with security, compliance, and business leaders to establish governance standards that balance innovation with risk management. Success will be measured through governance adoption, compliance readiness, model performance stability, and operational transparency.
4. Enable Scalable AI and Machine Learning Adoption Across the Enterprise
Within the first twelve months, build scalable infrastructure, reusable services, feature management capabilities, and operational support frameworks that enable multiple teams to efficiently develop and deploy AI solutions. Evaluate emerging technologies and continuously improve platform capabilities to support advanced machine learning and generative AI initiatives. Success will be measured through platform utilization, business adoption, operational efficiency, and the successful delivery of AI-enabled business outcomes.
Critical Subtasks
1. Deploy and Manage Production Machine Learning Models
Within the first 90 days, establish operational processes and deployment standards for managing machine learning models in production environments. Ensure deployments are secure, reliable, scalable, and aligned with business service level objectives. Success will be measured through deployment stability, uptime, and successful production adoption.
2. Build Vertex AI-Based Machine Learning Pipelines
Within the first six months, develop automated workflows using Vertex AI that support model training, validation, deployment, monitoring, and lifecycle management. Create reusable templates and operational standards that improve consistency and reduce manual effort. Success will be measured through automation coverage, deployment speed, and reduced operational complexity.
3. Implement Feature Store and Data Pipeline Integration Standards
Establish scalable feature management processes that enable consistent, governed, and reusable machine learning features across multiple use cases. Collaborate with data engineering teams to ensure reliable integration between data platforms and machine learning environments. Success will be measured through feature reuse, model consistency, and reduced development effort.
4. Develop Containerization and Kubernetes Deployment Frameworks
Within the first six months, establish Docker and Kubernetes deployment standards that support scalable machine learning workloads across development, testing, and production environments. Ensure infrastructure supports high availability, performance, and operational efficiency. Success will be measured through deployment reliability, scalability, and infrastructure utilization.
5. Implement Model Monitoring and Performance Management Processes
Develop monitoring frameworks that track model accuracy, drift, performance, usage patterns, and operational health. Establish alerting mechanisms and remediation procedures that ensure issues are identified and addressed proactively. Success will be measured through model stability, issue resolution speed, and operational visibility.
6. Partner with Data Science and Engineering Teams
Work closely with data scientists, software engineers, cloud architects, and business stakeholders to align platform capabilities with organizational objectives. Provide technical leadership, deployment guidance, and operational support that improves the effectiveness of machine learning initiatives. Success will be measured through stakeholder satisfaction, project success, and platform adoption.
7. Continuously Evaluate and Integrate AI to Improve Performance
Within the first 90–180 days, identify opportunities to leverage AI and automation technologies to improve machine learning operations, infrastructure management, model governance, deployment workflows, monitoring, and engineering productivity. Evaluate emerging capabilities, lead pilot initiatives, and embed successful solutions into daily operations. Success will be measured through productivity improvements, reduced operational effort, faster deployment cycles, improved governance outcomes, and measurable business value creation.
Success Profile
A top performer in this role will:
Build a scalable enterprise MLOps platform on Google Cloud.
Accelerate machine learning deployment and operationalization.
Establish repeatable CI/CD processes for machine learning workloads.
Improve AI governance, compliance, and operational reliability.
Enable data science teams to deploy and manage models efficiently.
Support advanced analytics, machine learning, and generative AI initiatives.
Leverage automation and AI to continuously improve platform capabilities.
Serve as a trusted technical expert in MLOps and Vertex AI architecture.