About Fundamental
Fundamental is an AI company pioneering the future of enterprise decision-making. Founded by DeepMind alumni, Fundamental has developed NEXUS, the world's most powerful Large Tabular Model (LTM), purpose-built for the structured records that actually drive enterprise decisions. Backed by world-class investors and trusted by Fortune 100 companies, Fundamental unlocks trillions of dollars of value by giving businesses the Power to Predict.
At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.
Key responsibilities
Lead and mentor a team of MLOps engineers, fostering technical growth and a culture of operational excellence
Define and drive the MLOps roadmap, aligning infrastructure capabilities with research, engineering, and product objectives
Establish best practices, standards, and processes for ML infrastructure, deployment, and operations
Own technical decision-making for ML infrastructure architecture and tooling choices
Architect and oversee scalable, automated machine learning pipelines, CI/CD workflows, and orchestration frameworks
Drive the design and implementation of robust model serving infrastructure using platforms like Triton, TorchServe, TensorFlow Serving, and KServe
Define inference architecture strategy optimized for ultra-low latency and high throughput
Design and maintain feature stores, robust data pipelines, and scalable storage solutions to efficiently handle large volumes of data
Collaborate with research teams to bridge the gap between experimentation and production
Define logging, alerting, and monitoring strategy to track model performance, drift, and system reliability
Must have
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
7+ years of experience in MLOps, with 3+ years in a technical leadership role
Strong software engineering skills in Python, with experience in Bash and/or Go
Proven track record of building and leading high-performing MLOps or infrastructure teams
Experience building and designing MLOps infrastructure from the ground up
Deep experience with MLOps platforms (MLflow, WandB, etc.) and frameworks (PyTorch, TensorFlow, etc.)
Deep experience with model serving frameworks (Triton, TorchServe, TensorFlow Serving, KServe) for high-scalability, low-latency inference
Experience building and managing data pipelines to support both model training and inference
Strong experience with Kubernetes on a major cloud provider (AWS, GCP, or Azure) and with infrastructure as code (Terraform, Helm, GitOps)
Proficient with observability and monitoring tools (Prometheus, Grafana, Datadog, OpenTelemetry)
Excellent communication skills with the ability to translate between research and production contexts
Nice to have
Experience with workflow orchestration tools (Kubeflow, Airflow, Argo Workflows)
Experience with FastAPI and backend applications
Familiarity with data platforms like Databricks or Snowflake
Experience with LLM/foundation model serving and optimization
Exposure to SRE practices or cloud security certifications
Experience scaling ML infrastructure for AI startups
Benefits
Competitive compensation, including salary and equity
Comprehensive health coverage (medical, dental, and vision) plus a 401(k) plan
Fertility support, as well as paid parental leave for all new parents, inclusive of adoptive and surrogate journeys
Relocation support for employees moving to join the team in one of our office locations
A mission-driven, low-ego culture that values diversity of thought, ownership, and bias toward action