Arbiter is reimagining how healthcare works - not by adding more point solutions, but by building the infrastructure that runs the system. We’re designing the intelligent operating spine that unifies data, automates workflows, and aligns incentives across providers, payers, and patients. Our platform embeds AI into real-world care and revenue cycle operations, transforming administrative chaos into orchestrated execution.
We’ve just closed one of the largest early-stage raises in health tech history - backed by some of the most influential players in the industry, who collectively manage millions of lives and billions in spend. Paired with a powerhouse board and executive team, we have the capital, context, and relationships to move faster than anyone - tackling the problems we know intimately and scaling what works. Our MVP is already live with hundreds of providers managing hundreds of thousands of lives, and it improves daily based on real-world feedback.
At Arbiter, engineers don’t just ship features - they shape the foundation of an industry in flux. You’ll work alongside a small, elite team of product leaders and technologists from FAANG and top startup and healthcare orgs to architect systems that handle millions of patient records, build AI agents that execute critical healthcare workflows, and design platforms that major health entities depend on daily. Every line of code you write shapes how a $4T industry operates.
As a Senior Data Engineer, AI Infrastructure, you are the architect of our AI/ML systems. You will build and maintain the platform that powers our intelligent operating system, creating the robust pipelines and infrastructure for data processing, model training, and inference. Your work is the foundation that enables our AI engineering teams to build, deploy, and scale their products effectively.
Our Engineering Culture & Values
We are a high-performing group of engineers dedicated to delivering innovative, high-quality solutions to our clients and business partners. We believe in:
Engineering Excellence: Taking immense pride in our technical craft and the products we build, treating both with utmost respect and care.
Impact-Driven Development: Firmly committed to engineering high-quality, fault-tolerant, and highly scalable systems that evolve seamlessly with business needs, minimizing disruption.
Collaboration Over Ego: Valuing exceptional work and groundbreaking ideas over individual credit. We seek talented individuals who thrive in a fast-paced environment and are driven to ship often to achieve significant impact.
Continuous Growth: Fostering an environment of continuous learning, mentorship, and professional development, where you can deepen your expertise and grow your career.
Responsibilities
As a Senior Data Engineer (AI Infrastructure), you will be pivotal in building the systems that power our AI-first platform:
AI/ML Pipeline Development: Design, develop, and maintain robust, scalable data pipelines specifically for our AI models. This includes data ingestion, cleaning, transformation, classification, and tagging to create high-quality, reliable training and evaluation datasets.
MLOps & Infrastructure: Build and manage the AI infrastructure to support the full machine learning lifecycle. This includes automating model training, versioning, deployment, and monitoring (CI/CD for ML).
Embedding & Vector Systems: Architect and operate scalable systems for generating, storing, and serving embeddings. Implement and manage vector databases to power retrieval-augmented generation (RAG) and semantic search for our AI agents.
AI Platform & Tooling: Champion and build core tooling, frameworks, and standards for the AI/ML platform. Develop systems that enable AI engineers to iterate quickly and self-serve for model development and deployment.
Cross-Functional Collaboration: Partner closely with AI engineers, product managers, and software engineers to understand their needs. Translate complex model requirements into stable, scalable infrastructure and data solutions.
Mentorship & Growth: Actively participate in mentoring junior engineers, contributing to our team's growth through technical guidance, code reviews, and knowledge sharing.
Hiring & Onboarding: Play an active role in interviewing and onboarding new team members, helping to build a world-class data engineering organization.
Minimum Qualifications
8+ years of deep, hands-on experience in Data Engineering, MLOps, or AI/ML Infrastructure, ideally within a high-growth tech environment.
Exceptional expertise in data structures, algorithms, and distributed systems.
Mastery in Python for large-scale data processing and ML applications.
Extensive experience designing, building, and optimizing complex, fault-tolerant data pipelines specifically for ML models (e.g., feature engineering, training data generation).
Deep understanding of and hands-on experience with cloud-native data and AI platforms, especially Google Cloud Platform (e.g., Vertex AI, BigQuery, Dataflow, GKE).
Strong experience with containerization (Docker) and orchestration (Kubernetes) for deploying and scaling applications.
Demonstrated experience with modern ML orchestration (e.g., Kubeflow, Airflow), data transformation (dbt), and MLOps principles.
Deep knowledge of unit, integration, and functional testing strategies, and the ability to implement them.
Experience providing technical leadership and guidance, and thinking strategically and analytically to solve problems.
Strong communication skills and the ability to work well in a diverse team setting.
Demonstrated experience working with many cross-functional partners.
Preferred Qualifications
Experience with vector databases (e.g., Pinecone, Elasticsearch) and building embedding generation pipelines.
Experience with MLOps platforms and tools (e.g., MLflow, Weights & Biases) for experiment tracking and model management.
Experience with advanced data extraction and correlation techniques, especially from unstructured medical data sources (e.g., PDF charts, clinical notes).
Familiarity with deep learning frameworks (e.g., TensorFlow, PyTorch).
Familiarity with data governance, data security, and compliance frameworks (e.g., HIPAA, GDPR) in a highly regulated industry.
This role can be remote or onsite at our New York City or Boca Raton offices - a fast-paced, collaborative environment where great ideas move quickly from whiteboard to production.
Job Benefits
We offer a comprehensive and competitive benefits package designed to support your well-being and professional growth:
Highly Competitive Salary & Equity Package: Designed to rival top FAANG compensation, including meaningful equity.
Generous Paid Time Off (PTO): To ensure a healthy work-life balance.
Comprehensive Health, Vision, and Dental Insurance: Robust coverage for you and your family.
Life and Disability Insurance: Providing financial security.
SIMPLE IRA Matching: To support your long-term financial goals.
Professional Development Budget: Support for conferences, courses, and certifications to fuel your continuous learning.
Wellness Programs: Initiatives to support your physical and mental health.
Pay Transparency
The annual base salary range for this position is $180,000-$240,000. Actual compensation offered to the successful candidate may vary from the posted hiring range based on work experience, skill level, and other factors.