We are seeking Senior Machine Learning Engineers with strong software engineering and applied ML skills who can design, debug, and maintain ML systems in realistic, tools-enabled environments. Candidates should be comfortable working close to model behavior, diagnosing failures, improving evaluation pipelines, and making pragmatic engineering tradeoffs under ambiguous requirements.
This role focuses on system behavior and reliability, not just model training. You’ll work across training, evaluation, and infrastructure to ensure ML systems behave correctly and robustly in practice.
Core Requirements
4+ years of professional experience in Machine Learning Engineering, Applied ML, Software Engineering (ML-focused), or related roles
Strong proficiency in Python, with experience writing production-quality code and working with ML libraries (e.g., PyTorch, TensorFlow, scikit-learn)
Experience training, evaluating, and iterating on ML models, with an emphasis on diagnosing failure modes rather than just optimizing metrics
Strong understanding of ML evaluation: metrics design, test coverage, error analysis, and tradeoffs between correctness, robustness, and generalization
Ability to debug complex ML system failures, including issues caused by data, evaluation artifacts, or underspecified requirements
Comfort working with incomplete specifications and multiple valid solutions, especially in open-ended or real-world tasks
Experience working with ML pipelines or systems, including training workflows, evaluation harnesses, or model-in-the-loop systems
Preferred Experience
Experience building or maintaining ML training and evaluation pipelines
Familiarity with ML infra concepts (e.g., reproducibility, experiment tracking, model versioning)
Experience working in tools-enabled environments (e.g., programmatic evaluation, scripting, notebooks, or terminal-driven workflows)
Exposure to LLM systems, including model evaluation, benchmarking, and prompt or agent behavior analysis
Experience reasoning about multiple valid implementations and tradeoffs in engineering solutions
Strong written communication skills for explaining system behavior, failures, and engineering decisions
Engagement Details
Flexible hours with a minimum commitment of 20 hours per week
Project length of 1–2 months, with potential to extend
Compensation of up to $150 per task
Who will thrive in this role?
People who are most successful and satisfied in this role typically:
Enjoy working on real ML systems, not just clean, well-specified problems
Like debugging failures and understanding why systems behave the way they do
Are comfortable making engineering tradeoffs under ambiguity
Want exposure to cutting-edge ML and LLM systems, including evaluation and system-level behavior
Are looking for a high-quality, technically deep side gig, not a full-time product engineering role
Enjoy contributing to applied AI research and collaborating with industry research labs