Director, Research - Evaluation & Training

Snorkel AI • San Francisco, CA (Hybrid)

Snorkel AI came out of the Stanford AI Lab and is built on the principle that data development is the core of AI. For years, we have pioneered data-centric methods that power the world’s most sophisticated AI implementations, from top enterprises to frontier AI labs. We are looking for a Senior Manager of Research to lead a team focused on data-centric research.

About the Role

We're looking for a manager to lead a team of researchers focused on data evaluation, error analysis, and data valuation methods that predict model performance. This team is responsible for showcasing the value and quality of Snorkel’s data for model training and evaluation, understanding where today's frontier models fall short, and turning that understanding into a point of view on which benchmarks and datasets these models will benefit from.

You and your team will own Snorkel’s data design flywheel: analyzing model failures, finding capability and skill gaps in current models, recommending the next benchmarks to invest in, and then proving the value of this data to our customers.

Main Responsibilities

Own a multi-quarter roadmap centered on novel evaluation, error analysis, and data valuation techniques.
Synthesize trends from model-failure analysis and benchmarking into recommendations on which datasets the community should focus on and which ones Snorkel should invest in, making this team a primary input to the company's data strategy.
Focus on data valuation techniques that quantify how much Snorkel data improves model performance.
Lead and grow a team of researchers, setting a high bar for quality, rigor, and speed of execution.
Act as the primary bridge between the team's findings and Product, GTM, and our customers.

What We’re Looking For

7+ years in applied AI, ML, or research roles, with 4+ years managing technical teams.
A leader who has repeatedly turned research and analysis into business outcomes, and who instinctively connects technical findings to market and customer needs.
Strong business and market judgment in the AI/ML space — you understand the competitive and frontier-lab landscape and can prioritize accordingly.
Technically conversant and credible: enough depth in LLM evaluation, benchmarking, and model behavior analysis to set direction, judge experimental quality, and pressure-test results — without needing to be the deepest technical expert in the room.
A nose for trends: able to look across many evaluation results and failure cases and extract the signal that should drive what gets built next.
Excellent communication and storytelling skills, with the ability to make technical results legible and persuasive to non-research audiences.
Familiarity with data valuation or data attribution research is a strong plus.
Bonus: experience working with frontier labs, public benchmarks, or commercial AI data/eval products.