About the Role
You'll be Snorkel's primary technical voice in the open-source and research communities. The work spans three audiences: frontier AI research teams (post-training, RL environments, evals and benchmarks), enterprise ML and applied AI teams building specialized models on proprietary expertise, and the broader data-centric AI community.
You'll partner closely with our research, forward-deployed research, and product teams to translate the methodology behind Snorkel's work into world-class technical content, open-source contributions, conference presence, and a thriving community of data-centric AI practitioners.
Success looks like: a strong Snorkel open-source presence, a steady cadence of high-signal technical writing and research artifacts, marquee presence at the conferences that matter (NeurIPS, ICML, ICLR, AI Engineer World's Fair), and an engaged community of researchers and practitioners who view Snorkel as the trusted authority on data development for modern AI.
Responsibilities
- Own Snorkel's external technical voice. Write methodology posts, technical deep-dives, and research-grade content on data development for frontier models.
- Lead Snorkel's open-source presence. Define the go-to-market approach, ship code, review PRs, recruit contributors, and keep the libraries credible and current. Build OSS that demonstrates Snorkel's methodology in practice, including reproducible evals and benchmark artifacts.
- Advance the conversation on AI evaluation and benchmarking. Publish original work on how to measure agentic AI systems: domain-specific evals, agent evals, LLM-as-judge calibration, contamination and saturation, and the connection between evals and post-training data.
- Drive conference and research community presence. Land talks, papers, and workshops at NeurIPS, ICML, ICLR, AI Engineer World's Fair, and the right practitioner venues. Build relationships with academic labs and AI research teams.
- Partner with the research team. Translate what's learned in research collaborations into externally shareable methodology, case studies, and tooling.
- Set the bar for technical credibility. Design evals and benchmarks, prototype RL environments, and write code worth using. Your authority comes from doing the work, not just talking about it.
Preferred Qualifications
- Experience. 6+ years in applied ML research, AI engineering, developer/research advocacy, or a research-intensive technical role with significant public output. Prior DevRel/advocate experience is welcome but not required.
- Deep technical fluency in modern AI. Post-training techniques (RLHF, DPO, RLAIF), evaluation methodologies, RL environment design, training data pipelines, synthetic data generation, and at least one applied domain (coding agents, reasoning, multimodal, agents).
- Hands-on experience with AI evaluation and benchmarks. You've built and run real evals: public benchmarks (MMLU, GPQA, SWE-bench, HELM, BIG-bench, Arena-style head-to-heads), domain-specific custom evals, and LLM-as-judge pipelines with proper calibration.
- You build with AI, not just write about it. You're a power user of frontier coding agents (Claude Code, Cursor, Codex, and the like) in your day-to-day workflow, and you've built non-trivial agentic systems yourself: multi-step, tool-using, with real evals and an opinion on what breaks.
- A real public body of work. Talks, papers, blog posts, podcasts, and/or OSS contributions you can point to. Quality and signal matter more than volume.
- Customer- and researcher-facing presence. Comfortable in a room with frontier-lab research leads or a Fortune 500 ML team; can read the room and hold technical credibility on either side.
- Self-directed and comfortable with ambiguity. You ship without being asked, set your own quality bar, and enjoy moving at the pace of frontier AI.
- Bonus: an advanced degree or sustained research output in ML/AI; prior experience at an AI lab, OSS-first AI company, or a research-driven technical org; conference program-committee or organizing experience; a published or maintained public benchmark; relationships in the post-training, evals, or RL-environments communities.