About the role
The Domain Scaling team aims to make Claude world-class at real-world knowledge work in domains like finance, healthcare, and legal. This is a unique role that combines hands-on applied research with data sourcing (real-world and synthetic) to improve our models. You'll own the end-to-end process of creating RL environments for new capabilities: identifying high-value tasks, designing reward signals, managing vendor relationships, and measuring impact on model performance.
Responsibilities
Own the data strategy for knowledge work verticals end-to-end, from task sourcing through RL training
Manage technical relationships with external data vendors, including evaluation of data quality and reward design
Collaborate with domain experts to design data pipelines and evaluations
Explore novel ways of creating RL environments for high-value tasks
Develop and improve QA frameworks to catch reward hacking and ensure environment quality
Run generalization experiments to measure how changes in data strategy affect model capabilities
Partner with other RL research teams and product teams to translate capability goals into training environments and evals
You may be a good fit if you
Have experience with fine-tuning large language models for specific domains or real-world use cases
Have experience with reinforcement learning, reward design, or training data curation for LLMs
Are comfortable managing technical vendor relationships and iterating quickly on feedback
Find value in reading through datasets to understand them and spot issues
Have strong cross-functional collaboration skills
Are passionate about making AI more useful and accessible across different industries
Are excited about a role that includes a combination of applied research and hands-on data work
Strong candidates may also
Have experience training production ML systems
Have experience designing evals or benchmarks for LLMs
Have domain expertise in a vertical where we would like to make our models more useful
Have experience working with external vendors or technical partners