The Role
We are looking for a highly analytical, strategic thinker to take ownership of our model evaluation analysis and insight generation. Our existing work has set a high standard for deep-dive analysis of model evaluations. We need someone who can not only maintain this standard but elevate it, turning raw result data into a roadmap for model improvement.
Responsibilities
- Own the creation of model evaluation analyses end to end, from initial hypothesis and data scraping through to final publication.
- Go beyond aggregate metrics (e.g., "Accuracy is 85%"). Deeply analyze why the model failed on the other 15%. Identify semantic patterns, edge cases, and systemic hallucinations in raw model outputs.
- Review raw datasets, meeting transcripts, and research notes to identify the "so what?" behind the findings.
- Act as the bridge between the data and the narrative by structuring findings into a logical hierarchy where the most critical "hook" lands first, followed by the supporting evidence.
Who You Are (Requirements)
- Experience: You have 5 to 10 years of experience in DS, ML, or AI research and analysis.
- Structured Thinker: You organize your writing logically.
- High Tolerance for Ambiguity: You can take a messy pile of notes and organize it into a coherent outline without needing your hand held.
- Executive Presence: You are comfortable interviewing senior leaders and pushing back when an "insight" isn't actually insightful.
- Cross-Functional Collaboration: You can work cross-functionally with everyone from ML researchers to clients.
Nice to Have
- Experience in model evaluation, ML engineering, or technical research.
- Experience designing or curating datasets (e.g., RLHF or SFT data).