AI Diagnostics & Observability Engineer
We’re building AI systems that learn from real-world usage. This role owns the feedback, evaluation, and observability infrastructure that makes that possible.
What You’ll Do
- Own the feedback, evaluation, and observability layer for production AI agents
- Build systems to detect, diagnose, and resolve failures at scale
- Design evaluation frameworks and quality metrics for real-world performance
- Develop tracing and debugging tooling for LLM-driven systems
- Turn human feedback into automated improvement loops
- Drive the architecture for self-improving AI systems
What We’re Looking For
- 5+ years in backend or systems engineering
- Experience with production AI / LLM systems
- Strong skills in Python (Java or Rust a plus)
- Experience with observability, evals, or reliability engineering
- Ability to debug complex, distributed systems and work from first principles
- Experience building internal platforms used by engineering teams
Nice to Have
- LLMOps / evaluation frameworks
- Experience building feedback or human-in-the-loop systems and designing their workflows
- Experience with conversational or voice AI
- Startup or AI-native environment experience
Work Model
- Hybrid (3 days a week in Palo Alto, California)