Be one of the founding engineers at Nen, shaping the AI layer that powers automation across enterprise desktop environments at scale.
The role
Build and extend a multi-model agent loop across leading AI providers
Benchmark models on cost, latency, and reliability, and own the framework for doing so continuously
Improve agent reliability through better perception, grounding, and structured context
Instrument traces and build the data foundation for future fine-tuning
Shape the Python workflow SDK so users get model and agent improvements without changing their workflows
Requirements
Hands-on experience building with LLMs in production
Strong Python; comfortable working across SDK, API, and model integration layers
Experience evaluating and benchmarking models with structured evals, not just vibes
Familiarity with agent architectures, tool use, and multi-step reasoning loops
Curiosity about the computer-use model landscape and how quickly it is evolving