About Us
We are an early-stage team building next-generation speech intelligence systems for underserved, oral-first language environments.
We are looking for a Founding AI Engineer who wants to build foundational systems from the ground up — not iterate on incremental features, but help define the architecture, training strategy, and production stack behind a new class of speech-native AI.
The Role
As our Founding AI Engineer, you will own the core speech modeling architecture and deployment strategy.
This is not a research-only position. This is end-to-end system ownership.
You will design, train, evaluate, and deploy models — and ensure they are production-ready, scalable, and appropriate for real-world use.
Key Responsibilities
- Design, train, and evaluate text-to-speech models from scratch.
- Build and optimize GPU-based training pipelines (single and multi-GPU).
- Work directly with raw speech/audio datasets, including data from low-resource settings.
- Improve pronunciation accuracy, tone, prosody, and perceptual naturalness.
- Experiment with self-supervised acoustic encoders (HuBERT / wav2vec2-class models).
- Explore discrete acoustic tokenization and speech-native representation learning.
- Contribute to early-stage sound-to-sound architecture exploration.
- Deploy trained models into production infrastructure.
- Optimize models for inference latency, cost efficiency, and on-device execution.
- Design thoughtful evaluation frameworks (MOS, perceptual scoring, robustness metrics).
- Architect the broader system around the model — ensuring reliability, scalability, and maintainability.
Qualifications
Required
- 3+ years of machine learning experience.
- Strong proficiency in PyTorch.
- Experience training models on GPU infrastructure.
- Experience deploying ML systems into production.
- Strong system design instincts and architectural judgment.
- Ability to make pragmatic engineering tradeoffs between research ambition and production reality.
- Ability to build robust, production-ready systems — not just research prototypes.
- Comfort operating independently in early-stage environments.
Preferred
- Experience with speech models (ASR, TTS, Speech Language Models).
- Experience with self-supervised learning.
- Experience with diffusion or flow-matching generative models.
- Experience with discrete acoustic token pipelines.
- Experience with ONNX, quantization, or model optimization.
- Experience working with low-resource datasets.
A PhD is not required, and neither is a Big Tech background. Ownership, judgment, and execution matter most.
Why Join
- Foundational technical ownership from day one.
- Meaningful early-stage equity.
- Direct collaboration with founders.
- Real-world production deployment.
- Opportunity to help shape both the model and the system around it.
Application Process
Please send:
- A link to your GitHub profile.
- A short note outlining your ML experience.
- Links to any relevant speech or audio-related projects.
We are reviewing candidates on a rolling basis and moving thoughtfully but quickly.