About Us
We are an early-stage team building next-generation speech intelligence systems for underserved, oral-first language environments.
We are looking for a Founding AI Engineer who wants to build foundational systems from the ground up — not iterate on incremental features, but help define the architecture, training strategy, and production stack behind a new class of speech-native AI.
The Role
As our Founding AI Engineer, you will own the core speech modeling architecture and deployment strategy.
This is not a research-only position. This is end-to-end system ownership.
You will design, train, evaluate, and deploy models — and ensure they are production-ready, scalable, and appropriate for real-world use.
Key Responsibilities
- Design, train, and evaluate text-to-speech models from scratch.
- Build and optimize GPU-based training pipelines (single and multi-GPU).
- Work directly with raw speech/audio datasets, including data from low-resource settings.
- Improve pronunciation accuracy, tone, prosody, and perceptual naturalness.
- Experiment with self-supervised acoustic encoders (HuBERT / wav2vec2-class models).
- Explore discrete acoustic tokenization and speech-native representation learning.
- Contribute to early-stage sound-to-sound architecture exploration.
- Deploy trained models into production infrastructure.
- Optimize models for inference latency, cost efficiency, and on-device execution.
- Design thoughtful evaluation frameworks (MOS, perceptual scoring, robustness metrics).
- Architect the broader system around the model — ensuring reliability, scalability, and maintainability.
Qualifications
Required
- 3+ years of machine learning experience.
- Strong proficiency in PyTorch.
- Experience training models on GPU infrastructure.
- Experience deploying ML systems into production.
- Strong system design instincts and architectural judgment.
- Ability to make pragmatic engineering tradeoffs between research ambition and production reality.
- Ability to build robust, production-ready systems — not just research prototypes.
- Comfort operating independently in early-stage environments.
Preferred
- Experience with speech models (ASR, TTS, Speech Language Models).
- Experience with self-supervised learning.
- Experience with diffusion or flow-matching generative models.
- Experience with discrete acoustic token pipelines.
- Experience with ONNX, quantization, or model optimization.
- Experience working with low-resource datasets.
A PhD is not required, and neither is a Big Tech background. Ownership, judgment, and execution matter most.
Why Join
- Foundational technical ownership from day one.
- Meaningful early-stage equity.
- Direct collaboration with founders.
- Real-world production deployment.
- Opportunity to help shape both the model and the system around it.
Application Process
Please send:
- A link to your GitHub profile.
- A short note outlining your ML experience.
- Links to any relevant speech or audio-related projects.
We are reviewing candidates on a rolling basis and moving thoughtfully but quickly.