AI Researcher / ML Engineer (ASR & Speech Specialist)

LILT • Full-time • Remote (Washington, District of Columbia, United States) • 1d ago

About LILT

AI is changing how the world communicates — and LILT is leading that transformation.

We're on a mission to make the world's information accessible to everyone, regardless of the language they speak. We use cutting-edge AI, machine translation, and human-in-the-loop expertise to translate content faster, more accurately, and more cost-effectively without compromising on brand, voice, or quality.

At LILT, we empower our teammates with leading tools, global collaboration, and growth opportunities to do their best work. Our company virtues (Work together, win together; Find a way or make one; Quicker than they expect; Quality is Job 1) guide everything we do. We are trusted by Intel Corporation, Canva, the United States Department of Defense, the United States Air Force, ASICS, and hundreds of global enterprises. Backed by Sequoia, Intel Capital, and Redpoint, we’re building a category-defining company in a $50B+ global translation market being redefined by AI.

Role Summary

We are seeking a highly skilled and visionary Senior AI Researcher / Machine Learning Engineer specializing in Automatic Speech Recognition (ASR) to anchor our core speech intelligence and benchmarking initiatives. In this role, you will serve as our principal subject matter expert in AI speech data processing, responsible for architecting, training, and scaling high-performance, multilingual ASR models, as well as developing rigorous quality benchmarks for agentic conversational AI.

A critical component of this position involves developing robust domain-adaptation frameworks that allow our models to dynamically incorporate proprietary customer terminology, specialized industry jargon, and multilingual nuances. You will collaborate with the Engineering, Product, and AI Research teams to transform state-of-the-art speech research into production-ready systems powering on-device real-time streaming translation and novel frontier model benchmarks.

Key Challenges: Scaling ASR models that support dynamic vocabulary insertion in enterprise-grade, ultra-low-latency, real-time environments, and building end-to-end agentic AI benchmarks that go beyond surface-level metrics.

Key Responsibilities

Model Development & Innovation: Architect, train, fine-tune, and evaluate state-of-the-art speech representations and ASR models (e.g., end-to-end Conformer, Whisper, RNN-T, and hybrid CTC/Attention architectures) across multiple global languages.
Customization & Domain Adaptation: Design and deploy highly scalable algorithms for dynamic vocabulary insertion, contextual biasing, and language model (LM) personalization to precisely capture customer-specific terminology, acronyms, and product names.
Evaluation: Implement automated evaluation frameworks to benchmark model performance, rigorously tracking Word Error Rate (WER), Character Error Rate (CER), embedding-based metrics, latency budgets, real-time factor (RTF), and compute-efficiency profiles across varying acoustic conditions.
Agentic Benchmarking: Develop pioneering multilingual benchmarks for end-to-end conversational AI agents, covering speech-to-text and text-to-speech components and targeting the weaknesses of state-of-the-art frontier models.
Real-Time & Batch Speech Systems: Partner with core engineering teams to build, optimize, and maintain high-throughput pipelines for both ultra-low-latency real-time streaming inference and high-efficiency asynchronous (batch) multi-channel speech analysis.
Speech Pipeline Engineering: Develop and refine standard auxiliary components of the speech processing chain, including Voice Activity Detection (VAD), speaker diarization, punctuation restoration, noise/acoustic normalization, and audio pre-processing filters.
Cross-Functional Productization: Translate product requirements into technical AI roadmaps, working hand-in-hand with Product Managers to ship speech-to-text, simultaneous translation, and semantic speech analytics features.

Required Technical Qualifications

Education: Master’s or Ph.D. degree in Computer Science, Electrical Engineering, Computational Linguistics, Data Science, or a related quantitative field with an emphasis on speech processing or deep learning (or equivalent proven industry track record).
Speech Domain Expertise: Minimum of 3–5 years of dedicated professional experience developing ASR systems, speech-to-text translation pipelines, or advanced audio processing models.
Deep Learning Frameworks: Advanced proficiency with PyTorch or equivalent frameworks, along with extensive experience using dedicated speech models and toolkits such as Whisper, NVIDIA NeMo, Hugging Face Transformers, Kaldi, ESPnet, or SpeechBrain.
On-Device Runtimes: Hands-on experience converting and running PyTorch models on at least one mobile inference runtime: ExecuTorch, LiteRT (formerly TensorFlow Lite), or ONNX Runtime Mobile. You have personally taken a non-trivial model through conversion, including resolving unsupported operations and dynamic-shape or decoder-loop issues.
Software & Infrastructure: Strong software engineering principles in Python, with a clear understanding of data structures, algorithm optimization, and complex multilingual text/audio tokenization schemes.
Data Pipeline Mastery: Proven experience working with large-scale audio datasets, audio augmentation techniques (e.g., SpecAugment, noise injection), and text normalization/inverse text normalization (ITN) pipelines.

Preferred & Specialization Qualifications

High-Performance and On-Device Inference: Experience optimizing models for constrained on-device and production environments using quantization (INT4/INT8/FP16), distillation, ONNX Runtime, TensorRT, or Triton Inference Server.
Research Footprint: Peer-reviewed publications at premier speech and machine learning conferences (e.g., ICASSP, INTERSPEECH, NeurIPS, ICLR, ACL) or active contributions to open-source speech communities are a strong plus.
Hardware Acceleration: Working knowledge of mobile NPU/DSP acceleration across the Android SoC landscape (Qualcomm QNN / Hexagon, GPU, and NNAPI delegates) and the trade-offs among Snapdragon, MediaTek, and Google Tensor.
Streaming Architectures: Deep technical familiarity with streaming neural architectures (e.g., block-processing, streaming transformers, or transducer models) and real-time network transport constraints (WebSockets, gRPC).
Multilingual Engineering: Professional exposure to building zero-shot multilingual speech systems or managing cross-lingual acoustic phonology data.

Core Competencies & Soft Skills

Analytical Problem Solving: Ability to break down ambiguous business or product requirements into well-defined, actionable machine learning experiments.
Collaborative Communication: Strong ability to communicate complex machine learning concepts to non-technical stakeholders across product, design, and executive leadership.
Ownership Mindset: Comfortable working in a fast-paced environment and taking accountability for the full lifecycle, from the initial algorithmic hypothesis and exploratory research through to production monitoring.

Our Story

Our founders, Spence and John, met at Google working on Google Translate. As researchers at Stanford and Berkeley, they both worked on language technology to make information accessible to everyone. While together at Google, they were amazed to learn that Google Translate wasn’t used for enterprise products and services inside the company. The quality just wasn’t there. So they set out to build something better. LILT was born.

LILT has been a machine learning company since its founding in 2015. At the time, machine translation didn’t meet the quality standard for enterprise translations, so LILT assembled a cutting-edge research team tasked with closing that gap. While meeting customer demand for translation services, LILT has prioritized investments in Large Language Models, human-in-the-loop systems, and now agentic AI.

With AI innovation accelerating and enterprise demand growing, the next phase of LILT’s journey is just beginning.

Our Tech

What sets our platform apart:

Brand-aware AI that learns your voice, tone, and terminology to ensure every translation is accurate and consistent
Agentic AI workflows that automate the entire translation process from content ingestion to quality review to publishing
100+ native integrations with systems like Adobe Experience Manager, Webflow, Salesforce, GitHub, and Google Drive to simplify content translation
Human-in-the-loop reviews via our global network of professional linguists, for high-impact content that requires expert review

LILT in the News

Featured in The Software Report’s Top 100 Software Companies!
LILT makes it onto the Inc. 5000 List.
LILT continues to be an intellectual powerhouse, holding numerous patents that help power the most efficient and sophisticated AI and language models in the industry.
Check out all our news on our website.

Information collected and processed as part of your application process, including any job applications you choose to submit, is subject to LILT's Privacy Policy at https://lilt.com/legal/privacy.

At LILT, we are committed to a fair, inclusive, and transparent hiring process. As part of our recruitment efforts, we may use artificial intelligence (AI) and automated tools to assist in the evaluation of applications, including résumé screening, assessment scoring, and interview analysis. These tools are designed to support human decision-making and help us identify qualified candidates efficiently and objectively. All final hiring decisions are made by people. If you have any concerns, require accommodations, or would like to opt out of the use of AI in our hiring process, please let us know at recruiting@lilt.com.

LILT is an equal opportunity employer. We extend equal opportunity to all individuals without regard to an individual’s race, religion, color, national origin, ancestry, sex, sexual orientation, gender identity, age, physical or mental disability, medical condition, genetic characteristics, veteran or marital status, pregnancy, or any other classification protected by applicable local, state or federal laws. We are committed to the principles of fair employment and the elimination of all discriminatory practices.