About Hark
Hark is an artificial intelligence company building advanced, personalized intelligence: one that is proactive, multimodal, and capable of interacting with the world through speech, text, vision, and persistent memory.
We're pairing that intelligence with next-generation hardware to create a universal interface between humans and machines. While today's AI largely operates through chat boxes and decade-old devices, Hark is focused on what comes next: agentic systems that interact naturally with people and the real world.
To get there, we're developing multimodal models and next-generation AI hardware together, designed from the ground up as a single, unified interface for a new era of intelligent systems.
We are seeking a Member of Technical Staff, Infrastructure Speech to lead and scale the backbone of Hark's real-time speech-to-speech engine. Sitting at the intersection of systems engineering and speech AI, you will own the reliability, latency, and performance of the infrastructure supporting our live speech models. This is a high-impact technical role for someone who excels in low-latency distributed environments and approaches infrastructure with a product-driven mindset.
Responsibilities
- Build repeatable, auditable, and scalable provisioning for our speech inference stack.
- Harden CI/CD pipelines to ensure secure, ultra-low-latency deployment of real-time speech services across all production environments.
- Lead the evolution of the end-to-end infrastructure powering Hark's speech-to-speech models, including streaming pipelines, session management, and fault tolerance.
- Collaborate with speech ML researchers to identify latency bottlenecks and translate complex requirements into robust infrastructure enhancements.
- Oversee system health and incident response, defining critical SLOs for real-time speech workloads where performance and uptime are paramount.
- Manage capacity planning, cost efficiency, and the hardware lifecycle for the global speech inference fleet.
- Build internal tooling and platform abstractions to streamline the developer experience for teams operating on speech infrastructure.
Requirements
- 5+ years of experience in infrastructure, systems, or platform engineering, including at least 2 years in real-time or low-latency environments.
- Experience deploying and operating large-scale inference frameworks or streaming infrastructure.
- Strong proficiency in at least one systems-level or infrastructure-focused programming language.
- Deep understanding of networking fundamentals relevant to real-time audio and low-latency inference, such as WebRTC or gRPC.
- Experience with container orchestration, job scheduling, and multi-tenant resource management.
- Track record of technical ownership of production systems with strict reliability and latency requirements.
- Strong debugging and observability skills across the full infrastructure stack.
Bonus Qualifications
- Deep expertise in Kubernetes (K8s), with a focus on GPU-aware orchestration and the management of latency-sensitive workloads.
- Proficiency in Pulumi or comparable modern Infrastructure as Code (IaC) frameworks.
- Advanced command of Rust or Go for developing systems-level tooling and performance-critical services.
- Technical familiarity with speech model architectures—including ASR, TTS, and end-to-end speech-to-speech—and their unique inference characteristics.
- Hands-on experience with streaming data pipelines and transport layers, such as Kafka, WebSockets, or custom audio protocols.
Compensation
The US base salary range for this full-time position is $180,000 to $450,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components and benefits depending on the specific role. This information will be shared if an employment offer is extended.