About The Role
G2 is looking for a Senior Full-Stack AI Engineer with strong production experience across modern web and backend systems and hands-on exposure to LLMs, Voice AI, and AI data platforms. You’ll lead end-to-end execution across the stack: designing reliable services, building real-time conversational experiences, and owning the data and evaluation foundations that turn large volumes of interview interactions into structured insights and continuously improving models.
This role is ideal for someone who can balance product velocity with engineering rigor, and who enjoys working across voice pipelines, retrieval/agent workflows, and data + evaluation systems to deliver measurable quality improvements over time.
In This Role, You Will:
LLM/Agent Development: Prompting, RAG & Evaluation
- Lead prompt design and iteration for summarization, decision-making, multi-turn dialogue, agent behaviors, and tool/function calling.
- Build and maintain evaluation harnesses (golden sets, rubrics, regression suites) to measure accuracy, consistency, safety, and usefulness across releases.
- Implement and optimize RAG (Retrieval-Augmented Generation) workflows: chunking strategies, embeddings, retrieval/reranking, citations, and grounding techniques to reduce hallucinations.
- Define strategies for knowledge freshness and context management across a project’s lifecycle (e.g., project-specific knowledge bases, interview-derived artifacts, evolving taxonomies).
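To make the evaluation-harness responsibility above concrete, here is a minimal, hypothetical sketch of a golden-set regression check of the kind this role would own (all names and the keyword-based rubric are illustrative assumptions, not G2's actual tooling; real rubrics are typically richer, e.g. LLM-graded or span-matched):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class GoldenCase:
    prompt: str
    expected_keywords: List[str]  # facts a good answer must mention

def score_response(response: str, case: GoldenCase) -> float:
    """Crude rubric: fraction of expected keywords present in the response."""
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in response.lower())
    return hits / len(case.expected_keywords)

def run_regression(model_fn: Callable[[str], str],
                   golden_set: List[GoldenCase],
                   threshold: float = 0.8) -> Tuple[float, List[str]]:
    """Score every golden case; return the mean score and the failing prompts."""
    scores = {c.prompt: score_response(model_fn(c.prompt), c) for c in golden_set}
    failing = [p for p, s in scores.items() if s < threshold]
    return sum(scores.values()) / len(scores), failing
```

Running this suite on every release and diffing the failing set against the previous run is what turns "measure accuracy across releases" into an enforceable check.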
Voice AI: Real-Time Conversational Systems
- Integrate and optimize components in AI-powered voice pipelines (STT, NLU, TTS, turn-taking, barge-in/interrupt handling, session state).
- Improve multi-turn voice experience quality: latency, timing alignment, disfluency handling, and context retention.
- Build voice simulation and test tooling to validate real-world and adversarial scenarios (noise, accents, interruptions, partial transcripts).
- Partner with ML/Voice specialists to diagnose ASR misfires, timing mismatches, and agent/voice orchestration issues.
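As a sketch of the barge-in/turn-taking logic mentioned above, the core state transition can be as small as the following (a hypothetical illustration; a production pipeline would also coordinate VAD confidence, TTS playback position, and partial-transcript flushing):

```python
from enum import Enum, auto

class TurnState(Enum):
    LISTENING = auto()
    SPEAKING = auto()

class TurnManager:
    """Minimal turn-taking sketch: user speech detected while the agent is
    speaking triggers a barge-in, so the agent yields the floor."""
    def __init__(self):
        self.state = TurnState.LISTENING

    def agent_starts_speaking(self):
        self.state = TurnState.SPEAKING

    def on_user_audio(self, is_speech: bool) -> str:
        if is_speech and self.state is TurnState.SPEAKING:
            self.state = TurnState.LISTENING
            return "barge_in: stop TTS, flush audio buffer"
        return "noop"
```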
AI Data Platforms: ETL/ELT, Information Extraction & Reporting Datasets
- Design ingestion and transformation workflows for high-volume interview data (audio, transcripts, free-text responses, metadata, annotations).
- Build ETL/ELT pipelines that validate/normalize inputs, run information extraction (entities, themes, taxonomy labeling, key moments), and produce curated, queryable reporting datasets.
- Establish data models and schemas that preserve lineage from raw sources → intermediate artifacts → curated outputs → report-ready datasets.
- Implement data quality practices: completeness/validity checks, sampling-based verification, reconciliation, and monitoring for drift.
- Build mechanisms for traceability and auditability (e.g., linking report outputs back to transcript spans/timecodes, retrieval sources, and model/prompt versions).
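The lineage and auditability points above can be sketched as a schema in which every extracted fact carries its evidence spans and the model/prompt versions that produced it (the types and field names here are hypothetical examples, not an existing G2 schema):

```python
import hashlib
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TranscriptSpan:
    transcript_id: str
    start_ms: int
    end_ms: int

@dataclass
class ExtractedFact:
    text: str
    sources: List[TranscriptSpan]  # evidence spans the fact was derived from
    model_version: str
    prompt_version: str

    def lineage_key(self) -> str:
        """Deterministic key over content + provenance, usable for audits
        and reconciliation between pipeline runs."""
        payload = f"{self.text}|{self.model_version}|{self.prompt_version}"
        return hashlib.sha256(payload.encode()).hexdigest()[:12]
```

Because each report-ready row keeps its `sources`, any number in a report can be traced back to specific transcript timecodes and to the exact model/prompt version that produced it.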
Continuous Improvement: Fine-Tuning, Adaptation & “Learning” Over a Project
- Collaborate with ML/Data teams to support fine-tuning and/or model adaptation workflows (dataset curation, labeling guidelines, training/eval splits, offline evaluation, rollout validation).
- Implement project-level feedback loops so the system improves as more interviews occur:
  - maintain evolving taxonomies and question strategies
  - incorporate newly discovered concepts into retrieval stores
  - update prompts/policies based on failure patterns
  - expand evaluation sets automatically with new edge cases
- Build mechanisms for “real-time” or iterative learning without sacrificing safety (e.g., controlled updates to RAG indexes, prompt/version rollouts, gated releases, human review where needed).
- Enable the agent to ask more intelligent follow-up questions by using accumulated project knowledge (grounded in retrieved evidence and governed by safety policies).
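The "controlled updates to RAG indexes" pattern above reduces, at its simplest, to staging candidate content behind an offline evaluation gate (a hypothetical sketch; `eval_fn` stands in for whatever offline evaluation suite the team runs):

```python
from typing import Callable, List

def promote_rag_index(live_docs: List[str],
                      candidate_docs: List[str],
                      eval_fn: Callable[[List[str]], float],
                      min_score: float = 0.9) -> List[str]:
    """Gated update: stage new documents alongside the live set, run offline
    evaluation on the staged index, and promote it only if quality clears the
    bar; otherwise keep serving the current index unchanged."""
    staging = live_docs + candidate_docs
    if eval_fn(staging) >= min_score:
        return staging
    return live_docs
```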
Backend Engineering: Architecture, Reliability & Observability
- Own architecture and implementation of backend services and workflows supporting LLM/voice/data experiences (APIs, orchestration, storage, queues).
- Improve system resilience through observability, tracing, structured logging, rate limiting, fallbacks, and failure-mode design.
- Lead debugging and resolution of complex issues across LLM pipelines, retrieval systems, data workflows, and conversational agent logic.
- Build internal tools to accelerate diagnosis, QA, and safe experimentation.
Automated Testing, Security & Quality Engineering
- Design and maintain automated test suites for APIs, pipelines, RAG systems, and LLM outputs (regression, reliability, performance, load).
- Use LLMs to generate synthetic datasets for robust coverage across realistic and adversarial conditions.
- Establish quality gates in CI/CD (eval thresholds, golden tests, contract tests) to ensure safe deployments.
- Proactively identify and mitigate threats such as prompt injection, data leakage, and abuse scenarios.
Technical Leadership & Collaboration
- Lead projects end-to-end: requirements shaping, technical design, implementation, rollout, monitoring, and iteration.
- Mentor engineers through code reviews, pairing, design guidance, and raising engineering standards.
- Communicate tradeoffs clearly with stakeholders; influence roadmap decisions through technical insight.
- Contribute to documentation, runbooks, and best practices for production-grade AI systems.
Minimum Qualifications:
We realize applying for jobs can feel daunting at times. Even if you don’t check all the boxes in the job description, we encourage you to apply anyway.
Required
- 5–8+ years of professional software engineering experience (full-stack, backend, platform, or data-adjacent systems).
- Hands-on experience with LLMs (e.g., OpenAI, Anthropic/Claude, Mistral, etc.), including prompt design and evaluation.
- Experience implementing or operating RAG systems (embeddings, retrieval, reranking, grounding/citations).
- Strong proficiency in Python and/or JavaScript/TypeScript with production experience.
- Experience designing and operating reliable services (APIs, background jobs, event-driven workflows).
- Experience with automated testing frameworks and CI/CD; strong engineering rigor and ownership mindset.
- Experience building ETL/ELT workflows, data transformations, and report-ready datasets from semi-structured/unstructured inputs.
- Familiarity with real-time voice systems, conversational agents, or low-latency interactive products.
Preferred
- FastAPI (or similar) for building and scaling REST services; excellent Python fundamentals.
- Next.js + React for rapid prototyping and UI validation of conversational experiences.
- Experience with observability stacks (metrics, tracing), performance tuning, and incident response practices.
- Experience supporting fine-tuning/model adaptation workflows (dataset curation, labeling, eval, rollout).
- Experience with security considerations for AI products (prompt injection defense, data leakage prevention, abuse monitoring).