Job Description
Insight Global is seeking a team of experienced, driven Machine Learning Engineers to join an established health technology company based in San Jose, CA. This is a full-time, permanent role with competitive salary, bonus, and comprehensive benefits.
In this role you'll need:
Deep Learning Frameworks: Hands-on experience with PyTorch (main focus) and familiarity with TensorFlow.
Large-Scale Model Training: Exposure to advanced training techniques like Distributed Data Parallel (DDP), Fully Sharded Data Parallel (FSDP), ZeRO, and model parallelism (pipeline/tensor). Experience with distributed training is a strong plus.
Model Optimization: Skilled in improving model performance through techniques like quantization (PTQ, QAT, AWQ, GPTQ), pruning, knowledge distillation, KV-cache tuning, and using efficient attention mechanisms like Flash Attention.
Scalable Model Serving: Understanding of how to deploy models at scale, including autoscaling, load balancing, streaming, batching, and caching. Comfortable working alongside platform engineers to build robust serving pipelines.
Data & Storage Systems: Proficient with both SQL and NoSQL databases, vector databases (e.g., FAISS, Milvus, Pinecone, pgvector), and data formats like Parquet and Delta. Familiar with object storage systems.
Code Quality: Writes efficient, clean, and maintainable code with a focus on performance.
End-to-End ML Lifecycle: Solid grasp of the full machine learning workflow—from data collection and model training to deployment, inference, optimization, and evaluation.
Required Skills & Experience
•3–5 years in ML/AI engineering roles owning training and/or serving in production at scale.
•Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements.
•Experience collaborating across Research, Platform/Infra, Data, and Product functions.
•Bachelor’s in Computer Science, Electrical/Computer Engineering, or a related field required; Master’s preferred (or equivalent industry experience).
•Strong systems/ML engineering skills, with exposure to distributed training and inference optimization.