Role Overview
We are seeking an Associate Data Scientist to support AI/ML engineering efforts by preparing, validating, and structuring data for LLM-driven systems. This is a hands-on role focused on real-world data processing, pipeline support, and model evaluation.
Key Responsibilities
Process and clean structured and unstructured data for AI/ML pipelines.
Prepare training-ready datasets for LLM fine-tuning and evaluation workflows.
Support retrieval-augmented generation (RAG) and natural-language-to-SQL (NL→SQL) systems through data preparation and validation.
Run data quality checks to verify that datasets are complete and consistent.
Assist in building and maintaining data pipelines and APIs (e.g., FastAPI).
Collaborate with engineering teams to troubleshoot and optimize data workflows.
Required Skills
1–3 years of experience in data processing or data-focused roles.
Strong Python skills with experience in data libraries (pandas, NumPy, scikit-learn).
Experience supporting LLM workflows (fine-tuning, prompt engineering, evaluation).
Familiarity with structured data (SQL) and unstructured text data.
Understanding of data preparation for AI/ML systems.
Nice to Have
Exposure to RAG pipelines, embeddings, or evaluation metrics.
Experience with ML frameworks (PyTorch/TensorFlow) and Docker-based workflows.
Experience with CI/CD pipelines for ML systems.
Familiarity with vector databases (e.g., Chroma) and reranking techniques.
Research exposure to Transformer-based architectures.