The ideal candidate is a Data Scientist with 1-4 years of experience and a degree in Computer Science or Statistics. This individual will:
~ Oversee day-to-day operations of AWS and Microsoft 365, including multi-factor authentication administration with Duo
~ Build models to identify key document sets, often across millions of records of text data
~ Bring strong statistical and core mathematical capabilities
Key Responsibilities
- Develop Python-based pipelines for extracting and processing text and metadata from documents, including both native text and image-based content (see the extraction sketch after this list).
- Design and implement AI workflows using open-source and commercial large language models for classification, summarization, extraction, and analysis tasks.
- Build and maintain vector indexes and retrieval-augmented generation (RAG) workflows to support document-heavy legal use cases.
- Implement prompt templates and prompt design patterns to support consistency and reuse across client matters (a minimal template sketch follows this list).
- Deploy, operate, and support AI workflows in AWS environments, including use of SageMaker for model training, experimentation, and inference.
- Apply traditional machine learning techniques (e.g., logistic regression, random forest, decision trees) where appropriate alongside LLM-based approaches.
- Support statistical validation efforts, including sampling, metric calculation, and basic error analysis to evaluate model performance (a baseline-and-metrics sketch follows this list).
- Work with SQL and MySQL to support data analysis, validation, and pipeline integration.
- Produce clear, detailed documentation describing data sources, model behavior, validation results, and assumptions to support transparency and review.
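
To give a sense of the extraction work described above, here is a minimal sketch that pulls native text from PDF pages and falls back to OCR for image-only pages. The library choices (pdfplumber, pytesseract) and the field names are assumptions for illustration only; the role does not prescribe specific tools.

```python
# Illustrative extraction sketch: native PDF text with an OCR fallback for
# image-based pages. pdfplumber/pytesseract are assumed, not required.
import pdfplumber
import pytesseract

def extract_text(pdf_path: str) -> list[dict]:
    """Return per-page text plus minimal metadata for downstream pipelines."""
    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages):
            text = page.extract_text() or ""
            if not text.strip():
                # Image-based page: rasterize it and run OCR instead.
                image = page.to_image(resolution=300).original
                text = pytesseract.image_to_string(image)
            pages.append({"source": pdf_path, "page": i + 1, "text": text})
    return pages
```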
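Next, a minimal sketch of the prompt-templating pattern used for LLM classification tasks. The template wording, label handling, truncation limit, and the injected `call_llm` client are hypothetical placeholders, not a prescribed workflow.

```python
# Illustrative prompt template for document classification. The template text,
# labels, and call_llm() client are assumptions for illustration only.
from string import Template

CLASSIFY_PROMPT = Template(
    "You are reviewing documents for a legal matter.\n"
    "Classify the document below into exactly one of: $labels.\n"
    "Respond with the label only.\n\n"
    "Document:\n$document"
)

def classify_document(document_text: str, labels: list[str], call_llm) -> str:
    """Render the shared template and send it to an injected LLM client."""
    prompt = CLASSIFY_PROMPT.substitute(
        labels=", ".join(labels),
        document=document_text[:4000],  # truncate long documents for the context window
    )
    return call_llm(prompt).strip()
```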
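Finally, a minimal sketch pairing a classical baseline (TF-IDF plus logistic regression via scikit-learn) with the sampling and metric calculation used in validation. The parameters, sample size, and helper names are assumptions for illustration.

```python
# Illustrative baseline and validation sketch. scikit-learn, the TF-IDF
# settings, and the review sample size are assumptions, not requirements.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.pipeline import Pipeline

def build_baseline() -> Pipeline:
    """Classical text classifier to compare against LLM-based approaches."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])

def sample_for_review(records: list, n: int = 400, seed: int = 7) -> list:
    """Simple random sample of scored records for manual review."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))

def validation_metrics(y_true: list, y_pred: list) -> dict:
    """Precision/recall/F1 over the reviewed sample."""
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```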
Required Qualifications
- Experience in applied machine learning, NLP, and data engineering.
- Strong Python proficiency with experience building data processing or ML pipelines.
- Experience extracting and processing text from structured and unstructured documents, including images (e.g., OCR workflows).
- Hands-on experience working with open-source and/or commercial large language models.
- Experience deploying or supporting ML workflows in AWS, including SageMaker.
- General competency with SQL and MySQL for data storage, querying, and analysis.
- Foundational understanding of statistical validation concepts, including sampling and performance metrics.
- Experience with prompt templating and structured prompt design.
- Experience building or working with vector indexes and RAG frameworks.
- Working knowledge of classical machine learning models, including logistic regression.
- Strong attention to detail and a disciplined approach to documentation.