The ideal candidate is a Data Scientist with 1-4 years of experience and a degree in Computer Science or Statistics. This individual will:
~ Oversee day-to-day operations of AWS and Microsoft 365, including multi-factor authentication administration with Duo
~ Build models to identify key document sets, often across millions of records of text data
~ Bring strong statistical and core mathematical capabilities
Key Responsibilities
- Develop Python-based pipelines for extracting and processing text and metadata from documents, including both native text and image-based content (see the extraction sketch after this list).
- Design and implement AI workflows using open-source and commercial large language models for classification, summarization, extraction, and analysis tasks.
- Build and maintain vector indexes and retrieval-augmented generation (RAG) workflows to support document-heavy legal use cases.
- Implement prompt templates and prompt design patterns to support consistency and reuse across client matters (a minimal template sketch follows this list).
- Deploy, operate, and support AI workflows in AWS environments, including use of SageMaker for model training, experimentation, and inference.
- Apply traditional machine learning techniques (e.g., logistic regression, random forest, decision trees) where appropriate alongside LLM-based approaches.
- Support statistical validation efforts, including sampling, metric calculation, and basic error analysis to evaluate model performance (a baseline-and-metrics sketch follows this list).
- Work with SQL and MySQL to support data analysis, validation, and pipeline integration.
- Produce clear, detailed documentation describing data sources, model behavior, validation results, and assumptions to support transparency and review.
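
To give a sense of the extraction work described above, here is a minimal sketch that pulls native text from PDF pages and falls back to OCR for image-only pages. The library choices (pdfplumber, pytesseract) and the field names are assumptions for illustration only; the role does not prescribe specific tools.

```python
# Illustrative extraction sketch: native PDF text with an OCR fallback for
# image-based pages. pdfplumber/pytesseract are assumed, not required.
import pdfplumber
import pytesseract

def extract_text(pdf_path: str) -> list[dict]:
    """Return per-page text plus minimal metadata for downstream pipelines."""
    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages):
            text = page.extract_text() or ""
            if not text.strip():
                # Image-based page: rasterize it and run OCR instead.
                image = page.to_image(resolution=300).original
                text = pytesseract.image_to_string(image)
            pages.append({"source": pdf_path, "page": i + 1, "text": text})
    return pages
```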
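Next, a minimal sketch of the prompt-templating pattern used for LLM classification tasks. The template wording, label handling, truncation limit, and the injected `call_llm` client are hypothetical placeholders, not a prescribed workflow.

```python
# Illustrative prompt template for document classification. The template text,
# labels, and call_llm() client are assumptions for illustration only.
from string import Template

CLASSIFY_PROMPT = Template(
    "You are reviewing documents for a legal matter.\n"
    "Classify the document below into exactly one of: $labels.\n"
    "Respond with the label only.\n\n"
    "Document:\n$document"
)

def classify_document(document_text: str, labels: list[str], call_llm) -> str:
    """Render the shared template and send it to an injected LLM client."""
    prompt = CLASSIFY_PROMPT.substitute(
        labels=", ".join(labels),
        document=document_text[:4000],  # truncate long documents for the context window
    )
    return call_llm(prompt).strip()
```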
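Finally, a minimal sketch pairing a classical baseline (TF-IDF plus logistic regression via scikit-learn) with the sampling and metric calculation used in validation. The parameters, sample size, and helper names are assumptions for illustration.

```python
# Illustrative baseline and validation sketch. scikit-learn, the TF-IDF
# settings, and the review sample size are assumptions, not requirements.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.pipeline import Pipeline

def build_baseline() -> Pipeline:
    """Classical text classifier to compare against LLM-based approaches."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])

def sample_for_review(records: list, n: int = 400, seed: int = 7) -> list:
    """Simple random sample of scored records for manual review."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))

def validation_metrics(y_true: list, y_pred: list) -> dict:
    """Precision/recall/F1 over the reviewed sample."""
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```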
Required Qualifications
- Experience in applied machine learning, NLP, and data engineering.
- Strong Python proficiency with experience building data processing or ML pipelines.
- Experience extracting and processing text from structured and unstructured documents, including images (e.g., OCR workflows).
- Hands-on experience working with open-source and/or commercial large language models.
- Experience deploying or supporting ML workflows in AWS, including SageMaker.
- General competency with SQL and MySQL for data storage, querying, and analysis.
- Foundational understanding of statistical validation concepts, including sampling and performance metrics.
- Experience with prompt templating and structured prompt design.
- Experience building or working with vector indexes and RAG frameworks.
- Working knowledge of classical machine learning models, including logistic regression.
- Strong attention to detail and a disciplined approach to documentation.