About the Role
As a Member of Technical Staff [Platform] at NeoCognition, you’ll design and build the internal systems that power everything we do — from research experiments to product deployments.
You’ll create the tooling, infrastructure, and developer experience that enable our team to iterate rapidly, deploy confidently, and scale intelligently.
This role sits at the intersection of infrastructure engineering, developer tooling, and MLOps. You’ll collaborate closely with research scientists and software engineers to ensure that our data, model, and product workflows are robust, reproducible, and efficient.
Responsibilities
Design, build, and maintain the core infrastructure supporting our research and product environments (cloud compute, storage, CI/CD pipelines, observability, security).
Develop and manage internal developer tools, automation systems, and shared libraries that accelerate productivity across teams.
Build reliable systems for data management, model experimentation, evaluation, and deployment in both research and production contexts.
Define and evolve our build, test, and release workflows, ensuring seamless integration from prototype to production.
Implement and monitor observability and performance metrics across services and pipelines.
Collaborate closely with research and product teams to diagnose bottlenecks, optimize workflows, and ensure reproducibility of results.
Contribute to infrastructure strategy and architecture, helping shape how we scale as the company grows.
Qualifications
Required:
Strong software engineering background with experience in infrastructure, platform, or DevOps engineering.
Proficiency with cloud environments (AWS, GCP, or similar), containerization/orchestration (Docker, Kubernetes), CI/CD systems (e.g., GitHub Actions), and infrastructure-as-code tools (e.g., Terraform).
Experience building and maintaining developer tools, automation frameworks, or internal platforms.
Familiarity with data pipelines, job scheduling, or ML experimentation workflows.
Excellent problem-solving skills and a track record of improving developer velocity or system reliability.
Strong communication skills and ability to collaborate with both research and product engineering teams.
Nice to have:
Experience with machine learning infrastructure, training pipelines, or model evaluation tooling.
Background in monitoring and observability (e.g., Prometheus, Grafana, Datadog).
Knowledge of infrastructure as code and configuration management best practices.
Interest in designing systems that support reproducible, safe, and efficient AI development.
Prior experience in an early-stage startup or small research organization.