Position description
At Sonar, we are seeking an innovative Machine Learning Scientist to join our Data & AI team and pioneer the next generation of our code analysis engine. You will be at the forefront of applying cutting-edge AI and Large Language Model (LLM) techniques to the complex domain of source code. Your work will directly shape our products, pushing the boundaries of static analysis to help millions of developers write better, more secure code. If you are driven to solve real-world problems by turning state-of-the-art research into practical, high-impact solutions, this is the role for you.
What you will do
- Spearhead Research & Innovation: Stay on the cutting edge of ML, Deep Learning, and LLMs, specifically their application to the Software Development Lifecycle (SDLC), and identify novel opportunities to enhance our products.
- Develop Advanced AI Models: Design, prototype, and validate novel ML models that identify and resolve complex bugs, vulnerabilities, and code smells, going beyond the capabilities of traditional static analysis.
- Build LLM-Powered Features: Develop and implement advanced LLM-based solutions, including Retrieval-Augmented Generation (RAG) for contextual code analysis, fine-tuning models on proprietary codebases, and exploring agentic systems for automated code remediation.
- Engineer Data Pipelines: Build and manage robust data pipelines to gather, process, and version massive code-centric datasets required for training and evaluating specialized models at scale.
- Translate Prototypes to Products: Collaborate closely with engineering and product teams to integrate successful ML prototypes into Sonar's cutting-edge products, ensuring they meet the needs of our global user base.
- Communicate and Evangelize: Clearly articulate and document complex technical concepts and research findings to both technical and non-technical stakeholders.
Experience and qualifications
- An advanced academic background (Master’s or PhD) in Computer Science, Machine Learning, or a related quantitative field.
- Strong industry experience in machine learning, with a solid understanding of modern software engineering practices and tools.
- Solid programming skills in Python and hands-on experience with core ML/DL frameworks (e.g., PyTorch, TensorFlow, Hugging Face). Familiarity with Java is a plus.
- Proven experience in applied Machine Learning, with a strong focus on Natural Language Processing (NLP) or, ideally, Programming Language Processing (PLP).
- Hands-on experience with modern LLM architectures and techniques, such as fine-tuning strategies (e.g., LoRA, QLoRA), advanced prompt engineering, building and optimizing Retrieval-Augmented Generation (RAG) pipelines, and working with vector databases and semantic search.
- Experience with large-scale data processing frameworks and cloud infrastructure (e.g., AWS).
- Experience driving research projects from initial ideation to a demonstrable prototype with a high degree of autonomy.
- Excellent communication skills in English and a talent for explaining complex scientific topics clearly and concisely.
Additional comments
This role is based in Bochum. We cannot consider candidates who are unwilling to work from Bochum, but we are willing to relocate the right candidate.