About the Role
We are seeking a versatile and experienced engineer to join our Inference Core Model Bringup team. This team is responsible for rapidly bringing up state-of-the-art open-source models (such as LLaMA and Qwen) or customer-provided proprietary models on our Cerebras CSX systems. Success in this role requires a system-minded generalist who thrives in fast-paced bringup environments and is comfortable working across the entire Cerebras software stack.
Your work will play a critical role in achieving unprecedented levels of performance, efficiency, and scalability for AI applications.
Responsibilities
- Contribute to the end-to-end bringup of ML models on Cerebras CSX systems.
- Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning.
- Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization.
- Propose and prototype improvements to tools, APIs, and automation flows to accelerate future bringups.
Skills & Qualifications
- Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field.
- Comfort navigating the full AI toolchain, from Python modeling code to compiler IRs and performance profiling.
- Strong debugging skills across performance, numerical accuracy, and runtime integration.
- Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with model internals (e.g., attention, MoE, diffusion).
- Proficiency in C/C++ programming and experience with low-level optimization.
- Proven experience in compiler development, particularly with LLVM and/or MLIR.
- Strong background in optimization techniques, particularly those involving NP-hard problems.
What We Offer
- Competitive salary and benefits package.
- Opportunities for professional growth and career advancement.
- A dynamic and innovative work environment.
- The chance to work on cutting-edge technologies and make a significant impact on the future of AI.