About The Role
As a Network Architect on the Cluster Architecture Team, you will work closely with vendors, internal networking teams, and industry peers to develop a best-in-class interconnect architecture for current and future generations of Cerebras AI clusters. You will be responsible for developing proofs of concept for new network designs and features that enable resilient, reliable networking for AI workloads. The role requires cross-functional collaboration and interaction with diverse hardware components (e.g., network devices and the Wafer-Scale Engine) as well as software at several layers of the stack, from host-side networking to cluster-level coordination. It also requires an understanding of network monitoring systems and network debugging methodologies.
Responsibilities
- Design AI/ML and HPC clusters.
- Identify and address performance and efficiency bottlenecks to ensure high resource utilization, low latency, and high-throughput communication.
- Drive technical projects that bring together multiple teams and a range of software and hardware components to realize advanced networking technologies.
- Bring effective communication skills.
- Collaborate with vendors and industry peers to drive the network hardware and feature roadmap.
- Represent Cerebras in industry forums.
- Serve as the central point of contact for network reliability issues.
Skills & Qualifications
- Ph.D. in Computer Science or Electrical Engineering with 10 years of industry experience, or Master’s in CS or EE with 15 years of industry experience.
- 8+ years of experience in large-scale network design for WAN or data center environments.
- Extensive experience debugging networking issues in large distributed-systems environments with multiple networking platforms and protocols.
- Experience managing and leading multi-phase, multi-team projects.
- Experience with networking platforms such as Juniper, Arista, and Cisco, and with open-box architectures (SONiC, FBOSS).
- Experience with networking protocols and technologies such as RoCE, BGP, DCQCN, PFC, and streaming telemetry.
- Familiarity with automation languages such as Python or Go.
- Familiarity with network visibility and management systems.