Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there’s no telling what you could accomplish.
Do you want to make Apple products more intelligent for our users? As part of the Apple Services Engineering organization, the Machine Learning Platform & Infrastructure team is building groundbreaking technology for search, natural language processing, artificial intelligence, and machine learning. Our infrastructure is the backbone of Apple Intelligence. It powers the largest Apple foundation models on servers and a wide range of services at Apple, including Siri, Apple Music, Apple TV, App Store, Photos & Camera, Spotlight, Safari, and exciting upcoming Apple products, serving millions of queries every day with incredibly low latencies and drawing every ounce of compute from our hardware.
As part of this group, you will work in one of the most exciting high-performance computing environments, with petabytes of data and millions of queries per second, and have the opportunity to imagine and build products that delight our customers every single day. You will have a chance to optimize multi-billion-parameter language, vision, and speech models using state-of-the-art technologies and run them at Apple scale.
Description
We are seeking a Principal Engineer to provide leadership in building and evolving next-generation AI infrastructure for search and other product needs at Apple. In this role, you will shape the architecture and long-term technical strategy for large-scale inference systems that handle both internal workloads and production traffic, integrate and evolve web-scale search systems, and work at the intersection of product innovation, AI research, and large-scale distributed systems.
We design, build, and maintain infrastructure to support features that empower billions of Apple users. We take full end-to-end ownership of our services, driving them meticulously through every stage: conception, design, implementation, deployment, and maintenance. As a result, each one of us takes our responsibilities seriously. In this team, you’ll have the opportunity to work on incredibly complex large-scale systems with trillions of records and petabytes of data, work alongside teams to optimize inference for cutting-edge model architectures, and build production-grade solutions for millions of customers in real time.
Minimum Qualifications
Bachelor’s degree in Computer Science, relevant technical field, or equivalent practical experience
Strong background in computer science: algorithms, data structures and system design
15+ years of experience in large-scale distributed system design, operation, and optimization, including over 10 years of leading teams
Experience managing work across a large organization, with a demonstrated ability to develop strong leaders and a consistent track record of executional excellence
Excellent collaboration skills, excelling at both high-level thinking & execution as well as in the ability to influence and inspire others to achieve a common goal
Preferred Qualifications
Master’s degree or PhD in Computer Science or related technical fields
Experience supporting distributed training and inference workloads in production, including ML systems performance profiling, debugging, and optimization
Proficiency in cloud-native architectures and orchestration platforms (e.g., Kubernetes)
Familiarity with fundamental deep learning architectures such as Transformers and encoder/decoder models
Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Inference Server, etc.
Hands-on experience working with ML accelerators such as GPUs and TPUs