ABOUT THE ROLE:
We are seeking an exceptional Manager, Operations, to lead facilities operations and power generation for xAI’s hyperscale AI compute facilities. This role will own the day-to-day and long-term performance of mission-critical data center operations, including power generation, power distribution, cooling, mechanical, electrical, and environmental systems. It will also direct the fiber teams responsible for the high-capacity networking and connectivity that support our supercomputing clusters.
You will build and lead high-performing operations, power generation, and fiber teams, relentlessly drive reliability and efficiency, and ensure seamless 24/7 uptime for the infrastructure powering xAI’s AI training at unprecedented scale. This high-impact position requires deep expertise in data center or hyperscale operations (including power generation), strong leadership in fast-paced environments, and the ability to deliver world-class performance under aggressive growth timelines. This is a full-time, primarily onsite role with significant travel to sites and vendor locations.
RESPONSIBILITIES:
- Lead and scale the facilities operations and power generation teams responsible for the reliable operation, maintenance, monitoring, and optimization of critical infrastructure including on-site power generation assets, electrical systems, mechanical/HVAC, liquid cooling, power distribution, UPS, generators, and building management systems.
- Direct the fiber teams overseeing the design, deployment, maintenance, and expansion of high-speed fiber optic networks, dark fiber, and connectivity infrastructure supporting AI compute clusters and data center interconnects.
- Own key performance metrics such as uptime (targeting 99.999%+), mean time to detect/repair (MTTD/MTTR), power usage effectiveness (PUE), water usage effectiveness (WUE), power generation efficiency, and overall infrastructure availability.
- Develop and enforce standard operating procedures (SOPs), preventive maintenance programs, incident response protocols, and continuous improvement processes for both facilities and power generation assets to minimize downtime and maximize efficiency.
- Build, mentor, and grow multidisciplinary teams of operations technicians, power generation engineers, and controls specialists while fostering a culture of ownership, safety, and excellence.
- Partner closely with engineering, construction, procurement, and AI hardware teams to support new facility builds, expansions, commissioning, power integration, and smooth handovers from project to operations.
- Manage operational budgets, vendor relationships (maintenance contractors, fiber providers, power generation OEMs, fuel suppliers), spare parts inventory, and risk mitigation strategies in a high-velocity environment.
- Drive innovation in operational practices, automation, predictive maintenance, power generation optimization, and sustainability initiatives to support the extreme power and cooling demands of next-generation AI systems.
- Provide regular performance reporting, root cause analyses, lessons learned, and strategic recommendations to senior leadership.
BASIC QUALIFICATIONS:
- 5+ years of progressive experience in data center facilities operations, power generation operations, hyperscale infrastructure management, or mission-critical industrial operations, including at least 2 years in a management or supervisory role.
- Proven track record leading large-scale operations teams supporting high-density compute environments with significant on-site or dedicated power generation (AI, HPC, or hyperscaler data centers strongly preferred).
- Strong experience managing fiber optic networks, dark fiber deployments, or high-bandwidth connectivity infrastructure in large-scale technical environments.
- Deep knowledge of power generation systems (gas turbines, reciprocating engines, cogeneration, etc.), MEP (mechanical, electrical, plumbing) systems, BMS/SCADA, liquid cooling, power redundancy topologies, and 24/7 operations best practices.
- Demonstrated success delivering high reliability, rapid incident resolution, and operational excellence under aggressive scaling timelines.
- Hands-on leadership style with the ability to roll up your sleeves while effectively managing teams, budgets, and cross-functional stakeholders.
- Proficiency with operations tools, CMMS (computerized maintenance management systems), and monitoring platforms, along with a data-driven approach to decision-making.
PREFERRED SKILLS AND EXPERIENCE:
- Direct background in AI or hyperscale data center operations, including liquid cooling systems, high-power GPU/accelerator environments, and on-site power generation.
- Experience building or scaling fiber infrastructure for low-latency, high-bandwidth interconnects between compute clusters or sites.
- Familiarity with Uptime Institute Tier standards, ASHRAE guidelines, power generation standards (e.g., IEEE, NFPA), OSHA/EPA compliance, and sustainability practices in critical facilities.
- Bachelor’s or Master’s degree in Electrical or Mechanical Engineering, Power Systems, Facilities Management, or a related field; relevant certifications (CDCP, CDCS, or equivalent) are a plus.
- Track record of implementing automation, predictive analytics, or process improvements that significantly enhanced operational performance and power reliability.
ADDITIONAL REQUIREMENTS:
- Willingness to be primarily onsite at key facilities (e.g., the Memphis region), with on-call responsibilities and travel to other sites as needed.
- Ability to work in industrial/data center environments and lead teams during high-pressure phases.