Role
You will own the selection, qualification, and lifecycle management of Sesterce's GPU compute platforms — turning cutting-edge accelerator systems into dependable production infrastructure at scale.
What you will do
- Own server qualification for AI compute platforms, including GPUs and alternative accelerator types; lead procurement strategy, OEM relationships, and platform acceptance standards
- Lead BOM validation, platform bring-up, firmware lifecycle management, repair strategy, sparing models, burn-in and acceptance testing, and end-of-life planning
- Establish fleet strategy across PCIe, CXL, NVLink, HBM systems, power delivery, BMC/Redfish, cooling interfaces, and optical NIC integration
- Build the operating model for FRU inventories, RMA execution, depot workflows, and field replacement quality; create service strategies for board swap, firmware rollback, and spare-part pooling
- Influence vendor roadmaps through strong technical engagement; build lifecycle decision frameworks for expansion, sustainment, refresh, and retirement across heterogeneous hardware generations
What we are looking for
- Extensive experience building and operating large accelerator fleets in hyperscale, cloud, HPC, or advanced systems environments
- Deep knowledge of server architecture, board-level integration, firmware risk, and fleet reliability engineering
- Track record with NVIDIA HGX / DGX, TPU infrastructure, custom accelerator programs, or dense rack-scale compute systems
- Strong understanding of manufacturing quality, service logistics, and infrastructure deployment at scale
- Ability to bridge lab-based engineering rigor with global production operations, and to mentor senior engineers on hardware systems thinking and practical fleet management