Engineering | San Francisco/Remote | Full time

AI Compute Hardware and Platform Lead

Own the selection, qualification, and lifecycle of the GPU platforms powering Sesterce's AI factories.

Role

You will own the selection, qualification, and lifecycle management of Sesterce's GPU compute platforms — turning cutting-edge accelerator systems into dependable production infrastructure at scale.

What you will do

Own server qualification for AI compute platforms, including GPU and alternative accelerator types; lead procurement strategy, OEM relationships, and platform acceptance standards
Lead BOM validation, platform bring-up, firmware lifecycle management, repair strategy, sparing models, burn-in and acceptance testing, and end-of-life planning
Establish fleet strategy across PCIe, CXL, NVLink, HBM systems, power delivery, BMC/Redfish, cooling interfaces, and optical NIC integration
Build the operating model for FRU inventories, RMA execution, depot workflows, and field replacement quality; create service strategies for board swap, firmware rollback, and spare-part pooling
Influence vendor roadmaps through strong technical engagement; build lifecycle decision frameworks for expansion, sustainment, refresh, and retirement across heterogeneous hardware generations

What we are looking for

Extensive experience building and operating large accelerator fleets in hyperscale, cloud, HPC, or advanced systems environments
Deep knowledge of server architecture, board-level integration, firmware risk, and fleet reliability engineering
Track record with NVIDIA HGX / DGX, TPU infrastructure, custom accelerator programs, or dense rack-scale compute systems
Strong understanding of manufacturing quality, service logistics, and infrastructure deployment at scale
Ability to bridge lab-based engineering rigor with global production operations, and to mentor senior engineers on hardware systems thinking and practical fleet management