Events | Apr 09, 2026

No Gradient Descent, Only Ascent by Sesterce, PyTorch, SemiAnalysis, GPU Mode: Inside the Paris Hackathon Pushing AI Systems Higher

Inside the Paris hackathon bringing Sesterce, SemiAnalysis, GPU Mode, PyTorch, Verda and Prime Intellect together to push AI systems performance higher.


On April 9, 2026, Sesterce joined forces with SemiAnalysis, GPU Mode, PyTorch, Verda and Prime Intellect for No Gradient Descent, Only Ascent, an advanced ML systems hackathon in Paris, France. Held as a side event to PyTorch Conference Europe, it brought together researchers, engineers and builders working at the frontier of AI performance, large-scale training and inference optimization.

The event took place at Neon Noir Paris, with public posts from Sesterce and Neon Noir referencing the gathering as a Paris side event with GPU Mode, PyTorch, SemiAnalysis, Verda, Prime Intellect and Sesterce (Sesterce LinkedIn, Neon Noir LinkedIn). The goal was simple and ambitious: bring top technical talent into one room and push AI workloads closer to the limits of modern accelerator infrastructure.

Where AI performance becomes real

AI progress is often discussed through the lens of models, datasets and applications. But at scale, performance is also decided much deeper in the stack: kernels, compilers, memory movement, GPU communication, precision formats and distributed systems.

That was the focus of No Gradient Descent, Only Ascent. The hackathon centered on two tracks: distributed training and inference optimization, giving teams the opportunity to work directly on the systems layer that determines how efficiently AI infrastructure is used.

For Sesterce, this reflects a core conviction: the future of AI infrastructure is not only about securing compute capacity. It is about making every watt, every GPU, every interconnect and every kernel count.

Two tracks, one objective: more efficient AI

The first track focused on pre-training an LLM from scratch in a limited time on a B300 cluster, with teams working to make optimal use of 360 PFLOP/s of BF16 compute. The second track, powered by Sesterce H200 GPUs, challenged teams to compete on a leaderboard for the fastest inference on a fixed model, with each team receiving access to four H200 GPUs for the duration of the hackathon.

This structure made the event more than a coding competition. It was a direct exploration of the core bottlenecks shaping AI infrastructure today: training efficiency, inference latency, GPU utilization and the ability to extract more performance from high-end hardware.
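To make "GPU utilization" concrete, here is a minimal sketch of how a training team might estimate model FLOPs utilization (MFU) against the 360 PFLOP/s BF16 figure quoted above. It relies on the common rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token. The model size and token throughput below are hypothetical values chosen for illustration, not results from the hackathon, and the function name is our own.

```python
# Rough MFU estimate for a dense transformer training run.
# Assumption: training cost ~ 6 FLOPs per parameter per token (rule of thumb).

PEAK_BF16_FLOPS = 360e15   # 360 PFLOP/s, the cluster figure quoted in the article
N_PARAMS = 1e9             # hypothetical model size: 1B parameters
TOKENS_PER_SECOND = 2.4e7  # hypothetical measured cluster-wide throughput


def model_flops_utilization(n_params: float,
                            tokens_per_second: float,
                            peak_flops_per_second: float) -> float:
    """Return the fraction of peak compute spent on useful model FLOPs."""
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


mfu = model_flops_utilization(N_PARAMS, TOKENS_PER_SECOND, PEAK_BF16_FLOPS)
print(f"Estimated MFU: {mfu:.1%}")
```

In a time-limited speedrun, a higher MFU translates directly into more tokens seen before the clock runs out, which is why kernel, communication and precision choices matter so much in this kind of track.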

A room built for technical depth

The day opened with technical talks from leading voices across the ML systems ecosystem. The agenda included Will Feng from PyTorch’s Helion team on high-level DSLs for kernel authoring, Tyler Michael Smith from vLLM on large-scale inference, Matej Sirovatka from Prime Intellect on reinforcement learning at scale, and Erik Schultheis from ISTA on low-precision formats.

These sessions framed the hackathon around the deepest layers of AI performance. Participants were not just discussing AI systems in theory. They were building, testing, benchmarking and optimizing directly against modern accelerator environments.

Power is the new compute

One of the central moments of the event was the fireside chat "Power is the new compute", featuring Youssef El Manssouri from Sesterce and Jeremie Eliahou Ontiveros from SemiAnalysis. The title captured one of the defining realities of the AI infrastructure era: compute is no longer constrained only by chip supply, but also by power availability, energy strategy, deployment speed and the ability to operate infrastructure at industrial scale.

This is where Sesterce’s role becomes strategic. As AI workloads scale, the winners will not only be the teams with access to GPUs. They will be the teams able to combine energy, hardware, systems optimization and operational execution into a single infrastructure platform.

Why kernels matter

Kernels are where abstract AI operations become physical execution. They determine how computation is mapped onto GPU hardware, how memory is accessed, how parallelism is used and how much of the available performance is actually captured.

Better kernels can reduce inference costs, increase throughput, improve training efficiency and make existing GPU infrastructure more productive. In a world where AI compute is scarce, expensive and energy-intensive, these gains are not marginal. They are strategic.

That is why No Gradient Descent, Only Ascent matters. It put the people working closest to the hardware at the center of the AI conversation.

Paris as an AI infrastructure hub

The event also highlighted the growing role of Paris and Europe in the AI infrastructure landscape. SemiAnalysis lists the GPU Mode x PyTorch IRL Hackathon as an April 9 event in Paris, designed to wrap up the PyTorch Conference. PyTorch described the event as a Paris side event following PyTorch Conference EU, focused on LLM speedruns and systems performance.

For Europe, this is an important signal. AI sovereignty will not be built only through models or regulation. It will require compute infrastructure, technical communities, energy-backed deployment capacity and a deep bench of engineers able to optimize the systems that power frontier AI.

Only ascent

The name said everything: No Gradient Descent, Only Ascent. It was a celebration of the builders who work where AI performance is truly won: close to the hardware, close to the compiler, close to the kernel and close to the infrastructure.

For Sesterce, the event was part of a broader mission to support the next generation of AI infrastructure builders. From AI Factories to H200-powered inference optimization, the future of AI will depend on the teams capable of combining scale, efficiency and deep technical execution.

In Paris, for one day, that future became visible. Top researchers and engineers came together not to talk about abstract compute, but to make it faster, sharper and more efficient.

No gradient descent. Only ascent.