Latest

Gigawatt-Scale AI Training Loads: Grid Stabilit...
The largest AI labs are racing to build multi-gigawatt datacenters, stressing a century-old electric grid in ways it was never designed to handle. Beyond sheer scale, large training clusters exhibit...

NVIDIA Tensor Core Evolution From Volta To Blac...
Overview: AI model capability growth and reductions in unit token cost have been propelled by compounding scaling laws across training, inference, and compute. Even as classic Dennard scaling and transistor-cost...

AMD MI350X And MI400 UALoE72: Architecture And ...
Executive Summary. Positioning: MI355X can be competitive with HGX B200 for small-to-medium LLM inference on a performance-per-TCO basis, but is not competitive with GB200 NVL72 for frontier inference or training....

UALink Versus Broadcom Scale-Up Ethernet for AI...
Ethernet vs. InfiniBand in the GenAI Era: Standard Ethernet ceded share to InfiniBand early in the GenAI boom. Since then, Ethernet has regained ground on cost, operational gaps in InfiniBand,...

Reward Hacking Risks And Robust Reward Design I...
Overview: Test-time scaling is thriving: recent reasoning models deliver higher scores on real-world tasks (e.g., SWE-Bench) at lower cost. Beyond chain-of-thought (CoT) prompting, models now maintain coherence over longer horizons,...

ROCm CI Coverage Gaps and Their Impact on Infer...
Executive Summary: In self-owned clusters, perf/$ depends on workload and latency: NVIDIA leads in some scenarios; AMD leads in others. For short-to-medium rentals (<6 months), NVIDIA consistently wins perf/$ due...