Latest

Gigawatt-Scale AI Training Loads: Grid Stabilit...

June 27, 2025

The largest AI labs are racing to build multi-gigawatt datacenters, stressing a century-old electric grid in ways it was never designed to handle. Beyond sheer scale, large training clusters exhibit...

NVIDIA Tensor Core Evolution From Volta To Blac...

June 25, 2025

Overview: AI model capability growth and reductions in unit token cost have been propelled by compounding scaling laws across training, inference, and compute. Even as classic Dennard scaling and transistor-cost...

AMD MI350X And MI400 UALoE72: Architecture And ...

June 15, 2025

Executive Summary: Positioning: MI355X can be competitive with HGX B200 for small-to-medium LLM inference on a performance-per-TCO basis, but is not competitive with GB200 NVL72 for frontier inference or training....

UALink Versus Broadcom Scale-Up Ethernet for AI...

June 13, 2025

Ethernet vs. InfiniBand in the GenAI Era: Standard Ethernet ceded share to InfiniBand early in the GenAI boom. Since then, Ethernet has regained ground on cost, operational gaps in InfiniBand,...

Reward Hacking Risks And Robust Reward Design I...

June 10, 2025

Overview: Test-time scaling is thriving: recent reasoning models deliver higher scores on real-world tasks (e.g., SWE-Bench) at lower cost. Beyond chain-of-thought (CoT) prompting, models now maintain coherence over longer horizons,...

ROCm CI Coverage Gaps and Their Impact on Infer...

May 25, 2025

Executive Summary: In self-owned clusters, perf/$ depends on workload and latency: NVIDIA leads in some scenarios; AMD leads in others. For short-to-medium rentals (<6 months), NVIDIA consistently wins perf/$ due...
