Google splits its next TPU in two, and the AI chip war just became a design philosophy fight

Google launched Ironwood, its seventh-generation Tensor Processing Unit (TPU), at Cloud Next 2026, positioning it as the first TPU designed for the "age of inference." Alongside it, Google previewed two eighth-generation parts: TPU 8t (Sunfish), a training chip designed with Broadcom targeting TSMC's 2nm process node in late 2027, and TPU 8i (Zebrafish), an inference chip designed with MediaTek, also on 2nm and scheduled for the same window.

Ironwood offers 4.6 petaFLOPS per chip and, when linked into a superpod of 9,216 chips, reaches 42.5 exaFLOPS in aggregate—a significant leap from its predecessor, Trillium. Google emphasizes that the eighth generation will be the first TPU family to split training and inference into separate purpose-built chips, and Anthropic, which has secured 3.5 gigawatts of compute starting in 2027, is the anchor customer for both generations.
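The superpod number follows almost directly from the per-chip figure. A quick back-of-envelope check (assuming, as the quoted specs imply, that Google aggregates peak per-chip throughput linearly across the pod):

```python
# Sanity check: does 4.6 PFLOPS/chip x 9,216 chips match the quoted 42.5 EFLOPS?
PER_CHIP_PFLOPS = 4.6   # Ironwood peak per-chip figure from the article
POD_CHIPS = 9_216       # chips in one superpod

# 1 exaFLOPS = 1,000 petaFLOPS
pod_exaflops = PER_CHIP_PFLOPS * POD_CHIPS / 1_000
print(f"{pod_exaflops:.1f} exaFLOPS")  # -> 42.4 exaFLOPS
```

The product comes to about 42.4 exaFLOPS, so the quoted 42.5 figure is consistent with simple linear scaling of the per-chip peak (the small gap is rounding in the per-chip number).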

Key Takeaways:

  • Direct Competitor: Ironwood competes directly with Nvidia’s Blackwell B200 on raw specifications, though Nvidia has an edge in single-device interconnect bandwidth and supports FP4 precision.

  • Cluster Scale Advantage: Google's superpod architecture, energy efficiency, and the cost advantages of custom silicon designed for inference give Google a competitive edge at scale.

  • Strategic Shift: The focus on inference reflects Google's recognition that the operational cost of serving AI models far exceeds the initial training expense. Efficient inference hardware is crucial to maintaining long-term profitability.

Why Inference Matters Now:

Training large language models (LLMs) is a significant upfront investment, while inference—running these models in response to user requests—is an ongoing cost. As demand for AI-powered applications grows, so does the need for efficient and cost-effective inference solutions. Ironwood is Google's answer.