Google TPU 8t Turns Months of AI Model Training Into Weeks

When we talk about AI chips, Nvidia is usually the first name that comes to mind. However, Google has quietly built one of the most capable AI silicon lineups in the industry through its homegrown Tensor Processing Unit program. At Cloud Next 2026, the company introduced its eighth-generation TPUs, and this time it took a different approach entirely.

Rather than releasing a single flagship chip, Google split the generation into two purpose-built designs: the TPU 8t for training and the TPU 8i for inference. The company developed both chips in partnership with Google DeepMind.

“Our eighth-generation TPUs are the culmination of more than a decade of development,” said Amin Vahdat, Google’s SVP and chief technologist for AI and infrastructure.

The training chip: TPU 8t

Google designed the TPU 8t as a workhorse for large-scale model training, with the stated goal of cutting frontier model development cycles from months to weeks.

A single TPU 8t superpod scales to 9,600 chips and offers two petabytes of shared high-bandwidth memory (HBM), with double the inter-chip bandwidth of the previous generation, Ironwood. The superpod delivers 121 exaflops of FP4 compute, nearly triple Ironwood's per-pod performance.
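To put the superpod totals in per-chip terms, here is a back-of-the-envelope sketch in Python. It assumes decimal SI prefixes and an even split across all 9,600 chips, neither of which the announcement states, so the results are implied approximations rather than published per-chip specs. The helper name `per_chip` is ours.

```python
# Per-chip figures implied by the quoted TPU 8t superpod totals.
# Assumptions (not from Google): decimal SI prefixes, even split across chips.
CHIPS_PER_SUPERPOD = 9_600
POD_HBM_BYTES = 2e15      # "two petabytes" of shared HBM
POD_FP4_FLOPS = 121e18    # "121 exaflops" of FP4 compute


def per_chip(total: float, chips: int) -> float:
    """Divide a pod-level total evenly across the chips in the pod."""
    return total / chips


hbm_gb_per_chip = per_chip(POD_HBM_BYTES, CHIPS_PER_SUPERPOD) / 1e9
fp4_pflops_per_chip = per_chip(POD_FP4_FLOPS, CHIPS_PER_SUPERPOD) / 1e15

print(f"HBM per chip: ~{hbm_gb_per_chip:.0f} GB")
print(f"FP4 per chip: ~{fp4_pflops_per_chip:.1f} petaflops")
```

Under those assumptions, each chip works out to roughly 208GB of HBM and about 12.6 petaflops of FP4 compute.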

To support that scale, Google built a new networking architecture called Virgo Network. It delivers a 4x increase in data center bandwidth using high-radix switches that reduce the number of network layers. Combined with JAX and Google’s Pathways software, Virgo enables near-linear scaling to more than 1 million TPU chips in a single logical training cluster. A single Virgo fabric can link over 134,000 TPU 8t chips with up to 47 petabits per second of non-blocking bisection bandwidth, delivering more than 1.6 million exaflops overall.
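The Virgo figures are easier to compare once they are reduced to a few ratios. The Python sketch below is a rough illustration only: the chip counts are quoted as lower bounds ("over", "more than") and the bandwidth as an upper bound ("up to"), and dividing bisection bandwidth by the total chip count is our own simplification, not a per-link specification.

```python
# Rough ratios from the quoted Virgo Network figures.
# Inputs are bounds or rounded values, so treat outputs as approximate.
CHIPS_PER_SUPERPOD = 9_600       # TPU 8t superpod size
CHIPS_PER_FABRIC = 134_000       # "over 134,000" chips per Virgo fabric
CHIPS_PER_CLUSTER = 1_000_000    # "more than 1 million" chips per logical cluster
BISECTION_BITS_PER_S = 47e15     # "up to 47 petabits per second"

# How many full superpods fit in one fabric, and fabrics in one cluster.
superpods_per_fabric = CHIPS_PER_FABRIC / CHIPS_PER_SUPERPOD
fabrics_per_cluster = CHIPS_PER_CLUSTER / CHIPS_PER_FABRIC

# Simplification: spread the bisection bandwidth evenly over every chip.
gbps_per_chip = BISECTION_BITS_PER_S / CHIPS_PER_FABRIC / 1e9

print(f"Superpods per fabric: ~{superpods_per_fabric:.1f}")
print(f"Fabrics per 1M-chip cluster: ~{fabrics_per_cluster:.1f}")
print(f"Bisection bandwidth per chip: ~{gbps_per_chip:.0f} Gb/s")
```

On these numbers, one fabric spans about 14 superpods, a million-chip cluster needs roughly 7.5 fabrics' worth of chips, and the bisection bandwidth averages out to about 351 gigabits per second per chip.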

The inference chip: TPU 8i

The TPU 8i targets the other side of the AI pipeline, serving models fast and at scale for concurrent users and agents.

A single TPU 8i pod scales to 1,152 chips with 331.8TB of total HBM capacity and 11.6 exaflops of FP8 compute performance. According to Alphabet CEO Sundar Pichai, the chip can “deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively.”
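The same per-chip arithmetic can be applied to the TPU 8i pod. This Python sketch again assumes decimal SI prefixes and an even split across the 1,152 chips; the variable names are ours, and the per-chip FP8 figure is an implied value rather than one Google quotes.

```python
# Per-chip figures implied by the quoted TPU 8i pod totals.
# Assumptions (not from Google): decimal SI prefixes, even split across chips.
CHIPS_PER_POD = 1_152
POD_HBM_BYTES = 331.8e12   # "331.8TB" of total HBM
POD_FP8_FLOPS = 11.6e18    # "11.6 exaflops" of FP8 compute

hbm_gb_per_chip = POD_HBM_BYTES / CHIPS_PER_POD / 1e9
fp8_pflops_per_chip = POD_FP8_FLOPS / CHIPS_PER_POD / 1e15

print(f"HBM per chip: ~{hbm_gb_per_chip:.0f} GB")
print(f"FP8 per chip: ~{fp8_pflops_per_chip:.1f} petaflops")
```

The HBM result comes to about 288GB per chip, which is consistent with the 288GB per-chip figure Google cites for the TPU 8i, and the FP8 compute works out to roughly 10.1 petaflops per chip.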

Google said it redesigned the full stack around the TPU 8i to eliminate what it calls the “waiting room” effect, where user requests queue up while hardware sits underutilized. The company addressed this through four specific changes, including the two below.

First, the chip pairs 288GB of HBM with 384MB of on-chip SRAM, three times as much SRAM as Ironwood, which keeps a model’s active working set entirely on-chip and prevents processors from sitting idle. Second, Google doubled the physical CPU hosts per server by moving to its custom Axion Arm-based processors.

Google TPU 8t and 8i: 2x Better Performance Per Watt Than Ironwood

Both the TPU 8t and 8i run on Google’s Axion Arm-based CPU host and support liquid cooling. Google said it optimized power management across the entire stack, with integrated systems that dynamically adjust power draw based on real-time demand.

The company claims both chips deliver up to 2x better performance-per-watt compared to Ironwood.

“By owning the full stack, from Axion host to accelerator, we can optimize system-level energy efficiency in ways that simply cannot be achieved when the host and chip are designed independently,” Vahdat said.

Availability

Both chips will reach general availability later in 2026. Google will offer them through its AI Hypercomputer, the cloud-based supercomputer architecture it launched in 2023 that combines performance-optimized hardware, open software, ML frameworks, and flexible consumption models.
