Google Cloud on Wednesday said its eighth generation of custom-built AI chips, known as Tensor Processing Units, will come in two variants: the TPU 8t, designed for model training, and the TPU 8i, focused on inference.
Inference refers to the ongoing use of AI models after they have been trained—essentially, the process that occurs when users submit prompts and receive responses.
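To make the split concrete, here is a minimal sketch in JAX, Google's own numerical library for TPUs, contrasting the two workloads; the toy linear model, learning rate, and shapes are illustrative assumptions, not details Google has disclosed about the new chips.

```python
import jax
import jax.numpy as jnp

# A toy linear model: a weight matrix and a bias vector.
def predict(params, x):
    w, b = params
    return x @ w + b

def loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

key = jax.random.PRNGKey(0)
params = (jax.random.normal(key, (4, 1)), jnp.zeros((1,)))
x = jax.random.normal(key, (8, 4))   # a batch of 8 inputs
y = jnp.ones((8, 1))                 # their target outputs

# Training (the 8t's job): compute gradients and update parameters,
# over and over, across huge batches of data.
grads = jax.grad(loss)(params, x, y)
params = jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

# Inference (the 8i's job): one forward pass per user request,
# no gradient computation, typically latency- and cost-sensitive.
prediction = predict(params, x)
```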
The company also highlighted significant performance improvements over previous TPU generations, including up to 3x faster AI model training, 80 per cent better performance per dollar, and the ability to scale beyond one million TPUs working together in a single cluster.
The overall effect, it said, is far greater computing power with lower energy consumption and reduced cost for customers compared with earlier versions.
The chips are called Tensor Processing Units because they are custom, power-efficient processors that Google originally built for tensor-based machine learning workloads, as opposed to more general-purpose GPUs.
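As a rough illustration of what a tensor workload looks like, the sketch below runs a single dense neural-network layer, essentially one large matrix multiplication, through JAX; the shapes are arbitrary examples, and on a host with a TPU attached the compilation targets it automatically via XLA.

```python
import jax
import jax.numpy as jnp

# Dense matrix multiplication is the archetypal tensor workload:
# most neural-network layers reduce to operations like this one.
@jax.jit  # compiled via XLA for whatever backend is attached (TPU, GPU, or CPU)
def layer(x, w):
    return jax.nn.relu(x @ w)

x = jnp.ones((1024, 512))  # a batch of 1024 activation vectors
w = jnp.ones((512, 256))   # a layer's weight matrix

print(jax.devices())       # lists TpuDevice entries on a TPU host
print(layer(x, w).shape)   # (1024, 256)
```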
Google is using its own chips to complement the Nvidia-based systems available on its infrastructure, rather than replacing them outright. In fact, the company says it will also offer Nvidia’s latest Vera Rubin chips on its cloud platform later this year.
In the longer term, hyperscalers developing in-house AI chips—including Amazon, Microsoft, and Google—could gradually reduce their reliance on Nvidia as more enterprises shift AI workloads to cloud platforms and adapt their applications to run on these custom processors.

