See Google’s TPUv4 documentation.1 In brief, each chip has:

  • 2 TensorCores (not to be confused with NVIDIA’s terminology)
    • 8 MXUs (4 per TensorCore)
    • 2 vector units (1 per TensorCore)
    • 2 scalar units (1 per TensorCore)
    • No sparsity
  • 32 GB HBM2 (? stacks)
    • 1.2 TB/s (max)
  • x16 PCIe Gen3
  • 192 W maximum
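The peak matrix throughput and HBM bandwidth above imply a roofline ridge point, i.e. the arithmetic intensity a kernel needs to be compute-bound rather than bandwidth-bound. A quick sketch using only the numbers listed here (275 TFLOP/s BF16 matrix peak from the table below, 1.2 TB/s HBM):

```python
# Back-of-the-envelope roofline ridge point for one TPUv4 chip,
# using the figures quoted in this note.
peak_flops = 275e12  # BF16 matrix peak, FLOP/s
hbm_bw = 1.2e12      # HBM2 bandwidth, bytes/s

# FLOPs per HBM byte required to saturate the MXUs.
ridge_point = peak_flops / hbm_bw
print(f"{ridge_point:.0f} FLOPs/byte")  # ≈ 229
```

Kernels below roughly 229 FLOPs per HBM byte are bandwidth-bound on this chip.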

These TPUv4 chips are assembled into 4,096-chip SuperPods, in which an optical circuit switch can dynamically reconfigure 64-chip cubes into different torus topologies in about ten seconds.2 Each 64-chip cube has 2 TiB of HBM2, and each SuperPod has 128 TiB of HBM2.
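The cube- and pod-level HBM totals follow directly from the per-chip figure, assuming the "32 GB" above is really 32 GiB (which makes the quoted totals exact):

```python
# Sanity-check the HBM totals quoted above.
GIB = 2**30
chip_hbm = 32 * GIB  # per-chip HBM2 (assuming GiB, not GB)

cube_hbm = 64 * chip_hbm    # one reconfigurable 64-chip cube
pod_hbm = 4096 * chip_hbm   # one 4,096-chip SuperPod

print(cube_hbm / 2**40, "TiB")  # 2.0 TiB
print(pod_hbm / 2**40, "TiB")   # 128.0 TiB
```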

Performance

The following are theoretical maximum throughputs in TFLOPS (TOPS for integer types):1

Data Type | VFMA | Matrix | Sparse
FP64      |      |        |
FP32      |      |        |
TF32      |      |        |
FP16      |      |        |
BF16      |      | 275    |
FP8       |      |        |
INT32     |      |        |
INT8      |      | 275    |
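The 275 figure is consistent with the MXU count above, assuming each MXU is a 128×128 systolic array doing one multiply-accumulate per cell per cycle at a ~1.05 GHz clock (the clock rate is an assumption, not stated in this note):

```python
# Derive the peak matrix throughput from the MXU configuration.
mxus = 8                  # per chip (4 per TensorCore x 2 TensorCores)
macs_per_mxu = 128 * 128  # one MAC per systolic-array cell per cycle
flops_per_mac = 2         # multiply + add
clock_hz = 1.05e9         # assumed clock, not from this note

peak = mxus * macs_per_mxu * flops_per_mac * clock_hz
print(peak / 1e12, "TFLOP/s")  # ≈ 275
```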

Footnotes

  1. TPU v4 (cloud.google.com)

  2. Gemini: A Family of Highly Capable Multimodal Models