Maia (Microsoft Artificial Intelligence Accelerator, stylized as “Maia” rather than “MAIA”) is Microsoft’s first-generation AI accelerator. Key features:[1]
- One Maia 100 = 16 clusters, each with 4 tiles (64 tiles total)
- Tensor unit implemented as 16xRx16 (not sure what this means; is it the same notation used to describe tensor-core capabilities?)
- L1 and L2 scratchpads
- Supports low-precision MX (microscaling) formats (4-bit, 6-bit, 9-bit) in addition to FP32 and BF16[2]; see the microscaling sketch after this list
- 64 GB HBM2e
- 4x HBM2e stacks
- 1.8 TB/s HBM bandwidth
- 12x 400 GbE ports per chip
- 3x400G links to each of the three other Maia chips in a node (9x400G total)
- 3x400G to T0 switch layer (3x400G total)
- Each Maia 100 connects to three different T0 switches (multi-plane with three planes)
- Ethernet used for both intra- and inter-node interconnect
- 4800 Gbps AllGather and Scatter-Reduce bandwidth, which matches all 12x400 GbE ports (see the port-budget sketch after this list)
- 1200 Gbps AllToAll bandwidth (what does this mean exactly? The figure matches the 3x400G of scale-out uplink bandwidth per chip)
- Uses a “gather-based approach” for distributed GEMM instead of an AllReduce-based approach. Not sure exactly what this means, but it was explained at Hot Chips[2]; see the distributed GEMM sketch after this list for my guess at the distinction
- 105 billion transistors on an 820 mm^2 reticle-limited die; TSMC N5[3]
- 500 W, capable of up to 700 W
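
Microscaling sketch: the MX formats get their low bit counts by sharing one scale across a small block of elements and storing only a few bits per element. The sketch below is a minimal illustration of that general idea, assuming a single power-of-two shared scale per block and symmetric k-bit integers per element; the actual MX4/MX6/MX9 encodings (block size, element format, sub-block scaling) differ, so treat this as an illustration rather than Maia’s data path.

```python
import numpy as np

def mx_quantize(block, elem_bits=4):
    """Quantize a 1-D block to one shared power-of-two scale plus
    low-bit signed integers per element (illustrative microscaling)."""
    qmax = 2 ** (elem_bits - 1) - 1            # e.g. 7 for "4-bit" elements
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return 0, np.zeros_like(block, dtype=np.int8)
    # Smallest power-of-two scale such that block / scale fits within +/- qmax
    shared_exp = int(np.ceil(np.log2(max_abs / qmax)))
    scale = 2.0 ** shared_exp
    q = np.clip(np.round(block / scale), -qmax, qmax).astype(np.int8)
    return shared_exp, q

def mx_dequantize(shared_exp, q):
    return q.astype(np.float32) * (2.0 ** shared_exp)

# Example: a block of 32 activations quantized to 4-bit elements + shared exponent
rng = np.random.default_rng(0)
block = rng.normal(scale=0.1, size=32).astype(np.float32)
exp, q = mx_quantize(block, elem_bits=4)
print("max abs error:", np.max(np.abs(block - mx_dequantize(exp, q))))
```

The shared scale is amortized across the whole block, which is how these formats keep the average per-element cost close to the nominal bit width.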
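Port-budget sketch: a quick back-of-envelope check of how the quoted collective-bandwidth figures line up with the per-chip port budget. The decomposition below is my own reading of the numbers above, not something stated in the Maia material.

```python
# Back-of-envelope check of the Maia 100 per-chip network figures (my reading).
PORT_GBPS = 400          # each port is 400 GbE
PORTS_PER_CHIP = 12      # 12 x 400 GbE per chip

intra_node_peers = 3     # three other Maia chips in the node
links_per_peer = 3       # 3 x 400G to each peer
scaleout_planes = 3      # 3 x 400G up to three T0 switches (one per plane)

intra_node_gbps = intra_node_peers * links_per_peer * PORT_GBPS   # 3600
scaleout_gbps = scaleout_planes * PORT_GBPS                       # 1200
total_gbps = PORTS_PER_CHIP * PORT_GBPS                           # 4800

assert intra_node_gbps + scaleout_gbps == total_gbps
print(f"intra-node: {intra_node_gbps} Gbps, "
      f"scale-out: {scaleout_gbps} Gbps, total: {total_gbps} Gbps")
# 4800 Gbps matches the quoted AllGather/Scatter-Reduce figure (all 12 ports),
# and 1200 Gbps matches the quoted AllToAll figure (the scale-out uplinks).
```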
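Distributed GEMM sketch: my working guess at the “gather-based” vs AllReduce-based distinction, based on how tensor-parallel GEMMs are commonly distributed, not on the Maia material itself. An AllReduce-based scheme has each device multiply its input shards into a full-size partial result and then sum the partials across devices, while a gather-based scheme first gathers the sharded operand so each device can compute its own slice of the output locally, with no reduction step. The numpy sketch below simulates both on four “devices” and checks that they agree with a plain GEMM.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dev = 4
M, K, N = 8, 16, 12

A = rng.normal(size=(M, K))
B = rng.normal(size=(K, N))
reference = A @ B

# AllReduce-based: shard A by columns and B by rows. Each device computes a
# full-size partial product; the partials are summed (the AllReduce step).
A_col_shards = np.split(A, n_dev, axis=1)          # each (M, K/n_dev)
B_row_shards = np.split(B, n_dev, axis=0)          # each (K/n_dev, N)
partials = [A_col_shards[d] @ B_row_shards[d] for d in range(n_dev)]
allreduce_result = np.sum(partials, axis=0)        # simulated AllReduce (sum)

# Gather-based: shard B by columns. Each device gathers A (here simply
# replicated after the gather) and computes only its own output column shard;
# no reduction is needed, the output shards are just concatenated.
B_col_shards = np.split(B, n_dev, axis=1)          # each (K, N/n_dev)
gathered_A = A                                     # simulated AllGather of A shards
out_shards = [gathered_A @ B_col_shards[d] for d in range(n_dev)]
gather_result = np.concatenate(out_shards, axis=1)

assert np.allclose(allreduce_result, reference)
assert np.allclose(gather_result, reference)
print("both schemes match the reference GEMM")
```

The practical difference is what crosses the network: AllReduce moves full-size partial sums (which usually need higher precision for accumulation), while the gather style moves input shards, which can stay in a compact format on the wire. Whether that is the actual motivation here, I don’t know.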