GLM-5 is a Mixture-of-Experts (MoE) model developed by Zhipu AI and Tsinghua University. It has 744B total parameters and 256 experts, with 1 shared and 8 routed experts active per token (40B active parameters per token). They trained on 28.5T tokens.
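The routing scheme described above (8 of 256 routed experts chosen per token, plus 1 shared expert that always fires) can be sketched as follows. This is a generic top-k softmax router, not GLM-5's actual gating function, and all shapes and weights here are illustrative:

```python
import numpy as np

def route(token_hidden, gate_weights, top_k=8):
    """Pick the top_k routed experts for one token.

    token_hidden: (hidden_dim,) activation for one token.
    gate_weights: (num_experts, hidden_dim) router matrix (illustrative).
    Returns the chosen expert indices and their normalized gate scores.
    """
    logits = gate_weights @ token_hidden          # (num_experts,) router scores
    top = np.argsort(logits)[-top_k:]             # indices of the 8 best experts
    scores = np.exp(logits[top] - logits[top].max())
    scores /= scores.sum()                        # softmax over the chosen 8
    return top, scores

rng = np.random.default_rng(0)
hidden_dim, num_experts = 6144, 256
idx, w = route(rng.normal(size=hidden_dim),
               rng.normal(size=(num_experts, hidden_dim)))
# The shared expert runs unconditionally, so 1 + 8 = 9 experts fire per token.
```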

Architecture

From the GLM-5 technical report:1

| Model | GLM-4.5 | GLM-5 |
|---|---|---|
| # Total Parameters | 355B | 744B |
| # Activated Parameters | 32B | 40B |
| # Dense Layers | 3 | 3 |
| # MoE Layers | 89 | 75 |
| # MTP Layers | 1 | 1 |
| Hidden Dim | 5120 | 6144 |
| Dense Intermediate Dim | 12288 | 12288 |
| MoE Intermediate Dim | 1536 | 2048 |
| QK Head Dim | 128 | 192 |
| V Head Dim | 128 | 256 |
| Q LoRA Dim | – | 2048 |
| KV LoRA Dim | – | 512 |
| # Attention Heads | 96 | 64 |
| # Key-Value Heads | 8 | – |
| # Indexer Attn Heads | – | 32 |
| # Indexer Head Dim | – | 128 |
| # Experts (total) | 160 | 256 |
| # Routed Experts | 8 | 8 |
| # Shared Experts | 1 | 1 |
| Vocabulary Size | 151552 | 154880 |
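The table's numbers can be sanity-checked against the quoted 40B active parameters. The sketch below is a rough back-of-envelope estimate, assuming SwiGLU-style FFNs (three weight matrices per expert); attention, MTP, and embedding parameters are left out, so it is only a lower bound showing that the active expert FFNs account for the bulk of the 40B:

```python
# Back-of-envelope: active FFN parameters per token in GLM-5, from the
# table above. Assumes SwiGLU experts (gate, up, and down projections).
hidden_dim = 6144
moe_intermediate = 2048
moe_layers = 75
experts_active = 1 + 8            # shared + routed experts per token

per_expert = 3 * hidden_dim * moe_intermediate   # three projection matrices
active_ffn = per_expert * experts_active * moe_layers
print(f"{active_ffn / 1e9:.1f}B active expert-FFN parameters")  # ~25.5B
```

The remaining ~14B of the 40B active budget would come from attention, the dense layers, embeddings, and the MTP layer, which this sketch ignores.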

Data pipeline

They trained on 38.5 trillion tokens.

Footnotes

  1. GLM-5: from Vibe Coding to Agentic Engineering