GLM-5

GLM-5 is a mixture-of-experts (MoE) model developed by Zhipu AI and Tsinghua University. It has 744B total parameters and 256 experts. Each token is processed by 1 shared expert and 8 routed experts, so about 40B parameters are active per token. They trained on 28.5T tokens.
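Only about 40B / 744B ≈ 5.4% of the weights are used for any given token. The toy sketch below shows what "1 shared + 8 routed experts per token" means for a single MoE layer. It makes several assumptions. The 256 experts are treated as the routed pool. Gate scores are sigmoids, and the weights are renormalised over the top 8. Tiny random linear maps stand in for the real expert FFNs. The helpers `route`, `moe_layer` and `make_expert` are hypothetical illustrations, not GLM-5 code.

```python
import math
import random

NUM_ROUTED = 256  # routed experts per MoE layer (assumed meaning of "# Experts (total)")
TOP_K = 8         # routed experts selected per token
HIDDEN = 16       # toy width; GLM-5's real hidden dim is 6144

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def route(x, gate_w, k=TOP_K):
    # Assumption: sigmoid gate scores, top-k by score, weights renormalised
    # over the chosen experts. The note does not give GLM-5's exact gate.
    scores = [1.0 / (1.0 + math.exp(-dot(row, x))) for row in gate_w]
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return top, [scores[i] / total for i in top]

def moe_layer(x, gate_w, routed, shared):
    # Every token goes through the shared expert; only k routed experts run.
    idx, weights = route(x, gate_w)
    out = shared(x)
    for i, w in zip(idx, weights):
        out = [o + w * r for o, r in zip(out, routed[i](x))]
    return out, idx

def make_expert(rng):
    # Toy expert: a random linear map standing in for a real expert FFN.
    w = [[rng.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
    return lambda x: [dot(row, x) for row in w]

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(HIDDEN)]
gate_w = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_ROUTED)]
routed = [make_expert(rng) for _ in range(NUM_ROUTED)]
shared = make_expert(rng)

y, chosen = moe_layer(x, gate_w, routed, shared)
print(len(y), len(chosen))  # 16 8
```

Only 9 of the 257 expert maps are evaluated for this token. In a real model, this is why the active-parameter count sits far below the total.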

Architecture

From the GLM-5 technical report:¹

Model	GLM-4.5	GLM-5
# Total Parameters	355B	744B
# Activated Parameters	32B	40B
# Dense Layers	3	3
# MoE Layers	89	75
# MTP Layers	1	1
Hidden Dim	5120	6144
Dense Intermediate Dim	12288	12288
MoE Intermediate Dim	1536	2048
QK Head Dim	128	192
V Head Dim	128	256
Q LoRA Dim	–	2048
KV LoRA Dim	–	512
# Attention Heads	96	64
# Key-Value Heads	8	–
# Indexer Attn Heads	–	32
# Indexer Head Dim	–	128
# Experts (total)	160	256
# Routed Experts	8	8
# Shared Experts	1	1
Vocabulary Size	151552	154880
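The table is enough for a back-of-envelope check on where the parameters live. The sketch below rests on three assumptions, none of which the note states. Each expert is a SwiGLU FFN with three hidden × intermediate matrices. "# Experts (total)" counts routed experts only. The shared expert is the same size as a routed one. It also compares the query projection with and without the "Q LoRA" bottleneck, ignoring norms, biases and any RoPE split. The names `CONFIGS`, `expert_params` and `moe_ffn_totals` are hypothetical.

```python
# Back-of-envelope parameter counts from the table above (assumptions in the
# lead-in: SwiGLU experts, routed-only expert count, shared expert = routed size).
CONFIGS = {
    "GLM-4.5": dict(hidden=5120, moe_inter=1536, moe_layers=89, routed=160, shared=1, top_k=8),
    "GLM-5": dict(hidden=6144, moe_inter=2048, moe_layers=75, routed=256, shared=1, top_k=8),
}

def expert_params(hidden, moe_inter):
    # gate, up and down projections of one SwiGLU expert
    return 3 * hidden * moe_inter

def moe_ffn_totals(cfg):
    per_expert = expert_params(cfg["hidden"], cfg["moe_inter"])
    experts = cfg["routed"] + cfg["shared"]
    active = cfg["top_k"] + cfg["shared"]
    total = per_expert * experts * cfg["moe_layers"]
    active_total = per_expert * active * cfg["moe_layers"]
    return per_expert, total, active_total

for name, cfg in CONFIGS.items():
    per_expert, total, active = moe_ffn_totals(cfg)
    print(f"{name}: {per_expert / 1e6:.1f}M per expert, "
          f"{total / 1e9:.1f}B expert params, {active / 1e9:.1f}B active")

# GLM-5 query projection per layer: direct vs. low-rank (Q LoRA dim 2048).
hidden, heads, qk_dim, q_rank = 6144, 64, 192, 2048
full_q = hidden * heads * qk_dim
low_rank_q = hidden * q_rank + q_rank * heads * qk_dim
print(f"Q projection: {full_q / 1e6:.1f}M full vs {low_rank_q / 1e6:.1f}M low-rank")
```

Under these assumptions, the expert FFNs account for about 338.1B of GLM-4.5's 355B and about 727.6B of GLM-5's 744B. The gap would be attention, the dense layers, embeddings and the MTP layer. The expert-only active counts (about 18.9B and 25.5B) land below the reported 32B and 40B for the same reason: attention, dense layers and embeddings run for every token. The low-rank query path roughly halves that projection (about 37.7M vs 75.5M parameters per layer).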

Data pipeline

They trained on 28.5 trillion tokens.
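For a sense of scale, the sketch below uses the 28.5T-token figure from the opening paragraph and the common C ≈ 6·N·D rule of thumb. Here N counts only the 40B parameters active per token. This is an approximation that ignores attention FLOPs, routing overhead and the MTP layer. It is not a compute figure reported for GLM-5, and `train_flops` is a hypothetical helper.

```python
# Rough training-compute estimate with the C ≈ 6·N·D rule of thumb.
# Assumption: N = activated parameters per token (MoE simplification).
ACTIVE_PARAMS = 40e9    # activated parameters per token (GLM-5)
TOTAL_PARAMS = 744e9    # total parameters (GLM-5)
TOKENS = 28.5e12        # pre-training tokens

def train_flops(active_params, tokens):
    # forward + backward ≈ 6 FLOPs per active parameter per token
    return 6 * active_params * tokens

print(f"~{train_flops(ACTIVE_PARAMS, TOKENS):.2e} FLOPs")
print(f"{TOKENS / TOTAL_PARAMS:.0f} tokens per total parameter")
```

This gives roughly 6.8 × 10²⁴ FLOPs and about 38 training tokens per total parameter.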

Footnotes

¹ GLM-5: from Vibe Coding to Agentic Engineering

Glenn's Digital Garden