LayerNorm is an operation that shifts and rescales a vector so that its mean is zero and its variance is 1.0, then rescales and shifts that normalized result using two learned parameter vectors.

This brings the magnitude of one vector (the activations) in line with the magnitude of another vector before the two are mixed.

When applied to a vector of length $d$, LayerNorm contributes $2d$ learned parameters to a model everywhere it is used: a length-$d$ scale $\gamma$ and a length-$d$ bias $\beta$.
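As a quick sanity check on that count, here is a sketch using PyTorch's nn.LayerNorm; the hidden size 768 is just an illustrative choice (roughly GPT-2's), not anything fixed by LayerNorm itself:

```python
import torch.nn as nn

d = 768  # illustrative hidden size; any d works
ln = nn.LayerNorm(d)

# gamma (ln.weight) and beta (ln.bias) each hold d parameters
n_params = sum(p.numel() for p in ln.parameters())
print(n_params)  # 1536 == 2 * d
```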

Mathematically

Given an input vector $x \in \mathbb{R}^d$, LayerNorm is:

$$\mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where

$\mu = \frac{1}{d}\sum_{i=1}^{d} x_i$ is the mean of the input vector

$\sigma = \sqrt{\frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2}$ is the standard deviation of the input vector

$\gamma, \beta \in \mathbb{R}^d$ are the learned per-dimension scale and bias

$\epsilon$ is a small numerical stability constant
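A minimal NumPy sketch of this formula, assuming a single 1-D input (framework implementations normalize over the last axis of a batched tensor, but the arithmetic is the same):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize x to zero mean and unit variance, then apply
    the learned per-dimension scale (gamma) and bias (beta)."""
    mu = x.mean()
    var = x.var()  # population variance, matching the formula above
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

d = 8
x = 5.0 * np.random.randn(d) + 3.0       # arbitrary scale and offset
y = layer_norm(x, np.ones(d), np.zeros(d))
print(y.mean(), y.std())                 # ~0 and ~1
```

With $\gamma = 1$ and $\beta = 0$ the output is just the standardized input; training moves $\gamma$ and $\beta$ away from these defaults.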

Intuitively: the normalization step erases whatever scale and offset the input happened to have, and the learned $\gamma$ and $\beta$ let the model pick the scale and offset that the next layer actually wants.