Mamba is an alternative to attention in which, instead of re-computing all pairwise interactions between tokens the way attention does, the model keeps a fixed-size hidden state that is updated each time a new token arrives. This hidden state is designed to carry long-range dependencies (like attention), but during training and prefill its compute cost scales linearly with sequence length rather than quadratically (as attention's does).
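
The update rule can be sketched with a toy scalar recurrence (this is an illustration of the linear-scan structure, not the actual Mamba kernel; the coefficients a, b, c are placeholders):

```python
import numpy as np

def recurrent_scan(x, a=0.9, b=0.1, c=1.0):
    """Toy state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One fixed-cost update per token, so total cost is O(L) in sequence
    length, vs. attention's O(L^2) pairwise scores.
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t  # fixed-size state carries the history
        ys.append(c * h)
    return np.array(ys)

y = recurrent_scan(np.ones(8))  # 8 tokens -> 8 fixed-cost updates
```

Real Mamba layers use matrix-valued states and input-dependent coefficients, but the constant-size-state, one-update-per-token structure is the same.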

Mamba-2 is an updated form of Mamba.

Principles

A single Mamba layer has two components that are defined by a few variables:1

  1. SSM recurrent state
    • head count (mamba_num_heads)
    • head dimension (mamba_head_dim)
    • SSM state size (ssm_state_size)
    • Shape: [mamba_num_heads x mamba_head_dim x ssm_state_size]
    • Elements: mamba_num_heads x mamba_head_dim x ssm_state_size
  2. Convolution state
    • head count (mamba_num_heads)
    • head dimension (mamba_head_dim)
    • number of groups (n_groups)
    • SSM state size (ssm_state_size)
    • kernel size (conv_kernel)
    • Shape: [(mamba_num_heads x mamba_head_dim + 2 x n_groups x ssm_state_size) x conv_kernel]
      • The first term (mamba_num_heads x mamba_head_dim) is the main input channels
      • The second term (2 x n_groups x ssm_state_size) accounts for the B and C projections, each of which contributes n_groups x ssm_state_size channels that also pass through the convolution.
    • Elements: (mamba_num_heads x mamba_head_dim + 2 x n_groups x ssm_state_size) x conv_kernel; e.g., (128 x 64 + 2 x 8 x 128) x 4 = (8192 + 2048) x 4 = 40,960
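
The element counts above can be sketched directly from the hyperparameters (the concrete values are the ones used in this section's worked example):

```python
# Per-layer state sizes from the Mamba-2 hyperparameters.
mamba_num_heads = 128
mamba_head_dim = 64
ssm_state_size = 128
n_groups = 8
conv_kernel = 4

# SSM recurrent state: one (head_dim x state_size) matrix per head.
ssm_state_elems = mamba_num_heads * mamba_head_dim * ssm_state_size

# Convolution state: main input channels plus B and C projection channels,
# each channel holding conv_kernel past values.
conv_channels = mamba_num_heads * mamba_head_dim + 2 * n_groups * ssm_state_size
conv_state_elems = conv_channels * conv_kernel

print(ssm_state_elems)   # 1048576
print(conv_state_elems)  # 40960
```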

This state never grows during decode (unlike attention's KV cache), but it does need to be carried forward from step to step (like a KV cache).

The SSM state's memory footprint is just the product of these three dimensions (the convolution state adds a comparatively small amount on top). For example, Nemotron 3 Super had

  • mamba_head_dim = 64,
  • mamba_num_heads = 128,
  • ssm_state_size = 128

So each Mamba-2 layer's SSM state had 128 x 64 x 128 = 1,048,576 elements. At bf16 (2 bytes per element), that's 2 MiB per layer.
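
The bf16 arithmetic, spelled out:

```python
# SSM state footprint for the Nemotron 3 Super hyperparameters above.
elements = 128 * 64 * 128   # mamba_num_heads * mamba_head_dim * ssm_state_size
bytes_bf16 = elements * 2   # bf16 = 2 bytes per element

print(bytes_bf16 / 2**20)   # 2.0 (MiB per Mamba-2 layer)
```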

Infrastructure requirements

During training and prefill, compute requirements scale linearly with sequence length. This makes training on, and prefilling, very long contexts much less computationally expensive than with attention.

During decode, there is no KV cache, so the memory footprint is independent of sequence length. Each generated token simply updates the fixed-size Mamba state in place.
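
A minimal sketch of a decode step, assuming a toy scalar-per-head decay in place of Mamba-2's input-dependent A, and per-head B/C for simplicity (sizes are illustrative, not Nemotron's). The point is that the state tensor keeps the same shape no matter how many tokens are generated:

```python
import numpy as np

rng = np.random.default_rng(0)
heads, head_dim, state = 4, 8, 16  # toy sizes

# Fixed-size SSM state, carried across decode steps.
ssm_state = np.zeros((heads, head_dim, state))

def decode_step(ssm_state, x, a, B, C):
    """In-place update: h <- a*h + x (outer) B per head; output y = h @ C."""
    ssm_state *= a                                  # decay the old state
    ssm_state += np.einsum('hd,hs->hds', x, B)      # write the new token in
    return np.einsum('hds,hs->hd', ssm_state, C)    # read out the output

for _ in range(5):  # 5 decode steps, constant memory throughout
    x = rng.standard_normal((heads, head_dim))
    B = rng.standard_normal((heads, state))
    C = rng.standard_normal((heads, state))
    y = decode_step(ssm_state, x, a=0.95, B=B, C=C)
```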

Footnotes

  1. State Space Duality (Mamba-2) Part I - The Model | Tri Dao