03

Forward Pass

Watch one token flow through the entire model. Each step transforms the 16-number vector until it becomes 27 probability scores — one for each possible next character.

Selected Input

Token: 'g'
Position: 0

Data flowing through the model

Each stage below shows the data at that point in the model. The same 16 numbers are transformed at every step, from raw embedding to the vector that produces the final prediction.

Intermediate Vectors (16 dims)

Token embedding: wte['g']
[-0.224, 0.526, 0.791, 0.330, -0.435, -0.800, -0.429, 0.336, 0.792, 0.520, -0.230, -0.769, -0.601, 0.120, 0.730, 0.669]

Position embedding: wpe[0]
[-0.405, 0.362, 0.796, 0.498, -0.257, -0.776, -0.582, 0.148, 0.741, 0.653, -0.035, -0.692, -0.712, -0.078, 0.628, 0.756]

Combined (tok + pos)
[-0.629, 0.887, 1.587, 0.828, -0.693, -1.576, -1.011, 0.484, 1.534, 1.174, -0.266, -1.461, -1.313, 0.042, 1.358, 1.426]
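
The combined vector is just the elementwise sum of the two lookups. A minimal sketch of this step, assuming NumPy, hypothetical table names wte and wpe, and a '.', 'a'..'z' character-to-id mapping; the weights are random stand-ins rather than the trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n_positions, n_embd = 27, 8, 16        # 27 characters, 16 dims; 8 positions is a guess

wte = rng.standard_normal((vocab_size, n_embd))    # token embedding table (random stand-in)
wpe = rng.standard_normal((n_positions, n_embd))   # position embedding table (random stand-in)

token_id, pos = 7, 0                               # 'g' -> 7 under the assumed vocabulary, position 0
x = wte[token_id] + wpe[pos]                       # "Combined (tok + pos)": still 16 numbers
print(x.shape)                                     # (16,)
```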

After RMSNorm
[-0.672, 0.949, 1.698, 0.886, -0.741, -1.686, -1.081, 0.518, 1.640, 1.255, -0.284, -1.562, -1.404, 0.045, 1.453, 1.525]

After Attention + Residual
[-1.385, 1.901, 3.482, 1.995, -1.136, -3.023, -1.968, 0.979, 3.011, 2.165, -0.850, -3.286, -2.878, 0.067, 2.937, 3.191]

After MLP + Residual
[-0.985, 2.283, 3.812, 2.244, -0.991, -2.994, -2.059, 0.777, 2.716, 1.803, -1.246, -3.681, -3.236, -0.223, 2.741, 3.107]
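
The whole trace above condenses to a few lines. Below is a self-contained sketch of the forward pass under stated assumptions: a pre-norm layout with a single transformer block, a second RMSNorm before the MLP (only the first appears in the trace), no biases, no final norm before the output projection, and random weights instead of the trained ones. Names such as wte, wpe, and lm_head are illustrative, not the demo's actual identifiers; the Key Concepts section below unpacks each helper.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, D, H = 27, 8, 16, 4                 # vocab, max positions, embedding dims, heads

wte = rng.standard_normal((V, D)) * 0.5   # token embedding table (random stand-in)
wpe = rng.standard_normal((T, D)) * 0.5   # position embedding table (random stand-in)

def rmsnorm(v, eps=1e-5):
    return v / np.sqrt(np.mean(v ** 2) + eps)

# With only one token in the context, each head attends solely to itself, so the
# softmax weight is exactly 1; the value/output projections still mix dimensions.
Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))
def attention(v):
    q, k, val = v @ Wq, v @ Wk, v @ Wv
    out = np.zeros(D)
    for h in range(H):                                # 4 heads, 4 dims each
        sl = slice(h * (D // H), (h + 1) * (D // H))
        score = q[sl] @ k[sl] / np.sqrt(D // H)       # scaled dot product (one query, one key)
        out[sl] = np.exp(score - score) * val[sl]     # softmax over a single key is 1
    return out @ Wo

W1 = rng.standard_normal((D, 4 * D)) * 0.1            # expand 16 -> 64
W2 = rng.standard_normal((4 * D, D)) * 0.1            # compress 64 -> 16
def mlp(v):
    return np.maximum(v @ W1, 0.0) @ W2               # expand, ReLU, compress

lm_head = rng.standard_normal((V, D)) * 0.1

token_id, pos = 7, 0                      # 'g' at position 0, assuming '.','a'..'z' ids
x = wte[token_id] + wpe[pos]              # Combined (tok + pos)
x = x + attention(rmsnorm(x))             # After Attention + Residual
x = x + mlp(rmsnorm(x))                   # After MLP + Residual
logits = lm_head @ x                      # 27 raw scores, one per character
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax -> next-token probabilities
print(probs.shape, probs.sum())           # (27,) 1.0
```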

Output: Next Token Probabilities

The model's prediction for what character comes after 'g' at position 0; higher percentages mean more likely (a sketch of the softmax that produces these percentages follows the list).
g: 16.0%
u: 7.0%
o: 6.8%
k: 6.7%
q: 6.4%
h: 5.8%
w: 5.5%
f: 5.5%
l: 4.9%
y: 4.1%
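
Those percentages come from a softmax over the model's 27 raw scores (logits). A small sketch, with made-up logits standing in for the real ones and an assumed '.', 'a'..'z' vocabulary:

```python
import numpy as np

chars = list(".abcdefghijklmnopqrstuvwxyz")     # assumed 27-character vocabulary
rng = np.random.default_rng(0)
logits = rng.standard_normal(27)                # stand-in for the model's 27 output scores

probs = np.exp(logits - logits.max())           # softmax, shifted for numerical stability
probs /= probs.sum()                            # now sums to 1

for i in np.argsort(probs)[::-1][:10]:          # the ten most likely next characters
    print(f"{chars[i]}: {probs[i]:.1%}")
```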

Key Concepts

Attention

Allows the model to look at previous positions and decide which ones are relevant. Each of 4 heads can focus on different patterns. Q="what am I looking for?", K="what do I contain?", V="what do I offer?"
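
A sketch of one head over a short three-token sequence, with random stand-in weights; the head size of 4 assumes the 16 dimensions are split evenly across the 4 heads:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, hd = 3, 16, 4                        # 3 positions, 16 dims, head size 16 / 4 heads = 4
X = rng.standard_normal((T, D))            # one row per position (random stand-in activations)
Wq, Wk, Wv = (rng.standard_normal((D, hd)) * 0.5 for _ in range(3))

Q = X @ Wq                                 # "what am I looking for?"
K = X @ Wk                                 # "what do I contain?"
V = X @ Wv                                 # "what do I offer?"

scores = Q @ K.T / np.sqrt(hd)             # how well each query matches each key
scores += np.triu(np.full((T, T), -np.inf), k=1)    # causal mask: no looking at the future
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)        # softmax: each row sums to 1
out = weights @ V                          # each position blends the values it attends to
print(weights.round(2))                    # row i puts weight only on positions <= i
```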

Residual Connections

Add the output back to the input at certain points. This helps gradients flow during training and preserves original information alongside new insights.
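
The pattern itself is a single addition. A tiny sketch, with a random stand-in sub-layer in place of attention or the MLP:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(16)                 # input to the sub-layer
W = rng.standard_normal((16, 16)) * 0.1     # small random stand-in weights

def sublayer(v):
    return np.maximum(v @ W, 0.0)           # stands in for attention or the MLP

y = x + sublayer(x)                         # residual: the original x is carried through intact
print(np.allclose(y - sublayer(x), x))      # True: subtracting the update recovers the input
```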

RMSNorm

Normalizes values so they're on a similar scale. Prevents any dimension from dominating and makes training more stable.
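
A minimal sketch, applied to the "Combined (tok + pos)" values from the trace above. Real RMSNorm layers usually also multiply by a learned per-dimension gain, which is likely why the demo's "After RMSNorm" numbers are not exactly the plain normalized values this produces:

```python
import numpy as np

def rmsnorm(v, eps=1e-5):
    # Divide by the root-mean-square so every dimension lands on a similar scale.
    return v / np.sqrt(np.mean(v ** 2) + eps)

combined = np.array([-0.629, 0.887, 1.587, 0.828, -0.693, -1.576, -1.011, 0.484,
                     1.534, 1.174, -0.266, -1.461, -1.313, 0.042, 1.358, 1.426])
normed = rmsnorm(combined)
print(np.sqrt(np.mean(normed ** 2)))   # ~1.0: the vector now has unit RMS
```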

MLP

A two-layer network: expand to 4x size (16 -> 64), apply ReLU activation, then compress back (64 -> 16). This refines and transforms the representation.
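
A sketch of that expand, ReLU, compress shape with random stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 16
W1, b1 = rng.standard_normal((D, 4 * D)) * 0.1, np.zeros(4 * D)   # expand 16 -> 64
W2, b2 = rng.standard_normal((4 * D, D)) * 0.1, np.zeros(D)       # compress 64 -> 16

def mlp(v):
    h = np.maximum(v @ W1 + b1, 0.0)    # ReLU keeps only positive activations
    return h @ W2 + b2

x = rng.standard_normal(D)
print(mlp(x).shape)                     # (16,) -- same size in and out
```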