Inference

After training, the model generates new names it has never seen. Starting from BOS, it predicts the next character, samples one, feeds it back in, and repeats until BOS appears again.

Generate

Note: train the model first (go to Training and run 200+ steps). You can generate from an untrained model, but the results will be random.


Temperature controls randomness. Low values (below 0.5) favor the most likely characters (safe). High values (above 1.0) give less likely characters a better chance (creative).

Comparison Mode (compare 3 temperatures)

Temperature 0.5: Conservative

At this conservative temperature, the model strongly prefers likely characters but occasionally takes a less obvious choice. Names will look realistic and common.

How Temperature Works

1. Model outputs logits: Raw scores for each character (e.g., a: 2.5, b: -0.3, c: 1.8)

2. Divide by temperature:

adjusted = logit / temperature

Low temp (0.3) = bigger differences = sharper peak
High temp (2.0) = smaller differences = flatter spread

3. Convert to probabilities (softmax):

prob = exp(adjusted) / sum(all exp values)
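
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual implementation: the function name `softmax_with_temperature` is an assumption, and subtracting the maximum before `exp` is a standard numerical-stability trick that leaves the probabilities unchanged.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Step 2: adjusted = logit / temperature
    adjusted = [x / temperature for x in logits]
    # Subtracting the max does not change the result but avoids overflow in exp().
    peak = max(adjusted)
    exps = [math.exp(a - peak) for a in adjusted]
    # Step 3: prob = exp(adjusted) / sum(all exp values)
    total = sum(exps)
    return [e / total for e in exps]

# Example logits from step 1: a: 2.5, b: -0.3, c: 1.8
example_logits = [2.5, -0.3, 1.8]
probs = softmax_with_temperature(example_logits, 0.5)
for ch, p in zip("abc", probs):
    print(f"{ch}: {p:.3f}")
```

The output always sums to 1, and the ordering of the characters never changes with temperature; only how much probability the top character takes from the others.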

Effect on Distribution

Same raw scores, different temperatures:

Low temp (blue) = sharp peak. High temp (orange) = even spread.
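
The same effect can be checked numerically. The sketch below reuses the example logits (a: 2.5, b: -0.3, c: 1.8) and prints a rough text bar chart for three temperatures; the helper names and the choice of 0.3, 1.0, and 2.0 are assumptions made for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize with softmax."""
    adjusted = [x / temperature for x in logits]
    peak = max(adjusted)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

def text_bar(p, width=30):
    """Render a probability as a row of '#' characters."""
    return "#" * round(p * width)

logits = {"a": 2.5, "b": -0.3, "c": 1.8}
distributions = {}
for temperature in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(list(logits.values()), temperature)
    distributions[temperature] = dict(zip(logits, probs))
    print(f"T = {temperature}")
    for ch, p in distributions[temperature].items():
        print(f"  {ch} {p:.3f} {text_bar(p)}")
```

As the temperature rises, the bar for "a" shrinks and the bar for "b" grows, which is the flattening the chart describes.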


Temperature guide:

0.1 - 0.5 = Safe, predictable names

0.5 - 1.0 = Balanced variety

1.0 - 2.0 = Creative, unusual names

How inference works

• The model starts with the BOS token as input
• At each step, it runs a full forward pass to get logits (raw scores) for all 27 tokens
• Temperature scales the logits before softmax turns them into probabilities: dividing by T makes the distribution sharper (low T) or flatter (high T)
• A character is sampled from this distribution (not just argmax, which would be greedy)
• The sampled character becomes the input for the next step
• Generation stops when BOS is sampled, which is how the model signals that the name is complete
• The same model can generate many different names, because every step draws a random sample
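
The loop described above might look like the following sketch. Several parts are assumptions: `toy_logits` is a hypothetical stand-in for the trained model's forward pass, the token layout (26 lowercase letters plus BOS as id 26) is a guess consistent with the 27 tokens mentioned above, the forward function receives the whole sequence so far, and `max_len` is a safety cap that the page does not mention.

```python
import math
import random

# Assumption: 27 tokens = 26 lowercase letters plus BOS (id 26).
CHARS = "abcdefghijklmnopqrstuvwxyz"
BOS = len(CHARS)
VOCAB_SIZE = len(CHARS) + 1

def toy_logits(tokens):
    """Hypothetical stand-in for the model: returns one logit per token."""
    rng = random.Random(len(tokens) * 31 + tokens[-1])
    logits = [rng.uniform(-2.0, 2.0) for _ in range(VOCAB_SIZE)]
    logits[BOS] += len(tokens) - 4  # ending becomes likelier as the name grows
    return logits

def softmax_with_temperature(logits, temperature):
    adjusted = [x / temperature for x in logits]
    peak = max(adjusted)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

def generate(forward, temperature=0.5, max_len=16, rng=random):
    """Sample one name: start at BOS, stop when BOS is sampled again."""
    tokens = [BOS]
    while len(tokens) <= max_len:  # safety cap (assumption)
        probs = softmax_with_temperature(forward(tokens), temperature)
        # Sample from the distribution instead of taking the argmax.
        next_token = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        if next_token == BOS:
            break
        tokens.append(next_token)  # the sample becomes the next input
    return "".join(CHARS[t] for t in tokens[1:])

for _ in range(3):
    print(generate(toy_logits, temperature=0.5))
```

Replacing `rng.choices(...)` with the index of the largest probability would give greedy decoding, which returns the same name on every call.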