Top-p (Nucleus Sampling)
A sampling method that limits token selection to the smallest set of tokens whose cumulative probability exceeds a threshold p. At top-p=0.9, the model considers only the top tokens that together account for 90% of the probability mass.
Top-p sampling, also called nucleus sampling, is an alternative to temperature for controlling the randomness of model outputs. Instead of scaling all probabilities uniformly (as temperature does), top-p dynamically selects a subset of tokens at each step. The model ranks all possible next tokens by probability, then includes tokens from most to least likely until their cumulative probability reaches the threshold p.
For example, with top-p set to 0.9, if the five most likely tokens together account for 90% of the probability mass, the model samples from only those five tokens (with their probabilities renormalized to sum to 1). The remaining thousands of low-probability tokens are excluded entirely. This prevents the model from occasionally selecting highly unlikely tokens that could derail the output.
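The selection-and-renormalization step described above can be sketched in a few lines. This is a minimal illustration over a toy probability list, not any provider's actual implementation; the function name `top_p_sample` is our own:

```python
import random

def top_p_sample(probs, p=0.9, rng=random):
    """Sample a token id from the smallest set of most-likely tokens
    whose cumulative probability reaches p (nucleus sampling)."""
    # Rank token ids from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Add tokens to the nucleus until their cumulative probability reaches p.
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize over the nucleus and sample from it.
    weights = [probs[i] / cumulative for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

# With these toy probabilities and p=0.9, the nucleus is tokens 0-2
# (0.5 + 0.3 + 0.15 = 0.95 >= 0.9); token 3 is never sampled.
probs = [0.5, 0.3, 0.15, 0.05]
token = top_p_sample(probs, p=0.9)
```

Note that the token that pushes the cumulative total past p is included, so the nucleus is the smallest prefix of the ranked list covering at least p of the mass.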
The advantage of top-p over temperature is adaptability. When the model is very confident about the next token (e.g., one token has 95% probability), top-p=0.9 effectively reduces the choice to that single token. When the model is uncertain (many tokens share similar probabilities), top-p allows sampling from a much broader set. Temperature, by contrast, applies the same scaling regardless of the shape of the underlying distribution.
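This adaptability is easy to see by counting how many tokens survive the cutoff under a confident versus an uncertain distribution. A small sketch with made-up distributions (the helper `nucleus_size` is ours, not a library function):

```python
def nucleus_size(probs, p=0.9):
    """How many tokens top-p keeps for a given next-token distribution."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

# Confident: one token dominates, so top-p keeps just that token.
confident = [0.95, 0.02, 0.01, 0.01, 0.01]
# Uncertain: 64 equally likely tokens, so top-p keeps a broad set.
uncertain = [1 / 64] * 64

print(nucleus_size(confident))  # 1
print(nucleus_size(uncertain))  # 58
```

The same p=0.9 yields a nucleus of one token in the first case and dozens in the second, whereas a fixed temperature would reshape both distributions identically.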
Common top-p values range from 0.8 to 1.0. A value of 1.0 disables nucleus sampling (all tokens are considered), while lower values like 0.5 make the output more focused but potentially repetitive. Most API providers default to top-p=1.0 and recommend adjusting temperature as the primary control, but top-p can be especially useful when you want to eliminate tail-end randomness without over-constraining the output.
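In practice this usually comes down to one or two request parameters. A hedged sketch of a request payload following the common `temperature`/`top_p` naming convention (the model name is a placeholder; check your provider's API reference for the exact field names it accepts):

```python
# Sampling settings as typically passed to an LLM completion API.
# "example-model" is a placeholder, not a real model identifier.
request = {
    "model": "example-model",
    "prompt": "Write a haiku about the sea.",
    "temperature": 1.0,  # leave temperature at its default...
    "top_p": 0.9,        # ...and trim tail-end randomness with top-p
}
```

Providers generally advise adjusting temperature or top-p, but not both at once, since their effects compound in hard-to-predict ways.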