
Tokens

The basic units that language models use to read and generate text. A token can be a word, part of a word, or a punctuation mark — roughly 3/4 of a word in English on average.

Tokens are the fundamental currency of language models. Before processing any text, the model's tokenizer breaks the input into tokens — subword units derived from patterns in the training data. The word "understanding" might become two tokens ("under" + "standing"), while common words like "the" are typically a single token. Numbers, code, and non-English text often require more tokens per character.
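The subword splitting described above can be sketched with a greedy longest-match tokenizer. This is a toy illustration, not a real BPE or SentencePiece implementation: the vocabulary below is invented for the example, and production tokenizers learn their vocabularies (tens of thousands of entries) from training data.

```python
# Toy subword tokenizer: greedy longest-match against a hand-made
# vocabulary. Real tokenizers learn their vocabularies from data;
# this one is invented purely for illustration.

TOY_VOCAB = {"under", "standing", "stand", "ing", "the",
             "un", "der", "s", "t", "a", "n", "d", "i", "g", "e", "h"}

def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, shrinking until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i]!r}")
    return tokens

print(tokenize("understanding", TOY_VOCAB))  # → ['under', 'standing']
print(tokenize("the", TOY_VOCAB))            # → ['the']
```

A rare word falls apart into many small pieces, while a common word stays whole, which is why unusual strings cost more tokens.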

Tokenization matters for three practical reasons. First, pricing: API providers like OpenAI, Anthropic, and Google charge per token for both input and output. Understanding tokenization helps you estimate costs accurately. Second, context limits: the context window is measured in tokens, not words or characters, so token-dense content (like code or JSON) uses up your context budget faster. Third, generation speed: models produce one token at a time during inference, so longer outputs take longer to generate.
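The pricing point reduces to simple arithmetic once you have token counts. The function below is a minimal sketch; the rates in the example are placeholders, not any provider's actual prices, so check your provider's pricing page for current figures.

```python
# Hedged sketch: estimating API cost from token counts.
# The example rates below are hypothetical, not real provider prices.

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1m: float, price_out_per_1m: float) -> float:
    """Cost in dollars, given per-million-token prices for input and output."""
    return (input_tokens / 1_000_000) * price_in_per_1m \
         + (output_tokens / 1_000_000) * price_out_per_1m

# Example: 2,000 input tokens and 500 output tokens at hypothetical
# rates of $3 per 1M input tokens and $15 per 1M output tokens.
cost = estimate_cost(2_000, 500, price_in_per_1m=3.0, price_out_per_1m=15.0)
print(f"${cost:.4f}")  # → $0.0135
```

Note that output tokens are typically priced several times higher than input tokens, so long generations dominate the bill.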

Different model families use different tokenizers. OpenAI's models use tiktoken (a BPE tokenizer); Meta's Llama 2 used a SentencePiece tokenizer, while Llama 3 moved to a tiktoken-style BPE tokenizer with a larger vocabulary; Anthropic uses its own tokenizer. Each one splits the same text differently: a sentence that is 15 tokens in GPT-4 might be 18 tokens in Llama 2. Most providers offer tokenizer tools or libraries so you can count tokens before sending a request.
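To see why counts differ across tokenizers, consider the same words split by two different vocabularies. Both vocabularies here are invented toys (real ones have tens of thousands of learned subwords), using a greedy longest-match split for illustration.

```python
# Sketch: the same text yields different token counts under different
# vocabularies. Both vocabularies are invented for illustration.

VOCAB_A = {"token", "izers", "differ"}               # larger pieces
VOCAB_B = {"tok", "en", "izer", "s", "diff", "er"}   # smaller pieces

def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split of a word into vocabulary pieces."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i]!r}")
    return tokens

sentence = ["tokenizers", "differ"]
for name, vocab in [("A", VOCAB_A), ("B", VOCAB_B)]:
    count = sum(len(tokenize(word, vocab)) for word in sentence)
    print(f"tokenizer {name}: {count} tokens")
# → tokenizer A: 3 tokens
# → tokenizer B: 6 tokens
```

The text is identical; only the vocabulary changed, which is why a prompt's token count (and therefore its cost) depends on the model you send it to.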

As a rough rule of thumb, 1 token is approximately 4 characters or 0.75 words in English. A typical page of text is about 500-700 tokens. Knowing this helps you plan prompts, estimate costs, and stay within context limits.
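The rule of thumb above can be turned into a quick estimator. Averaging the character-based and word-based heuristics is a choice made here for the sketch, not a standard method, and the result is only an approximation: real counts depend on the model's tokenizer.

```python
# Quick heuristic token estimate from the ~4 characters per token and
# ~0.75 words per token rules of thumb. Averaging the two heuristics
# is an illustrative choice, not a standard method.

def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4              # ~4 characters per token
    by_words = len(text.split()) / 0.75   # ~0.75 words per token
    return round((by_chars + by_words) / 2)

page = "word " * 600  # a mock "page" of 600 short words
print(estimate_tokens(page))  # → 775
```

For anything where the count actually matters (cost estimates, staying under a context limit), use the provider's own tokenizer tool instead of a heuristic.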
