AI Glossary & Learning Hub
Clear, jargon-free explanations of the concepts behind modern AI. Whether you are evaluating models, building AI-powered products, or just curious about how it all works — start here.
31 terms across 6 categories
All Terms
Browse all 31 terms alphabetically
API Pricing
Pricing: The cost structure for using AI models through cloud APIs, typically charged per token processed. Input tokens and output tokens usually have different prices, with output tokens costing more.
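Per-token billing is simple arithmetic. A minimal sketch in Python — the per-million-token rates below are made up for illustration, not any provider's actual prices:

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical rates: $3/M input, $15/M output (output is typically pricier).
cost = api_cost(input_tokens=2_000, output_tokens=500,
                input_price_per_m=3.0, output_price_per_m=15.0)
```

Because output is billed at a higher rate, long generations often dominate the bill even when the prompt is much larger than the reply.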
Attention Mechanism
Architecture: The core component of transformer models that allows each token to dynamically focus on relevant parts of the input sequence. It computes weighted relationships between all tokens, enabling the model to understand context and dependencies.
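The "weighted relationships" can be shown concretely. A toy scaled dot-product attention in plain Python — single head, no learned projection matrices, so this is the bare mechanism rather than a full transformer layer:

```python
import math

def softmax(xs: list) -> list:
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries: list, keys: list, values: list) -> list:
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)        # how strongly q attends to each position
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs
```

Each output is a weighted average of the value vectors, with the weights determined by how well the query matches each key.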
Benchmarks
Benchmarks: Standardized tests used to measure and compare AI model performance across specific tasks like reasoning, coding, math, and language understanding.
Chain-of-Thought (CoT)
Fundamentals: A prompting technique that encourages the model to show its reasoning step by step before arriving at a final answer, significantly improving performance on complex reasoning, math, and logic tasks.
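The technique itself is just phrasing. One common pattern, sketched in Python — the exact wording is illustrative, not canonical:

```python
def cot_prompt(question: str) -> str:
    """Ask for intermediate reasoning before the final answer (a common CoT phrasing)."""
    return (f"Question: {question}\n"
            "Think through the problem step by step, "
            "then give the final answer on a line starting with 'Answer:'.")
```

Pinning the final answer to a fixed marker like `Answer:` also makes the response easy to parse programmatically.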
Context Window
Fundamentals: The maximum amount of text (measured in tokens) that a language model can process in a single request, including both the input prompt and the generated output.
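Because input and output share one budget, the room left for generation shrinks as prompts grow. A trivial sketch:

```python
def remaining_output_budget(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for the model's reply once the prompt is counted."""
    return max(0, context_window - prompt_tokens)

# A 128k-token window with a 120k-token prompt leaves little room to answer.
budget = remaining_output_budget(128_000, 120_000)
```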
Distillation
Training: A training technique where a smaller "student" model learns to replicate the behavior of a larger "teacher" model, producing a compact model that retains much of the teacher's capability at lower cost.
Embeddings
Architecture: Dense numerical vector representations of text (or other data) that capture semantic meaning. Texts with similar meanings have similar embeddings, enabling search, clustering, and retrieval applications.
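"Similar meaning, similar vector" is usually measured with cosine similarity. A minimal version over plain Python lists — real embeddings have hundreds or thousands of dimensions, but the formula is the same:

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """1.0 for identical directions, 0.0 for unrelated (orthogonal) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Semantic search boils down to embedding the query and ranking documents by this score.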
Few-Shot Learning
Training: A prompting technique where you provide a small number of input-output examples within the prompt to guide the model's behavior on a new task, without any additional training.
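The pattern is simply examples first, new input last. A sketch — the layout is illustrative; any consistent format works:

```python
def few_shot_prompt(examples: list, new_input: str) -> str:
    """Lay out (input, output) pairs, then the new input awaiting completion."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)
```

The model infers the task (here, Spanish-to-English translation) from the examples and continues the pattern.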
Fine-Tuning
Training: The process of further training a pre-trained model on a smaller, task-specific dataset to improve its performance on particular tasks or to adapt it to a specific domain.
Frontier Models
Benchmarks: The most capable and advanced AI models available at any given time, typically produced by leading AI labs. They represent the current state of the art in performance across reasoning, coding, and general intelligence tasks.
Function Calling
Fundamentals: A capability that allows AI models to generate structured outputs that invoke predefined functions or APIs, enabling models to take actions like searching databases, calling web services, or executing code.
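In practice the model emits JSON naming a function and its arguments, and your code executes it. A minimal dispatch sketch — the tool name, schema, and stub below are hypothetical, and real APIs wrap this exchange in their own request format:

```python
import json

# A hypothetical tool definition, in the JSON-schema style most chat APIs use.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(tool_call_json: str, registry: dict) -> str:
    """Parse a model-emitted tool call and run the matching local function."""
    call = json.loads(tool_call_json)
    return registry[call["name"]](**call["arguments"])

# After seeing the tool definition, the model would emit something like this:
registry = {"get_weather": lambda city: f"(stub) weather for {city}"}
result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}', registry)
```

The function's result is usually sent back to the model in a follow-up message so it can compose a final answer.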
Guardrails
Deployment: Safety mechanisms and content filters implemented to prevent AI models from generating harmful, biased, or inappropriate outputs. These include both built-in model alignment and external validation systems.
Hallucination
Fundamentals: When an AI model generates information that sounds plausible and confident but is factually incorrect, fabricated, or not grounded in the provided context. A major reliability challenge in AI applications.
Inference
Deployment: The process of running a trained AI model to generate predictions or outputs from new inputs. This is what happens every time you send a prompt to an AI model and receive a response.
Latency
Deployment: The time delay between sending a request to an AI model and receiving the response. Low latency is critical for real-time applications like chatbots and coding assistants.
Mixture of Experts (MoE)
Architecture: A model architecture that uses multiple specialized sub-networks ("experts") and a routing mechanism that activates only a subset of them for each input, enabling larger total model capacity without proportional increases in computation.
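The router is typically a small learned layer whose scores pick the top-k experts per token. A sketch of just the routing step — the gating scores here are given rather than learned:

```python
import math

def route_top_k(gate_logits: list, k: int = 2) -> list:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

With, say, 8 experts and k=2, only a quarter of the expert parameters run for any given token, which is how MoE models keep inference cost well below their total parameter count.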
Multimodal
Architecture: AI models that can process and generate multiple types of data — such as text, images, audio, and video — rather than being limited to text alone.
Open-Source Models
Deployment: AI models whose weights are publicly released, allowing anyone to download, run, modify, and fine-tune them. Examples include Meta's Llama, Mistral, and DeepSeek models.
Parameters
Architecture: The learned numerical values (weights) within a neural network that determine the model's behavior. More parameters generally mean more capacity to store knowledge and handle complex tasks, but also higher computational costs.
Prompt Engineering
Fundamentals: The practice of designing and refining the text instructions (prompts) given to an AI model to elicit the most accurate, useful, and relevant responses for a given task.
Quantization
Deployment: A technique that reduces the precision of a model's numerical weights (e.g., from 16-bit to 4-bit), dramatically decreasing memory usage and increasing inference speed with only a small loss in output quality.
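A naive symmetric scheme shows the core idea: map each weight to a small signed integer and remember a single scale factor. Production quantizers work per-group and are far more careful, but the round trip looks like this:

```python
def quantize(weights: list, bits: int = 4) -> tuple:
    """Naive symmetric quantization: floats -> signed ints -> floats.
    Assumes at least one nonzero weight."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    dequantized = [qi * scale for qi in q]
    return q, dequantized
```

Each 4-bit integer needs a quarter of the memory of a 16-bit float, at the cost of small rounding errors visible in the dequantized values.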
RAG (Retrieval-Augmented Generation)
Architecture: A technique that enhances AI model responses by retrieving relevant information from external knowledge sources and including it in the prompt, reducing hallucinations and enabling access to up-to-date or private data.
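End to end, RAG is retrieve-then-prompt. A toy version using word overlap as the "retriever" — real systems rank by embedding similarity, but the pipeline shape is the same:

```python
def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Toy retriever: rank documents by word overlap with the query.
    Real systems rank by embedding similarity instead."""
    q_words = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def build_rag_prompt(query: str, documents: list) -> str:
    """Stuff the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, documents))
    return (f"Context:\n{context}\n\n"
            f"Answer using only the context above.\nQuestion: {query}")
```

Instructing the model to answer "using only the context" is what grounds the response and curbs hallucination.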
Reasoning Models
Benchmarks: AI models specifically designed or trained to perform extended logical reasoning, mathematical problem-solving, and multi-step analysis. They typically use internal chain-of-thought processing before producing a final answer.
RLHF (Reinforcement Learning from Human Feedback)
Training: A training technique that uses human preferences to fine-tune AI models, making them more helpful, harmless, and honest. Humans rank model outputs, and the model learns to produce responses that align with those preferences.
System Prompt
Fundamentals: A special instruction set provided to the model at the beginning of a conversation that defines its behavior, personality, constraints, and role. It persists across all messages in the conversation.
Temperature
Fundamentals: A parameter that controls the randomness of a model's output. Lower values (e.g., 0.0) make responses more deterministic and focused, while higher values (e.g., 1.0) make them more creative and varied.
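Under the hood, temperature divides the model's raw scores (logits) before the softmax. A minimal demonstration — note that temperature 0 is handled as a greedy-argmax special case in real samplers, since dividing by zero is undefined:

```python
import math

def sample_distribution(logits: list, temperature: float = 1.0) -> list:
    """Softmax over logits / temperature; lower temperature sharpens the
    distribution toward the top-scoring token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Dividing by a small temperature stretches the gaps between logits, so almost all probability mass collapses onto the best token; a large temperature flattens the gaps and spreads the mass out.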
Throughput
Deployment: The rate at which an AI model generates output, typically measured in tokens per second. Higher throughput means faster response generation and the ability to serve more concurrent users.
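Latency and throughput combine into total response time: the first token arrives after the latency, then tokens stream at the throughput rate. A back-of-envelope helper — the numbers below are illustrative:

```python
def response_time(output_tokens: int, first_token_latency_s: float,
                  tokens_per_second: float) -> float:
    """Approximate wall-clock time for a streamed response."""
    return first_token_latency_s + output_tokens / tokens_per_second

# e.g. 500 output tokens at 100 tok/s, after a 0.4 s wait for the first token
t = response_time(500, 0.4, 100.0)
```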
Tokens
Fundamentals: The basic units that language models use to read and generate text. A token can be a word, part of a word, or a punctuation mark — roughly 3/4 of a word in English on average.
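A common back-of-envelope rule is about 4 characters per token for English. A rough estimator — only a real tokenizer gives exact counts, so treat this as a ballpark:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text.
    Real tokenizers (e.g. BPE-based) give exact counts; this is a ballpark."""
    return max(1, round(len(text) / 4))
```

Estimates like this are handy for budgeting prompts against a context window or projecting API costs before sending anything.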
Top-p (Nucleus Sampling)
Fundamentals: A sampling method that limits token selection to the smallest set of tokens whose cumulative probability exceeds a threshold p. At top-p=0.9, the model considers only the top tokens that together account for 90% of the probability mass.
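The "smallest set exceeding p" rule is easy to implement directly. A sketch that filters an already-computed probability list and renormalizes the survivors:

```python
def nucleus_filter(probs: list, p: float = 0.9) -> list:
    """Keep the smallest set of highest-probability tokens whose cumulative
    mass reaches p, then renormalize. Returns (index, probability) pairs."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in kept)
    return [(i, probs[i] / total) for i in kept]
```

Unlike a fixed top-k cutoff, the kept set shrinks when the model is confident and grows when the distribution is flat, which is the main appeal of nucleus sampling.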
Transformer
Architecture: The neural network architecture that underpins virtually all modern large language models. Introduced in the 2017 paper "Attention Is All You Need," it processes text in parallel using self-attention mechanisms.
Zero-Shot Learning
Training: Asking a model to perform a task using only natural language instructions, without providing any examples. The model relies entirely on its pre-trained knowledge to understand and complete the task.
Fundamentals
Core concepts every AI user should understand, from tokens to prompts.
Architecture
How modern AI models are built — transformers, attention, and beyond.
Training
Methods used to teach and improve AI models, including RLHF and fine-tuning.
Deployment
Running AI models in production — inference, latency, and optimization.
Benchmarks
How AI models are evaluated and compared across tasks and capabilities.
Pricing
Understanding the economics of AI — API costs, token pricing, and cost optimization.
Ready to Compare Models?
Now that you understand the key concepts, explore real AI models with benchmarks, pricing, and detailed specs.