AI Glossary & Learning Hub
Clear, jargon-free explanations of the concepts behind modern AI. Whether you are evaluating models, building AI-powered products, or just curious about how it all works — start here.
31 terms across 6 categories
All Terms
Browse all 31 terms alphabetically
API Pricing
Pricing: The cost structure for using AI models through cloud APIs, typically charged per token processed. Input tokens and output tokens usually have different prices, with output tokens costing more.
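Per-token billing is simple arithmetic. A minimal sketch in Python — the per-million-token rates below are made up for illustration, not any provider's actual prices:

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical rates: $3/M input, $15/M output (output is typically pricier).
cost = api_cost(input_tokens=2_000, output_tokens=500,
                input_price_per_m=3.0, output_price_per_m=15.0)
```

Because output is billed at a higher rate, long generations often dominate the bill even when the prompt is much larger than the reply.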
Attention Mechanism
Architecture: The core component of transformer models that allows each token to dynamically focus on relevant parts of the input sequence. It computes weighted relationships between all tokens, enabling the model to understand context and dependencies.
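The "weighted relationships" can be shown concretely. A toy scaled dot-product attention in plain Python — single head, no learned projection matrices, so this is the bare mechanism rather than a full transformer layer:

```python
import math

def softmax(xs: list) -> list:
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries: list, keys: list, values: list) -> list:
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)        # how strongly q attends to each position
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs
```

Each output is a weighted average of the value vectors, with the weights determined by how well the query matches each key.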
Benchmarks
Benchmarks: Standardized tests used to measure and compare AI model performance across specific tasks like reasoning, coding, math, and language understanding.
Chain-of-Thought (CoT)
Fundamentals: A prompting technique that encourages the model to show its reasoning step by step before arriving at a final answer, significantly improving performance on complex reasoning, math, and logic tasks.
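The technique itself is just phrasing. One common pattern, sketched in Python — the exact wording is illustrative, not canonical:

```python
def cot_prompt(question: str) -> str:
    """Ask for intermediate reasoning before the final answer (a common CoT phrasing)."""
    return (f"Question: {question}\n"
            "Think through the problem step by step, "
            "then give the final answer on a line starting with 'Answer:'.")
```

Pinning the final answer to a fixed marker like `Answer:` also makes the response easy to parse programmatically.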
Context Window
Fundamentals: The maximum amount of text (measured in tokens) that a language model can process in a single request, including both the input prompt and the generated output.
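Because input and output share one budget, the room left for generation shrinks as prompts grow. A trivial sketch:

```python
def remaining_output_budget(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for the model's reply once the prompt is counted."""
    return max(0, context_window - prompt_tokens)

# A 128k-token window with a 120k-token prompt leaves little room to answer.
budget = remaining_output_budget(128_000, 120_000)
```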
Distillation
Training: A training technique where a smaller "student" model learns to replicate the behavior of a larger "teacher" model, producing a compact model that retains much of the teacher's capability at lower cost.
Embeddings
Architecture: Dense numerical vector representations of text (or other data) that capture semantic meaning. Texts with similar meanings have similar embeddings, enabling search, clustering, and retrieval applications.
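"Similar meaning, similar vector" is usually measured with cosine similarity. A minimal version over plain Python lists — real embeddings have hundreds or thousands of dimensions, but the formula is the same:

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """1.0 for identical directions, 0.0 for unrelated (orthogonal) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Semantic search boils down to embedding the query and ranking documents by this score.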
Few-Shot Learning
Training: A prompting technique where you provide a small number of input-output examples within the prompt to guide the model's behavior on a new task, without any additional training.
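The pattern is simply examples first, new input last. A sketch — the layout is illustrative; any consistent format works:

```python
def few_shot_prompt(examples: list, new_input: str) -> str:
    """Lay out (input, output) pairs, then the new input awaiting completion."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)
```

The model infers the task (here, Spanish-to-English translation) from the examples and continues the pattern.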
Fine-Tuning
Training: The process of further training a pre-trained model on a smaller, task-specific dataset to improve its performance on particular tasks or to adapt it to a specific domain.
Frontier Models
Benchmarks: The most capable and advanced AI models available at any given time, typically produced by leading AI labs. They represent the current state of the art in performance across reasoning, coding, and general intelligence tasks.
Function Calling
Fundamentals: A capability that allows AI models to generate structured outputs that invoke predefined functions or APIs, enabling models to take actions like searching databases, calling web services, or executing code.
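In practice the model emits JSON naming a function and its arguments, and your code executes it. A minimal dispatch sketch — the tool name, schema, and stub below are hypothetical, and real APIs wrap this exchange in their own request format:

```python
import json

# A hypothetical tool definition, in the JSON-schema style most chat APIs use.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(tool_call_json: str, registry: dict) -> str:
    """Parse a model-emitted tool call and run the matching local function."""
    call = json.loads(tool_call_json)
    return registry[call["name"]](**call["arguments"])

# After seeing the tool definition, the model would emit something like this:
registry = {"get_weather": lambda city: f"(stub) weather for {city}"}
result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}', registry)
```

The function's result is usually sent back to the model in a follow-up message so it can compose a final answer.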
Guardrails
Deployment: Safety mechanisms and content filters implemented to prevent AI models from generating harmful, biased, or inappropriate outputs. These include both built-in model alignment and external validation systems.
Hallucination
Fundamentals: When an AI model generates information that sounds plausible and confident but is factually incorrect, fabricated, or not grounded in the provided context. A major reliability challenge in AI applications.
Inference
Deployment: The process of running a trained AI model to generate predictions or outputs from new inputs. This is what happens every time you send a prompt to an AI model and receive a response.
Latency
Deployment: The time delay between sending a request to an AI model and receiving the response. Low latency is critical for real-time applications like chatbots and coding assistants.
Mixture of Experts (MoE)
Architecture: A model architecture that uses multiple specialized sub-networks ("experts") and a routing mechanism that activates only a subset of them for each input, enabling larger total model capacity without proportional increases in computation.
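The router is typically a small learned layer whose scores pick the top-k experts per token. A sketch of just the routing step — the gating scores here are given rather than learned:

```python
import math

def route_top_k(gate_logits: list, k: int = 2) -> list:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

With, say, 8 experts and k=2, only a quarter of the expert parameters run for any given token, which is how MoE models keep inference cost well below their total parameter count.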
Multimodal
Architecture: AI models that can process and generate multiple types of data — such as text, images, audio, and video — rather than being limited to text alone.
Open-Source Models
Deployment: AI models whose weights are publicly released, allowing anyone to download, run, modify, and fine-tune them. Examples include Meta's Llama, Mistral, and DeepSeek models.
Parameters
Architecture: The learned numerical values (weights) within a neural network that determine the model's behavior. More parameters generally mean more capacity to store knowledge and handle complex tasks, but also higher computational costs.
Prompt Engineering
Fundamentals: The practice of designing and refining the text instructions (prompts) given to an AI model to elicit the most accurate, useful, and relevant responses for a given task.
Quantization
Deployment: A technique that reduces the precision of a model's numerical weights (e.g., from 16-bit to 4-bit), dramatically decreasing memory usage and increasing inference speed with only a small loss in output quality.
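A naive symmetric scheme shows the core idea: map each weight to a small signed integer and remember a single scale factor. Production quantizers work per-group and are far more careful, but the round trip looks like this:

```python
def quantize(weights: list, bits: int = 4) -> tuple:
    """Naive symmetric quantization: floats -> signed ints -> floats.
    Assumes at least one nonzero weight."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    dequantized = [qi * scale for qi in q]
    return q, dequantized
```

Each 4-bit integer needs a quarter of the memory of a 16-bit float, at the cost of small rounding errors visible in the dequantized values.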
RAG (Retrieval-Augmented Generation)
Architecture: A technique that enhances AI model responses by retrieving relevant information from external knowledge sources and including it in the prompt, reducing hallucinations and enabling access to up-to-date or private data.
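End to end, RAG is retrieve-then-prompt. A toy version using word overlap as the "retriever" — real systems rank by embedding similarity, but the pipeline shape is the same:

```python
def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Toy retriever: rank documents by word overlap with the query.
    Real systems rank by embedding similarity instead."""
    q_words = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def build_rag_prompt(query: str, documents: list) -> str:
    """Stuff the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, documents))
    return (f"Context:\n{context}\n\n"
            f"Answer using only the context above.\nQuestion: {query}")
```

Instructing the model to answer "using only the context" is what grounds the response and curbs hallucination.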
Reasoning Models
Benchmarks: AI models specifically designed or trained to perform extended logical reasoning, mathematical problem-solving, and multi-step analysis. They typically use internal chain-of-thought processing before producing a final answer.
RLHF (Reinforcement Learning from Human Feedback)
Training: A training technique that uses human preferences to fine-tune AI models, making them more helpful, harmless, and honest. Humans rank model outputs, and the model learns to produce responses that align with those preferences.
System Prompt
Fundamentals: A special instruction set provided to the model at the beginning of a conversation that defines its behavior, personality, constraints, and role. It persists across all messages in the conversation.
Temperature
Fundamentals: A parameter that controls the randomness of a model's output. Lower values (e.g., 0.0) make responses more deterministic and focused, while higher values (e.g., 1.0) make them more creative and varied.
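Under the hood, temperature divides the model's raw scores (logits) before the softmax. A minimal demonstration — note that temperature 0 is handled as a greedy-argmax special case in real samplers, since dividing by zero is undefined:

```python
import math

def sample_distribution(logits: list, temperature: float = 1.0) -> list:
    """Softmax over logits / temperature; lower temperature sharpens the
    distribution toward the top-scoring token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Dividing by a small temperature stretches the gaps between logits, so almost all probability mass collapses onto the best token; a large temperature flattens the gaps and spreads the mass out.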
Throughput
Deployment: The rate at which an AI model generates output, typically measured in tokens per second. Higher throughput means faster response generation and the ability to serve more concurrent users.
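Latency and throughput combine into total response time: the first token arrives after the latency, then tokens stream at the throughput rate. A back-of-envelope helper — the numbers below are illustrative:

```python
def response_time(output_tokens: int, first_token_latency_s: float,
                  tokens_per_second: float) -> float:
    """Approximate wall-clock time for a streamed response."""
    return first_token_latency_s + output_tokens / tokens_per_second

# e.g. 500 output tokens at 100 tok/s, after a 0.4 s wait for the first token
t = response_time(500, 0.4, 100.0)
```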
Tokens
Fundamentals: The basic units that language models use to read and generate text. A token can be a word, part of a word, or a punctuation mark — roughly 3/4 of a word in English on average.
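A common back-of-envelope rule is about 4 characters per token for English. A rough estimator — only a real tokenizer gives exact counts, so treat this as a ballpark:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text.
    Real tokenizers (e.g. BPE-based) give exact counts; this is a ballpark."""
    return max(1, round(len(text) / 4))
```

Estimates like this are handy for budgeting prompts against a context window or projecting API costs before sending anything.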
Top-p (Nucleus Sampling)
Fundamentals: A sampling method that limits token selection to the smallest set of tokens whose cumulative probability exceeds a threshold p. At top-p=0.9, the model considers only the top tokens that together account for 90% of the probability mass.
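The "smallest set exceeding p" rule is easy to implement directly. A sketch that filters an already-computed probability list and renormalizes the survivors:

```python
def nucleus_filter(probs: list, p: float = 0.9) -> list:
    """Keep the smallest set of highest-probability tokens whose cumulative
    mass reaches p, then renormalize. Returns (index, probability) pairs."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in kept)
    return [(i, probs[i] / total) for i in kept]
```

Unlike a fixed top-k cutoff, the kept set shrinks when the model is confident and grows when the distribution is flat, which is the main appeal of nucleus sampling.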
Transformer
Architecture: The neural network architecture that underpins virtually all modern large language models. Introduced in the 2017 paper "Attention Is All You Need," it processes text in parallel using self-attention mechanisms.
Zero-Shot Learning
Training: Asking a model to perform a task using only natural language instructions, without providing any examples. The model relies entirely on its pre-trained knowledge to understand and complete the task.
Fundamentals
Core concepts every AI user should understand, from tokens to prompts.
Architecture
How modern AI models are built — transformers, attention, and beyond.
Training
Methods used to teach and improve AI models, including RLHF and fine-tuning.
Deployment
Running AI models in production — inference, latency, and optimization.
Benchmarks
How AI models are evaluated and compared across tasks and capabilities.
Pricing
Understanding the economics of AI — API costs, token pricing, and cost optimization.
Ready to Compare Models?
Now that you understand the key concepts, explore real AI models with benchmarks, pricing, and detailed specs.