
Artificial intelligence (AI) has its own dialect, and it evolves fast. Whether you're an executive deciding on a vendor, a developer integrating your first LLM, or a marketer trying to understand what your engineering team is building, a shared vocabulary is the first step to moving quickly and avoiding expensive misunderstandings. The 30 terms below cover the full stack: foundational ML concepts, language model mechanics, agent architecture, and safety evaluation. Bookmark this page and revisit it as the field keeps moving. To see how we apply these ideas in practice, explore Autonomous Agent AI.
The Basics
These six terms form the bedrock of every AI conversation. If you're already experienced, check that your definitions match the ones your stakeholders are using, and then read our article describing what AI agents are.
Artificial Intelligence (AI)
The broad field of computer science dedicated to building systems that can perform tasks normally requiring human intelligence, such as reasoning, learning, perception, and language understanding.
Machine Learning (ML)
A subset of AI in which models learn patterns from data and improve their performance on a task without being explicitly programmed for each scenario. ML powers everything from spam filters to fraud detection.
Deep Learning
A sub-field of ML that uses neural networks with many layers to learn complex, hierarchical representations of data. Deep learning is the engine behind image recognition, speech synthesis, and large language models.
Neural Network
A computational architecture loosely inspired by the brain. It consists of nodes (neurons) organised in layers that progressively transform raw inputs into meaningful outputs through learned weights.
Training Data
The labelled or unlabelled dataset used to teach a model by exposing it to examples. The quality, diversity, and volume of training data directly determine a model's capabilities and its blind spots.
Algorithm
A step-by-step computational procedure that transforms input data into an output. Algorithms range from simple sorting rules to the optimisation procedures that adjust millions of model parameters during training.

Language & Generation
Large language models have become the most visible face of AI. These six terms explain how they work, what governs their behaviour, and where they go wrong.
Large Language Model (LLM)
A neural network trained on vast text corpora that can understand and generate human language at a sophisticated level. Examples include GPT-4o, Claude 3, and Gemini. LLMs are the reasoning core of most modern AI products.
Natural Language Processing (NLP)
The field of AI focused on enabling computers to read, understand, classify, and generate human language. NLP underpins translation, sentiment analysis, summarisation, and conversational interfaces.
Prompt
The instruction or question you give an LLM to elicit a response. Crafting effective prompts (known as prompt engineering) is a critical skill. System prompts set the model's persona and constraints; user prompts convey the immediate task.
Token
The smallest unit of text an LLM processes, roughly a word fragment. The word "unbelievable" might be split into three tokens. Pricing, speed, and context limits are all measured in tokens, not characters or words.
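A common back-of-the-envelope rule is roughly four characters per token for English text. This sketch is only that heuristic, not any provider's real tokenizer (those use byte-pair encoding and will differ), but it is how many teams estimate API cost before making a call:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of
    thumb for English. Real BPE tokenizers will give different counts."""
    return max(1, len(text) // 4)

# "unbelievable" is 12 characters, so roughly 3 tokens.
print(estimate_tokens("unbelievable"))
```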
Context Window
The maximum amount of text (measured in tokens) an LLM can consider in a single interaction. Larger windows (some models now exceed one million tokens) enable longer conversations, bigger documents, and richer agent memory.
Hallucination
When an LLM generates plausible-sounding but factually incorrect or fabricated information, often stated with false confidence. Grounding, RAG, and evaluation harnesses are the primary defences against hallucinations in production systems.

Model Architecture & Data
Behind every impressive demo is a set of deliberate architectural choices about how models are trained, stored, and queried. These six terms explain the machinery that turns raw data into reliable answers.
Fine-Tuning
Re-training a pre-trained model on a narrower, domain-specific dataset to specialise its behaviour. Fine-tuning a general LLM on medical records, for instance, can dramatically improve its accuracy in clinical contexts without training from scratch.
Embedding
A numerical vector that captures the semantic meaning of text, images, or other data in a format models can process mathematically. Semantically similar items cluster close together in embedding space, enabling similarity search and recommendation.
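"Close together in embedding space" is usually measured with cosine similarity. A minimal sketch over plain Python lists, with invented three-dimensional "embeddings" (production vectors have hundreds or thousands of dimensions and come from an embedding model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 = same direction (semantically similar); near 0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: "dog" and "puppy" point the same way; "invoice" does not.
dog, puppy, invoice = [0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]
print(cosine_similarity(dog, puppy) > cosine_similarity(dog, invoice))
```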
Vector Database
A database optimised for storing and searching embeddings by semantic similarity rather than exact keyword matches. Popular options include Pinecone, Weaviate, Chroma, and the pgvector extension for PostgreSQL.
Retrieval-Augmented Generation (RAG)
An architecture that pairs LLM generation with real-time retrieval of relevant documents from a knowledge base. The retrieved context is injected into the prompt, grounding the model's response in verified information and reducing hallucinations.
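In miniature, the pattern is retrieve-then-inject. This sketch uses toy keyword overlap as a stand-in for real embedding search against a vector database, and the prompt wording is our own:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    # Production systems use embedding similarity instead.
    q_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    # Inject retrieved context so the model answers from it, not from memory.
    context = "\n".join(retrieve(query, documents))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

docs = ["Our refund policy allows returns within 30 days.",
        "The office is closed on public holidays."]
print(build_rag_prompt("What is the refund policy?", docs))
```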
Inference
Running a trained model on new inputs to produce outputs. It's the production-time counterpart to training. Inference cost, latency, and throughput are the primary engineering concerns when scaling AI features to real users.
Foundation Model
A large model trained on broad data at scale that can be adapted to many downstream tasks, often through prompting, fine-tuning, or tool integration. GPT-4o, Llama 3, and Claude 3 are prominent examples.

Agents & Automation
AI agents are where language models meet real-world action. These six terms describe how autonomous systems are structured, coordinated, and kept under meaningful human control. For a deeper look at how we build them, see our approach to AI consulting.
AI Agent
Software that perceives its environment, reasons about a goal, selects tools or actions to take, and adapts its behaviour based on feedback, all in a loop until the task is complete or a human intervenes.
Agentic Workflow
A sequence of reasoning and action steps executed by an AI agent to complete a multi-step task autonomously. Unlike single-shot LLM calls, agentic workflows loop, branch, and recover from errors without human hand-holding at each step.
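The loop-branch-recover shape can be sketched in a few lines. Everything below is an assumption for illustration: the action format, the step budget, and the stub standing in for a real LLM call.

```python
def run_agent(goal: str, llm, tools: dict, max_steps: int = 5):
    """Minimal agentic loop: the model picks an action, the result feeds
    back into its context, and the loop repeats until done or out of steps."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm(history)  # assumed: {"tool": ..., "input": ...} or {"done": ...}
        if "done" in action:
            return action["done"]
        result = tools[action["tool"]](action["input"])
        history.append(f"{action['tool']}({action['input']}) -> {result}")
    return None  # step budget exhausted: escalate to a human

# Stub "LLM" that searches once, then finishes with what it found.
def stub_llm(history):
    if len(history) == 1:
        return {"tool": "search", "input": "Q3 revenue"}
    return {"done": history[-1]}

tools = {"search": lambda q: f"3 results for '{q}'"}
print(run_agent("Summarise Q3 revenue", stub_llm, tools))
```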
Tool Calling (Function Calling)
The ability for an LLM to invoke external APIs, databases, code interpreters, or scripts as part of completing a task. The model decides when to call a tool and what to pass it, then incorporates the result back into its reasoning.
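A stripped-down dispatcher shows the mechanics. The JSON shape here is purely illustrative; each provider defines its own tool-call format and also has the developer publish a schema for every tool:

```python
import json

# Registry of callable tools; the weather lambda is a placeholder.
TOOLS = {
    "get_weather": lambda city: f"18°C and clear in {city}",
}

def dispatch_tool_call(model_output: str) -> str:
    """Parse the model's (assumed) JSON tool call, run the named tool,
    and return the result to feed back into the conversation."""
    call = json.loads(model_output)
    tool = TOOLS[call["tool"]]
    return tool(**call["arguments"])

print(dispatch_tool_call('{"tool": "get_weather", "arguments": {"city": "Lisbon"}}'))
```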
Orchestration
The coordination layer that manages the sequence of LLM calls, tool invocations, memory reads/writes, and error handling in an agentic system. Frameworks like LangChain, LlamaIndex, and Semantic Kernel provide orchestration primitives.
Multi-Agent System
An architecture in which multiple AI agents collaborate, each specialised for a sub-task: one agent researches, another drafts, a third reviews. Agents communicate via shared context or message queues to solve problems too complex for a single agent.
Human-in-the-Loop (HITL)
A design pattern where human reviewers approve, override, or redirect AI actions at critical decision points. HITL is essential for high-stakes workflows (financial transactions, medical triage, legal review) where errors have significant consequences.

Safety & Evaluation
Deploying AI responsibly means more than building something that works in a demo. These six terms cover the guardrails, measurement practices, and configuration knobs that separate reliable production systems from costly failures.
Guardrails
Rules, filters, and constraints that prevent an AI system from producing harmful, non-compliant, off-topic, or otherwise undesirable outputs. Guardrails operate on inputs (blocking bad prompts) and outputs (filtering bad responses), and are essential for regulated industries.
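At their simplest, guardrails are pre- and post-filters wrapped around the model call. A toy keyword filter makes the shape visible; the blocked list is invented, and real deployments use classifier models and policy engines rather than substring checks:

```python
BLOCKED_TOPICS = {"social security number", "password"}  # illustrative only

def passes_filter(text: str) -> bool:
    """True if the text clears the filter. The same check is applied
    to incoming prompts and to outgoing responses."""
    lowered = text.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)

def safe_generate(prompt: str, model) -> str:
    if not passes_filter(prompt):          # input guardrail
        return "Sorry, I can't help with that request."
    response = model(prompt)
    if not passes_filter(response):        # output guardrail
        return "Response withheld by output filter."
    return response

print(safe_generate("What's our refund policy?", lambda p: "Returns within 30 days."))
```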
Grounding
Tethering LLM outputs to verified facts, authoritative documents, or real-time data sources rather than relying solely on the model's parametric knowledge. RAG is the most common grounding technique; citation requirements and tool calls are others.
Explainability (XAI)
The degree to which an AI system's reasoning process can be understood and communicated to human stakeholders. Regulators increasingly require explainability for automated decisions in lending, hiring, and healthcare, making XAI a compliance concern, not just an engineering one.
Bias
Systematic errors in model outputs arising from imbalances, historical prejudices, or under-representation in training data. AI bias can perpetuate or amplify unfair outcomes, making bias auditing a critical step before deploying models that affect people's lives or opportunities.
Evals (Evaluation)
Automated or human-judged tests used to measure a model's or agent's quality, accuracy, and safety on defined criteria. Teams that invest in rigorous evals ship faster because they can detect regressions immediately, before users do.
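A minimal eval harness is just test cases plus a scoring rule. In this sketch the keyword-match grader is a deliberately crude stand-in for human or LLM-based grading, and the stub model replaces a real API call:

```python
def run_evals(model, cases: list[tuple[str, str]]) -> float:
    """Run each (prompt, expected_keyword) case through the model and
    return the pass rate. Wiring this into CI catches regressions early."""
    passed = sum(1 for prompt, expected in cases
                 if expected.lower() in model(prompt).lower())
    return passed / len(cases)

# Stub model for demonstration; it only knows one answer.
stub_model = lambda prompt: "Paris is the capital of France."
cases = [("What is the capital of France?", "Paris"),
         ("What is the capital of Spain?", "Madrid")]
print(run_evals(stub_model, cases))  # 1 of 2 cases passes: 0.5
```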
Temperature
A sampling parameter that controls the randomness of LLM outputs. Low temperature (near 0) produces focused, highly consistent responses, ideal for factual Q&A or code generation. High temperature (near 1 or above) introduces creative variability, useful for brainstorming and copywriting.
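Under the hood, temperature divides the model's raw scores (logits) before they are converted into sampling probabilities. A sketch of that scaling with made-up logits:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw scores into sampling probabilities. Low temperature
    concentrates mass on the top score; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, temperature=0.2))  # near-deterministic
print(softmax_with_temperature(logits, temperature=1.5))  # more even spread
```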
Putting It Together
These 30 terms are not isolated facts. They form a connected vocabulary. A foundation model is trained on broad data, then fine-tuned for your domain. It processes input as tokens within a context window, uses embeddings stored in a vector database to power RAG, and surfaces answers through inference. Wrap that in an AI agent with tool calling and orchestration, add guardrails and a human-in-the-loop checkpoint, and you have the rough shape of a production AI system.
Understanding how these pieces fit lets you ask better questions, make smarter build-vs-buy decisions, and hold vendors accountable. If you'd like to explore any of these concepts in depth or discuss how they apply to your business, reach out to the Autonomous Agent AI team. We help organisations move from glossary to go-live.
Sources & Further Reading
- Anthropic. "Claude's Model Specification." 2024. anthropic.com
- OpenAI. "GPT-4 Technical Report." 2023. cdn.openai.com
- Lewis, P. et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. arxiv.org/abs/2005.11401
- Stanford HAI. "The AI Index Report 2024." hai.stanford.edu
