Retrieval-Augmented Generation (RAG)

Also known as: retrieval augmented generation, grounded generation

Retrieval-Augmented Generation (RAG) is a pattern where an AI system retrieves relevant documents from a knowledge source and includes them in the prompt, so the model answers based on grounded, current information rather than training-time memory alone.

Detailed explanation

RAG combines a retrieval system (often a vector database, but also keyword search, structured queries, or hybrids) with a generation model (typically an LLM). At query time, the system searches the knowledge base for documents relevant to the user’s question, then asks the model to answer using those documents as context.
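
The query-time flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes a tiny in-memory document list and uses bag-of-words cosine similarity as a stand-in for a real embedding model or search engine. The names `retrieve`, `build_prompt`, and `DOCS` are hypothetical, and the final call to a generation model is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return up to k documents with non-zero similarity, best first.

    Assumption: bag-of-words similarity stands in for a real retriever.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), i) for i, d in enumerate(documents)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[i] for score, i in scored[:k] if score > 0]


def build_prompt(question, passages):
    """Assemble a prompt that asks the model to answer from the passages."""
    numbered = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, start=1))
    return (
        "Answer using only the passages below and cite them by number.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical knowledge base, for illustration only.
DOCS = [
    "The refund window is 30 days from the delivery date.",
    "Support is available on weekdays from 9 to 5.",
    "Shipping to Canada takes 5 to 7 business days.",
]

question = "How long is the refund window?"
prompt = build_prompt(question, retrieve(question, DOCS, k=1))
```

The resulting `prompt` string would then be sent to whichever generation model the system uses. Numbering the passages is one simple way to let the model cite its sources.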

The pattern is used to ground answers in private or up-to-date information, reduce hallucination, enable citations, and update what the system knows without fine-tuning the model. Production RAG systems require careful work on chunking, embedding choice, reranking, prompt design, and continuous evaluation. Naive implementations often retrieve irrelevant context, which can degrade answer quality rather than improve it.
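
As one illustration of the chunking step, the sketch below splits text into fixed-size word windows that overlap, so a sentence cut at a chunk boundary still appears intact in a neighbouring chunk. The function name `chunk_words` and the default sizes are assumptions chosen for the example. Real pipelines often chunk by tokens, sentences, or document structure instead.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap`
    words with the previous chunk. Illustrative defaults only."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once this window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


pieces = chunk_words("a b c d e f g h i j", size=4, overlap=1)
# pieces == ["a b c d", "d e f g", "g h i j"]
```

Larger chunks keep more context together but dilute relevance scores, while smaller chunks match more precisely but can lose surrounding meaning. The right trade-off is usually found through evaluation rather than fixed in advance.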

Common pitfalls include over-relying on similarity search where exact match is needed (for example, product codes or error identifiers), ignoring metadata filters, indexing low-quality documents, and ignoring recency so that stale documents outrank current ones. Mature RAG pipelines include hybrid retrieval, query rewriting, and answer validation against the retrieved passages.
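
Hybrid retrieval and metadata filtering can be combined in several ways. The sketch below shows one common option, reciprocal rank fusion, which merges a keyword ranking and a vector ranking using only rank positions, followed by a simple metadata filter. The document IDs, the `lang` field, and the function names are hypothetical, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists that
    contain it, with rank starting at 1. Ties break by document ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def filter_by_metadata(doc_ids, metadata, **required):
    """Keep only documents whose metadata matches every required field."""
    return [
        d for d in doc_ids
        if all(metadata.get(d, {}).get(key) == value
               for key, value in required.items())
    ]


# Hypothetical rankings from two retrievers over the same corpus.
keyword_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "d"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])

# Hypothetical metadata; restrict results to English documents.
metadata = {
    "a": {"lang": "en"},
    "b": {"lang": "de"},
    "c": {"lang": "en"},
    "d": {"lang": "en"},
}
english_only = filter_by_metadata(fused, metadata, lang="en")
```

Documents that both retrievers rank highly rise to the top of `fused`, while a document found by only one retriever is still kept. In practice the metadata filter is often applied inside the retrievers before ranking, which avoids spending result slots on documents that will be discarded.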
