A vendor-neutral catalog of design patterns for building with large language models. Browse individual patterns or follow guides to compose them into production systems.
Retrieval-augmented generation patterns for grounding LLM output in external knowledge.
Give an AI agent control over when, where, and how to retrieve information rather than using a fixed retrieval pipeline.
Ground LLM responses in external knowledge by retrieving relevant documents before generation to reduce hallucinations and stay current.
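The retrieve-then-generate flow can be sketched in a few lines. Here the overlap-based ranker and the document list are illustrative stand-ins: a real system would embed the query, search a vector store, and send the assembled prompt to a model.

```python
import re

def retrieve(query, docs, k=2):
    """Rank docs by term overlap with the query (a stand-in for vector search)."""
    q = set(re.findall(r"\w+", query.lower()))
    return sorted(docs,
                  key=lambda d: len(q & set(re.findall(r"\w+", d.lower()))),
                  reverse=True)[:k]

def build_prompt(query, docs):
    # Ground the model: restrict it to the retrieved context.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return ("Answer using ONLY the context below; say 'I don't know' otherwise.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

docs = [
    "The warranty period for the X100 is 24 months.",
    "Shipping to EU countries takes 3-5 business days.",
    "The X100 battery lasts roughly 12 hours per charge.",
]
prompt = build_prompt("How long is the X100 warranty?", docs)
```

The "only the context" instruction plus a permission to say "I don't know" is what pushes the model toward grounded answers instead of guesses.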
Answer complex multi-hop questions through iterative cycles of retrieval, reasoning, and gap analysis across multiple sources.
Build trust in RAG outputs through inline citations, out-of-domain detection, and self-correcting retrieval strategies that reduce hallucinations.
Bridge the vocabulary gap between user queries and knowledge base content using hypothetical answers, query expansion, and hybrid search.
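A minimal sketch of the query-expansion half of this pattern: search with several paraphrases of the user's wording and merge the results. The synonym table here is a toy stand-in for an LLM-generated expansion step.

```python
# Toy expansion table; in practice an LLM or thesaurus generates these variants.
SYNONYMS = {"pto": ["paid time off", "vacation"]}

def expand(query):
    """Return the original query plus paraphrased variants for retrieval."""
    variants = [query]
    for term, alts in SYNONYMS.items():
        if term in query.lower():
            variants += [query.lower().replace(term, alt) for alt in alts]
    return variants

variants = expand("How do I request PTO?")
```

Each variant is searched independently and the result sets are merged, so a knowledge base that only says "paid time off" can still answer a query about "PTO".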
Improve retrieval quality by reranking, compressing, and filtering retrieved chunks between the vector search step and LLM generation.
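The reranking stage can be sketched as a rescore-then-truncate step between retrieval and generation. The overlap scorer below is an illustrative placeholder; real rerankers are learned cross-encoder models.

```python
def rerank(query, candidates, scorer, keep=2):
    """Rescore retrieved chunks and keep only the strongest before generation."""
    return sorted(candidates, key=lambda c: scorer(query, c), reverse=True)[:keep]

def overlap_scorer(query, doc):
    # Stub scorer: shared-word count. A cross-encoder would score (query, doc) pairs.
    return len(set(query.lower().split()) & set(doc.lower().split()))

chunks = ["reset your password in settings",
          "pricing tiers and billing",
          "password rules require 12 characters"]
top = rerank("how do I reset my password", chunks, overlap_scorer)
```

Trimming to the best few chunks both cuts token cost and keeps distracting passages out of the model's context.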
Replace keyword matching with vector embeddings to find documents by meaning rather than exact words, enabling semantic similarity search.
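At its core, semantic search is a nearest-neighbor lookup over embedding vectors. The sketch below uses tiny hand-made 3-d vectors for illustration; real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, index, k=1):
    """index: list of (doc_id, vector). Return top-k doc ids by similarity."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy embeddings standing in for model output.
index = [("refund-policy", [0.9, 0.1, 0.0]),
         ("api-reference", [0.1, 0.9, 0.2]),
         ("onboarding",    [0.0, 0.2, 0.9])]
top = search([0.85, 0.15, 0.05], index)  # a query embedded near "refunds"
```

Because similarity is computed in embedding space, a query phrased as "money back" can still land on the refund document even with zero word overlap.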
Autonomous systems that plan, use tools, execute code, and coordinate with other agents.
Let LLMs generate and execute code in sandboxed environments for tasks that demand computational precision, such as data analysis and visualization.

Coordinate multiple specialized agents to solve complex tasks that exceed any single agent's capabilities using supervisor or peer topologies.
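The supervisor topology can be sketched as a coordinator that routes subtasks to specialist agents and assembles their results. The agents here are plain functions standing in for LLM-backed workers; the agent names and task flow are illustrative.

```python
# Stub specialists; each would be its own model-backed agent in practice.
AGENTS = {
    "research": lambda task: f"notes on {task}",
    "write":    lambda task: f"draft using {task}",
}

def supervisor(goal):
    """Decompose the goal, delegate to specialists, and combine their output."""
    notes = AGENTS["research"](goal)
    draft = AGENTS["write"](notes)
    return draft

result = supervisor("solar batteries")
```

In a peer topology the same agents would hand work to each other directly; the supervisor variant keeps routing decisions in one place.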
Separate strategic planning from tactical execution: one agent drafts the plan and another executes each step, yielding more structured workflows.
Interleave reasoning and action in a loop where the agent thinks, acts, observes, and repeats until the task is complete.
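The think-act-observe loop can be sketched as follows. The "model" is a scripted stub standing in for a real LLM, and the `Action:`/`Observation:`/`Final Answer:` step format is an illustrative convention.

```python
# One safe stub tool; real agents register many.
TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def scripted_model(transcript):
    """Stub policy: decide to compute once, then finish. A real LLM reads the
    growing transcript and emits the next step."""
    if "Observation:" not in transcript:
        return "Action: calculator[12 * 7]"
    return "Final Answer: 84"

def react(question, model, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[argument]" and run the tool.
        tool, arg = step.removeprefix("Action: ").split("[", 1)
        observation = TOOLS[tool](arg.rstrip("]"))
        transcript += f"Observation: {observation}\n"
    return None

answer = react("What is 12 * 7?", scripted_model)
```

The key property is that each tool observation is appended to the transcript, so the model's next "thought" is conditioned on real results rather than its own guesses.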
Let LLMs interact with external systems by emitting structured function calls that your code executes safely on their behalf.
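The host-side half of tool calling is a dispatch step: the model emits a structured call, and your code validates it against a registry before executing. The JSON shape and tool name below are illustrative assumptions, not any particular vendor's schema.

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub for a real weather API

# Only registered functions can ever be executed.
REGISTRY = {"get_weather": get_weather}

def execute_tool_call(raw: str) -> str:
    """Parse a model-emitted function call and run it safely via the registry."""
    call = json.loads(raw)
    fn = REGISTRY.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])

result = execute_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```

The registry lookup is the safety boundary: the model can only request functions you explicitly exposed, never arbitrary code.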
Techniques for structuring model inputs to get better reasoning, consistency, and output quality.
Prompt models to show their reasoning step by step to improve accuracy on multi-step problems like math, logic, and complex analysis.
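A minimal sketch of triggering and consuming chain-of-thought output: append a step-by-step instruction, and keep the final answer machine-parseable. The `ANSWER:` delimiter convention is an assumption, not a standard.

```python
def cot_prompt(question):
    """Ask for step-by-step reasoning with a parseable final line."""
    return (f"{question}\n"
            "Think step by step, then give the final answer on its own line "
            "as 'ANSWER: <value>'.")

def extract_answer(reply):
    """Scan from the end for the final-answer line."""
    for line in reversed(reply.splitlines()):
        if line.startswith("ANSWER:"):
            return line.removeprefix("ANSWER:").strip()
    return None

# Simulated model reply showing the expected shape.
ans = extract_answer("First, 3*4=12.\nThen 12+5=17.\nANSWER: 17")
```

Separating the reasoning from a delimited final answer lets downstream code ignore the scratch work while still benefiting from it.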
Include input-output examples in your prompt so the model learns the expected format, tone, and behavior by demonstration.
Break complex tasks into a sequence of focused prompts where each step's output feeds into the next for more reliable multi-step results.
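Chaining reduces to a fold over prompt templates, each wrapping the previous step's output. The `llm` stub below fakes two steps deterministically; a real chain calls a model API at each step.

```python
def llm(prompt):
    """Stub model: 'summarize' keeps the first sentence, 'shout' upper-cases."""
    if prompt.startswith("Summarize:"):
        return prompt.removeprefix("Summarize:").strip().split(".")[0] + "."
    if prompt.startswith("Shout:"):
        return prompt.removeprefix("Shout:").strip().upper()
    return prompt

def chain(text, steps):
    """Run each templated prompt in order, feeding output forward."""
    out = text
    for template in steps:
        out = llm(template.format(input=out))
    return out

result = chain("LLMs are useful. They also hallucinate.",
               ["Summarize: {input}", "Shout: {input}"])
```

Because each step has one narrow job, failures are easier to localize than in a single do-everything prompt.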
Automatically optimize prompts against evaluation datasets instead of relying on manual trial-and-error tuning of instructions.
Generate multiple reasoning paths and take the majority answer to reduce errors from stochastic generation and improve reliability.
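The voting step is simple to sketch: sample the same prompt several times and take the modal answer. The cycling stub sampler stands in for a real model called with temperature above zero.

```python
from collections import Counter
import itertools

def self_consistent_answer(prompt, sample, n=5):
    """Sample n reasoning paths and return the majority final answer."""
    answers = [sample(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub sampler: canned answers simulating stochastic generation.
_fake = itertools.cycle(["42", "42", "41", "42", "40"])
answer = self_consistent_answer("What is 6 * 7?", lambda p: next(_fake))
```

Uncorrelated mistakes tend to land on different wrong answers, so the correct one usually wins the vote even when no single sample is reliable.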
Directing requests to the right model, chain, or agent based on intent and constraints.
Try cheaper models first and escalate to more capable ones only when confidence is low, reducing costs while maintaining output quality.
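The escalation logic can be sketched as a confidence-gated fallback. Both "models" and the confidence signal below are stubs; in practice confidence might come from token log-probabilities or a verifier.

```python
def cheap_model(prompt):
    """Stub small model: pretend it is only confident on short, simple prompts."""
    confident = len(prompt.split()) < 8
    return ("small-answer", 0.9 if confident else 0.3)

def strong_model(prompt):
    return ("large-answer", 0.95)  # stub expensive model

def cascade(prompt, threshold=0.7):
    """Answer cheaply when confident; otherwise escalate."""
    answer, confidence = cheap_model(prompt)
    if confidence >= threshold:
        return answer, "cheap"
    answer, _ = strong_model(prompt)
    return answer, "strong"

easy = cascade("What is 2+2?")
hard = cascade("Explain the trade-offs between eventual and strong consistency in replicated databases")
```

If most traffic is easy, the expensive model only sees the residual hard tail, which is where the cost savings come from.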
Route queries to the right model tier based on estimated complexity to optimize cost without sacrificing quality on harder tasks.
Classify query intent using embeddings and route to the appropriate handler, tool, or agent pipeline without relying on keyword rules.
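Intent routing by embedding reduces to a nearest-prototype lookup: compare the query embedding to one reference vector per intent and dispatch to that intent's handler. The 2-d vectors and keyword-keyed stub embedder are illustrative.

```python
# One prototype vector per intent (toy 2-d; real ones come from an embedding model).
PROTOTYPES = {"billing": [1.0, 0.0], "technical": [0.0, 1.0]}

HANDLERS = {
    "billing":   lambda q: f"billing queue: {q}",
    "technical": lambda q: f"tech queue: {q}",
}

def route(query, embed):
    """Dispatch to the handler whose prototype is nearest the query embedding."""
    vec = embed(query)
    intent = max(PROTOTYPES,
                 key=lambda name: sum(a * b for a, b in zip(vec, PROTOTYPES[name])))
    return HANDLERS[intent](query)

def embed(text):
    # Stub embedder; a real one maps any text to a dense vector.
    return [0.9, 0.1] if "invoice" in text.lower() else [0.1, 0.9]

out = route("Where is my invoice?", embed)
```

Unlike keyword rules, prototype vectors generalize: "charge on my card" would land near the billing prototype with no explicit rule for "charge".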
Protecting systems from harmful inputs, hallucinated outputs, and policy violations.
Insert safety layers at input, output, retrieval, and execution points to enforce content policies, prevent harm, and block prompt injection.
Detect potential hallucinations by analyzing token probabilities and confidence scores in LLM outputs before they reach the user.
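A minimal version of the confidence check: many completion APIs can return per-token log-probabilities, and unusually low averages often correlate with fabricated spans. The threshold and the sample logprobs below are illustrative assumptions, not calibrated values.

```python
def flag_low_confidence(token_logprobs, threshold=-2.5):
    """Flag an output whose mean token log-probability is suspiciously low."""
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return mean_logprob < threshold

# Illustrative logprob sequences, as an API might return them.
confident = flag_low_confidence([-0.1, -0.3, -0.2])   # high-probability tokens
suspect = flag_low_confidence([-3.0, -4.2, -2.8])     # model was "guessing"
```

Flagged outputs can be regenerated, routed to a stronger model, or shown with a warning rather than reaching the user unchecked.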
Measuring and improving the quality of LLM outputs through automated and human feedback.
Use an LLM with a custom scoring rubric to evaluate open-ended outputs at scale, replacing expensive human review with consistent automated grading.
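The judge pattern has two mechanical pieces: building a rubric prompt, and parsing the grader's reply into a score. The rubric text and `SCORE:` reply format below are assumptions; the judge reply is simulated.

```python
RUBRIC = """Rate the answer from 1-5 for factual accuracy.
Reply exactly as: SCORE: <n>"""

def build_judge_prompt(question, answer):
    """Combine the rubric with the item under evaluation."""
    return f"{RUBRIC}\n\nQuestion: {question}\nAnswer: {answer}"

def parse_score(reply):
    """Extract the numeric score from the judge model's reply."""
    for line in reply.splitlines():
        if line.startswith("SCORE:"):
            return int(line.removeprefix("SCORE:").strip())
    raise ValueError("judge reply missing SCORE line")

# Simulated judge reply showing the expected shape.
score = parse_score("SCORE: 4")
```

Forcing a rigid reply format is what makes judge output aggregatable at scale; free-text verdicts are far harder to score consistently.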
Improve LLM outputs through iterative generate-evaluate-critique-regenerate loops that refine quality without retraining the model.
Operating LLM systems efficiently through caching, model selection, and inference optimization.
Maximize inference throughput through batching, KV cache optimization, and model parallelism to reduce latency and serve more requests per GPU.
Reuse responses for repeated or similar prompts through semantic and prefix caching strategies to cut latency and reduce API costs.
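The simplest layer of this pattern is an exact-match cache on a normalized prompt; a semantic cache extends the same structure by comparing embeddings instead of normalized strings. The in-memory dict and stub model are illustrative.

```python
class PromptCache:
    """Exact-match response cache keyed on a normalized prompt."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def _key(self, prompt):
        return " ".join(prompt.lower().split())  # normalize case and whitespace

    def get_or_call(self, prompt, llm):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self._store[key] = llm(prompt)
        return self._store[key]

cache = PromptCache()
calls = []
fake_llm = lambda p: calls.append(p) or f"answer:{len(calls)}"  # stub model
a = cache.get_or_call("What is RAG?", fake_llm)
b = cache.get_or_call("  what is rag?  ", fake_llm)  # normalized cache hit
```

Even trivial normalization catches many repeats; swapping the key function for an embedding-similarity lookup turns this into a semantic cache.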
Reduce model size through distillation, quantization, or speculative decoding while preserving quality for cost-efficient deployment.
Maintaining context across conversations and sessions beyond the context window.
Manage conversation state across turns using sliding windows, summarization, or entity tracking to maintain coherent multi-turn dialogue.
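The sliding-window strategy can be sketched with a bounded deque: keep only the most recent turns, plus an optional pinned system message, so the conversation always fits the context window. The message shape is the common role/content convention.

```python
from collections import deque

class SlidingWindowMemory:
    """Keep the last N turns; older turns fall off automatically."""

    def __init__(self, max_turns=3, system=None):
        self.system = system
        self.turns = deque(maxlen=max_turns)

    def add(self, role, content):
        self.turns.append({"role": role, "content": content})

    def messages(self):
        """Messages to send: pinned system prompt plus the recent window."""
        prefix = [{"role": "system", "content": self.system}] if self.system else []
        return prefix + list(self.turns)

mem = SlidingWindowMemory(max_turns=2, system="Be concise.")
for i in range(4):
    mem.add("user", f"turn {i}")
msgs = mem.messages()
```

Pinning the system message outside the window matters: instructions must survive even when the oldest turns are evicted.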
Persist important facts and preferences in external memory stores and retrieve them to maintain continuity and personalization across sessions.
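A minimal sketch of the store-and-recall loop: persist facts as key-value entries and inject any relevant ones into the next session's prompt. A production store would use a database with semantic retrieval; this in-memory dict and substring match are illustrative.

```python
class MemoryStore:
    """Toy persistent memory: remember facts, recall the ones a query mentions."""

    def __init__(self):
        self.facts = {}

    def remember(self, key, value):
        self.facts[key] = value

    def recall(self, query):
        """Return facts whose key appears in the query text."""
        q = query.lower()
        return {k: v for k, v in self.facts.items() if k in q}

store = MemoryStore()
store.remember("dietary preference", "vegetarian")
store.remember("home city", "Lisbon")
hits = store.recall("Suggest a restaurant matching my dietary preference")
```

Recalled facts are typically prepended to the prompt ("Known about the user: ..."), giving continuity across sessions without retraining or huge contexts.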