Context Engineering
Also known as: context management, context design, context window management
The practice of deliberately managing what goes into an LLM's context window — prompts, retrieved chunks, history, and tool results — to optimize model performance.
Definition
Context engineering is the discipline of systematically controlling and optimizing the inputs placed into a language model's context window — including system prompts, retrieved documents, conversation history, tool results, and user instructions — to achieve reliable, high-quality outputs. While prompt engineering focuses on the phrasing and structure of individual instructions, context engineering addresses the broader orchestration question: what information to include, exclude, compress, or retrieve at each inference step.
In production agentic systems, context engineering is often the primary lever for improving model behavior without changing the model itself. Key concerns include managing token budgets across multi-turn conversations, selecting the most relevant retrieved chunks for RAG pipelines, structuring chat history to preserve key facts while compressing stale context, and sequencing tool results to maximize downstream reasoning quality.
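The budgeting and selection concerns above can be sketched as a small context assembler. This is a minimal illustration, not a production recipe: the function names are hypothetical, and the whitespace-based token count is a crude stand-in for a real tokenizer (in practice you would use the model's own tokenizer).

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float  # retrieval relevance, higher is better

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~1 token per whitespace word.
    return len(text.split())

def assemble_context(system_prompt: str, chunks: list[Chunk],
                     history: list[str], budget: int) -> str:
    """Greedily pack the system prompt, best retrieved chunks, then the
    most recent history turns into one context string under a token budget."""
    parts = [system_prompt]
    used = estimate_tokens(system_prompt)

    # Include the highest-scoring retrieved chunks that still fit.
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget:
            parts.append(chunk.text)
            used += cost

    # Keep the most recent history turns; older ones are dropped here
    # (a real system might summarize stale turns instead of dropping them).
    kept: list[str] = []
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    parts.extend(reversed(kept))
    return "\n\n".join(parts)
```

The greedy newest-first walk over history implements a simple recency bias; swapping the drop step for a summarization call is the usual next refinement.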
Effective context engineering requires understanding how different model architectures handle positional information (the 'lost in the middle' effect, attention decay near context boundaries), how models weight system versus user versus assistant turns, and how to exploit prompt caching for repeated prefixes to reduce latency and cost. As context windows have grown to 1M+ tokens, the discipline has shifted from fitting everything in to selecting what matters most — an architectural and product design skill as much as a prompting skill.
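Two of the levers above, cache-friendly prefixes and positional placement, can be combined in a single ordering pass. The sketch below is a hypothetical illustration: stable content (system prompt, tool definitions) goes first so repeated prefixes can hit a prompt cache, while the most important variable chunks are placed at the edges of the variable section to mitigate the lost-in-the-middle effect.

```python
def order_for_cache_and_recall(stable_prefix: list[str],
                               chunks: list[str]) -> list[str]:
    """Order context blocks for prompt-cache reuse and positional recall.

    stable_prefix: content identical across requests (system prompt,
    tool definitions); keeping it first and byte-stable maximizes
    prefix-cache hits.
    chunks: variable content, assumed pre-sorted most-important-first.
    """
    edges: list[str] = []
    middle: list[str] = []
    for i, chunk in enumerate(chunks):
        # Reserve the two most important chunks for the edge positions.
        (edges if i < 2 else middle).append(chunk)
    # Best chunk opens the variable section, second-best closes it,
    # since models attend least reliably to the middle of the window.
    variable = edges[:1] + middle + edges[1:2]
    return stable_prefix + variable
```

Note the asymmetry of the two goals: caching rewards keeping the prefix unchanged across calls, while recall placement only rearranges the variable suffix, so the two optimizations do not conflict.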