Skip to content

Context Engineering Series: Building Better Agentic RAG Systems

I've been helping companies build agentic RAG systems and studying coding agents from Cognition, Claude Code, Cursor, and others. These coding agents are probably unlocking a trillion-dollar industry—making them the most economically viable agents to date.

This series shares what I've learned from these teams and conversations with professional developers using these systems daily, exploring what we can apply to other industries.

What is Context Engineering?

We've moved far beyond prompt engineering. Now we're designing portfolios of tools (directory listing, file editing, web search), slash commands like /pr-create that inject prompts vs , specialized sub-agents @pr-creation-agent, vs having an AGENT.md with systems that work across IDEs, command lines, GitHub, and Slack.

Context engineering is designing tool responses and interaction patterns that give agents situational awareness to navigate complex information spaces effectively.

To understand what this means practically, let's look at how systems have evolved:

Before: We precomputed what chunks needed to be put into context, injected them, and then asked the system to reason about the chunks. Search was a one-shot operation—you got your top-k results and that was it.

def search(query: str, n_chunks: int = 10) -> list[str]:
    """Return raw text chunks, no metadata."""
    chunks = vector_search(query, top_k=n_chunks)
    return [chunk.text for chunk in chunks]

# One-shot RAG pattern
def answer_question(question: str) -> str:
    chunks = search(question)
    context = "\n".join(chunks)

    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a helpful assistant that can answer questions about the context."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
        ]
    )
    return response.choices[0].message.content

Now: Agents are incredibly easy to build because all you need is a messages array and a bunch of tools. They're persistent, make multiple tool calls, and build understanding across conversations. They don't just need the right chunk—they need to understand the landscape of available information.

from pydantic import BaseModel

class SearchTool(BaseModel):
    """Search with metadata and facets for strategic exploration."""
    query: str
    n_chunks: int = 10

    def run(self) -> dict:
        results = vector_search(self.query, top_k=self.n_chunks)
        return {
            chunk.id: chunk.text for chunk in results
        }

# Agent loop - persistent conversation with tool execution
messages = [
    {"role": "system", "content": "Use search strategically to explore information landscapes."}
]

while True:
    response = client.chat.completions.create(
        messages=messages,
        tools=[SearchTool]
    )

    if response.tool_calls:
        # Execute tools and add results to conversation
        for tool_call in response.tool_calls:
            result = SearchTool(**tool_call.args).run()
            messages.append({"role": "tool", "content": str(result), "tool_call_id": tool_call.id})
    else:
        # Agent has final answer
        break

How Easy Are Agents to Build?

Coding agents have become remarkably simple to implement. Check out How to Build an Agent - if you give a coding agent literally just this blog post, it will write a coding agent for you. The barrier to entry has never been lower.

The fundamental shift is this: agents don't just consume information, they explore information spaces. Context engineering is about designing systems that support this exploration—giving agents not just the right data, but the right understanding of what data exists and how to navigate it.

What This Series Covers

This series explores practical approaches to context engineering across different domains and use cases. The focus is on implementation strategies, real-world examples, and measurable business outcomes from companies making this transition.

Topics include:

  • Moving beyond chunks to structured information landscapes
  • Multi-level response architectures that provide navigational context
  • Agent-friendly data organization patterns
  • Performance optimization for agentic workloads
  • Business metrics and ROI measurement strategies

Posts in This Series

1. Beyond Chunks: Context Engineering Tool Response

Core thesis: Agent success depends on tool response structure, not just content. Shows how faceted search and metadata give agents "peripheral vision" of data landscapes, enabling strategic exploration beyond top-k similarity results.

Key insight: Tool responses become prompt engineering—XML structure and system instructions in tool outputs directly influence how agents think about subsequent searches.

2. Slash Commands vs Subagents: How to Keep AI Tools Focused

Core thesis: Context pollution is killing agent performance, but subagent architecture solves it. Bad context is cheap but toxic—100k lines of logs cost nothing computationally but destroy valuable reasoning context.

Key insight: Same diagnostic capability, dramatically different economics: slash commands flood main threads with 91% noise, subagents burn tokens off-thread and return 8x cleaner context with 76% signal.

3. Two Experiments We Need to Run on AI Agent Compaction

Core thesis: If in-context learning is gradient descent, then compaction is momentum. We can use compaction as both an optimization technique and a lens for understanding agent behavior at scale.

Key insight: Compaction timing affects learning trajectory preservation, and specialized compaction prompts can reveal systematic patterns in agent failures and successes across populations.

Start Here: If you're new to context engineering, begin with the foundational post above.

Who This Series Is For

  • Engineering teams building agentic RAG systems
  • Product leaders evaluating the ROI of agent implementations
  • AI researchers interested in tool design and agent cognition
  • Anyone curious about how agents actually work with structured data

Each post includes practical code examples, implementation strategies, and real business metrics from companies that have made this transition.

Getting Started

Start with the foundational post Beyond Chunks: Why Context Engineering is the Future of RAG to understand the core thesis and four-level framework. From there, you can either read sequentially or jump to specific topics based on your current implementation needs.

The future of RAG isn't about better embeddings or larger context windows—it's about teaching agents to navigate information spaces systematically. Let's explore how to build that future together.