
Decomposing RAG Systems to Identify Bottlenecks

There's a reason Google has separate interfaces for Maps, Images, News, and Shopping. The same reason explains why many RAG systems today are hitting a performance ceiling. After working with dozens of companies implementing RAG, I've discovered that most teams focus on optimizing embeddings while missing two fundamental dimensions that matter far more: Topics and Capabilities.

If you're interested in learning more about how to systematically improve RAG systems, you can sign up for the free email course here:

Sign up for the Free Email Course

Now, let's dive in.

A Tale of Unmet Expectations

Let me share a recent case study that illustrates this perfectly:

A construction company implemented a state-of-the-art RAG system for their technical documentation. Despite using the latest embedding models and spending weeks optimizing their prompts, user satisfaction stayed stubbornly around 50%. When we analyzed their query logs, we discovered something fascinating: 20% of queries were simply counting objects in blueprints ("How many doors are on the 15th floor?").

No amount of embedding optimization would help here. Once the team added a simple object detection model for blueprints, satisfaction on those queries jumped to 87% in just one week.
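To make the pattern concrete, here's a minimal sketch of that kind of routing, with the detector and retriever passed in as placeholders. The function names and the "how many" regex heuristic are my illustration, not the team's actual implementation:

```python
import re
from typing import Callable

def route_query(
    query: str,
    count_objects: Callable[[str, str], int],  # placeholder: (query, label) -> count from a CV model
    retrieve: Callable[[str], str],            # placeholder: standard embedding-based retrieval
) -> str:
    """Send counting questions to an object detector; everything else to retrieval."""
    match = re.search(r"how many (\w+)", query.lower())
    if match:
        label = match.group(1)               # e.g. "doors"
        count = count_objects(query, label)  # run object detection over the relevant blueprint
        return f"Found {count} {label}."
    return retrieve(query)                   # fall back to embedding search for everything else
```

In practice you'd likely replace the regex with an LLM-based classifier, but the point stands: the win comes from routing, not from a better embedding model.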

The Two Dimensions That Actually Matter

After analyzing millions of queries across various industries, I've identified two fundamental dimensions that determine RAG system success:

  1. Topics: Do we have the information users want?
  2. Capabilities: Can we effectively access and process that information?

Let's dive deep into each dimension.

Topics: Content Coverage

Topics represent your system's knowledge inventory. Think of this like a store's product catalog - you can't sell what you don't have.

Examples of Topic Gaps:

  • Missing documentation sections
  • Lack of data for specific time periods
  • Absence of particular use cases or scenarios
  • Missing specific types of content (images, videos, tables)

Real World Example: Netflix

When Netflix notices users searching for "Adam Sandler basketball movies", that's a topic gap - they simply don't have that content. No amount of better search or recommendations will help if the content doesn't exist.

Capabilities: Processing Power

Capabilities represent your system's ability to manipulate and retrieve information in specific ways. This is where most RAG systems fall short.

Common Capability Requirements:

  1. Temporal Understanding
     • "What changed last week?"
     • "Show me the latest updates"
     • Understanding fiscal vs calendar years

  2. Numerical Processing
     • Counting objects in documents
     • Calculating trends or changes
     • Aggregating data across sources

  3. Entity Resolution
     • Connecting related documents
     • Understanding document hierarchies
     • Mapping aliases and references
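To make this actionable, here's a minimal sketch of a keyword-based capability classifier. The category names and trigger phrases are illustrative assumptions; in production you'd likely use an LLM or a trained classifier:

```python
# Illustrative trigger phrases for each capability; tune these against your own query logs.
CAPABILITY_TRIGGERS = {
    "temporal": ["last week", "latest", "recent", "since", "fiscal year", "changed"],
    "numerical": ["how many", "count", "total", "average", "trend", "percent"],
    "entity_resolution": ["related to", "same as", "also known as", "versions of"],
}

def classify_capabilities(query: str) -> list[str]:
    """Return the capabilities a query appears to require (possibly several, possibly none)."""
    q = query.lower()
    return [
        capability
        for capability, triggers in CAPABILITY_TRIGGERS.items()
        if any(phrase in q for phrase in triggers)
    ]

# Example: this query needs both temporal understanding and numerical processing.
print(classify_capabilities("How many permits were filed since last week?"))
# -> ['temporal', 'numerical']
```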

Real World Example: DoorDash

When DoorDash notices orders dropping after 9 PM, adding more restaurants won't help. They need a capability to filter for "open now" restaurants. No amount of inventory helps if users can't find what's actually available.

The Impact on User Experience

Consider how these dimensions affect real user interactions:

  1. Topic Failures:
     • "Zero results found"
     • Completely irrelevant responses
     • Missing critical information

  2. Capability Failures:
     • Partially correct answers
     • Unable to process time-based queries
     • Can't compare or contrast information

Building a Systematic Approach

Here's how to implement this framework in your RAG system:

  1. Analyze Query Patterns
     • Categorize failed queries into topic vs capability gaps (see the sketch after this list)
     • Identify clusters of similar issues
     • Track frequency and impact of each gap

  2. Measure Impact
     • Query volume (how often does this come up?)
     • Success rate (how often do we fail?)
     • Business impact (what does failing cost us?)

  3. Prioritize Improvements
     • Focus on high-volume, low-success-rate queries
     • Balance implementation cost against potential impact
     • Build capabilities that can be reused across topics
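Here's a minimal sketch of the categorize-and-prioritize steps, under illustrative assumptions: the gap labels, cluster names, and frequency-as-impact proxy are mine, not a standard.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FailedQuery:
    text: str
    gap: str      # "topic" (content is missing) or "capability" (we can't process it)
    cluster: str  # e.g. "counting", "recency", "missing_docs"

def prioritize(failures: list[FailedQuery], total_queries: int) -> list[tuple[str, float]]:
    """Rank gap clusters by the share of total traffic they account for."""
    counts = Counter((f.gap, f.cluster) for f in failures)
    return [
        (f"{gap}/{cluster}", n / total_queries)  # frequency as a first-pass impact proxy
        for (gap, cluster), n in counts.most_common()
    ]

failures = [
    FailedQuery("How many doors on floor 15?", "capability", "counting"),
    FailedQuery("How many windows in wing B?", "capability", "counting"),
    FailedQuery("2019 safety inspection report", "topic", "missing_docs"),
]
print(prioritize(failures, total_queries=100))
# -> [('capability/counting', 0.02), ('topic/missing_docs', 0.01)]
```

Weight each cluster by business impact before committing engineering time; raw frequency alone can overstate the value of cheap-to-answer queries.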

Best Practices for Implementation

  1. Start with Data Collection
     • Log all queries and their success rates (a logging sketch follows this list)
     • Track which capabilities are used for each query
     • Monitor topic coverage over time

  2. Build Modular Systems
     • Separate topic management from capability implementation
     • Allow for easy addition of new capabilities
     • Enable A/B testing of different approaches

  3. Measure Everything
     • Track success rates by topic and capability
     • Monitor usage patterns of different capabilities
     • Calculate ROI of topic expansions
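As a starting point for the data collection and measurement steps, here's a minimal sketch. The field names and JSON-lines format are my assumptions; adapt them to your stack:

```python
import json
import time
from collections import defaultdict

def log_query(path: str, query: str, topic: str, capabilities: list[str], success: bool) -> None:
    """Append one query record as a JSON line; a spreadsheet works fine at small scale."""
    record = {
        "ts": time.time(),
        "query": query,
        "topic": topic,                # which content area the query needed
        "capabilities": capabilities,  # which processing abilities were exercised
        "success": success,            # thumbs-up, click-through, or a manual label
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def success_rate_by(records: list[dict], key: str) -> dict[str, float]:
    """Success rate grouped by 'topic' or by 'capabilities'."""
    totals, wins = defaultdict(int), defaultdict(int)
    for r in records:
        groups = r[key] if isinstance(r[key], list) else [r[key]]
        for g in groups:
            totals[g] += 1
            wins[g] += r["success"]
    return {g: wins[g] / totals[g] for g in totals}
```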

Looking Forward

The future of RAG isn't just about better embeddings or larger context windows. It's about:

  • Building specialized indices for different query types
  • Developing robust capability routing systems
  • Creating feedback loops for continuous improvement

Conclusion

Stop focusing solely on embedding optimization. Start analyzing your queries through the lens of topics and capabilities. This framework will help you:

  • Identify the real bottlenecks in your system
  • Make strategic decisions about improvements
  • Build a more effective and scalable RAG application

Remember: The goal isn't to build a perfect system. It's to build a system that gets better every day at solving real user problems.


If you're working on a RAG system right now, try this: Take your last 20 failed queries and sort them into topic vs capability issues. You might be surprised by what patterns emerge.

If you're interested in learning more about how to systematically improve RAG systems, you can sign up for the free email course here:

Sign up for the Free Email Course

Those Who Can Do, Must Teach: Why Teaching Makes You Better

"Those who can't do, teach" is wrong. Here's proof: I taught at the Data Science Club while learning myself. If I help bring a room of 60 people even 1 week ahead, in an hour, that's 60 weeks of learning value creation. That's more than a year of value from one hour. Teaching isn't what you do when you can't perform. It's how you multiply your impact.

It's a duty.

What is Retrieval Augmented Generation?

Retrieval augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by integrating them with external knowledge sources. In essence, RAG combines the generative power of LLMs with the vast information stored in databases, documents, and other repositories. This approach enables LLMs to generate more accurate, relevant, and contextually grounded responses.
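Here's a minimal end-to-end sketch of the pattern; the search and generate callables stand in for whatever vector store and LLM you actually use:

```python
from typing import Callable

def rag_answer(
    question: str,
    search: Callable[[str, int], list[str]],  # placeholder: retrieval over your knowledge source
    generate: Callable[[str], str],           # placeholder: any LLM completion call
    k: int = 3,
) -> str:
    """Retrieve the k most relevant passages, then ground the LLM's answer in them."""
    passages = search(question, k)
    context = "\n\n".join(passages)
    prompt = (
        "Answer using only the context below. If the answer isn't there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```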

How to Improve RAG Applications: 6 Proven Strategies

This article explains six proven strategies to improve Retrieval-Augmented Generation (RAG) systems. It builds on my previous articles and consulting experience helping companies enhance their RAG applications.

In RAG is More Than Just Embeddings, I explain how RAG goes beyond vector embeddings. I also wrote How to Build a Terrible RAG System, which shows what not to do - helping you learn good practices through inverted thinking.

For a deeper understanding of RAG complexity, check out Levels of RAG Complexity. This article breaks down RAG into manageable components. If you want quick wins, read Low Hanging Fruit in RAG.

I've also written about the future of RAG in Predictions for the Future of RAG, exploring how RAG may evolve into report generation.

These articles work together to provide a comprehensive guide on RAG systems. They offer practical tips for developers and organizations looking to improve their implementations.

By the end of this post, you'll understand six key strategies I've found effective when improving RAG applications:

  • Building a data flywheel with synthetic testing
  • Implementing structured query segmentation
  • Developing specialized search indices
  • Mastering query routing and tool selection
  • Leveraging metadata effectively
  • Creating robust feedback loops

Let's explore each of these strategies in detail and see how they can help you build better RAG systems.
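To give a flavor of the first strategy, here's a minimal sketch of synthetic testing: generate a question from each of your own document chunks, then check whether retrieval brings the source chunk back. The generate_question and search callables are placeholders for your LLM and retriever:

```python
from typing import Callable

def retrieval_recall(
    chunks: list[str],
    generate_question: Callable[[str], str],  # placeholder LLM call: chunk -> a question it answers
    search: Callable[[str, int], list[str]],  # placeholder: your retriever
    k: int = 5,
) -> float:
    """Fraction of synthetic questions whose source chunk appears in the top k results."""
    hits = 0
    for chunk in chunks:
        question = generate_question(chunk)
        hits += chunk in search(question, k)  # exact match; use chunk IDs or fuzzy matching in practice
    return hits / len(chunks)
```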

How to Get Started in AI Consulting:

Picture this: You're sitting at your desk, contemplating the leap into AI consulting. Maybe you're a seasoned ML engineer looking to transition from contractor to consultant, or perhaps you've been building AI products and want to branch out independently. Whatever brought you here, you're wondering how to transform your technical expertise into a thriving consulting practice.

Consulting writing

I want to share something that completely changed my consulting business: writing consistently.

Last month, a founder reached out saying, "I don't know who you are, but your blog posts keep showing up in our team's Slack. Are you available to help us?"

Two days later, we closed a $140,000 deal for a three-month project. It took only three sales calls.

This wasn't luck: it was the compound effect of putting words on the page every single day.

Who am I?

In the next year, this blog will feature a mix of technical machine learning content and personal notes. I've spent more of my 20s thinking about my life than about machine learning. I'm not good at either, but I enjoy both.

Life story

I was born in a village in China. My parents were the children of rural farmers who grew up during the Cultural Revolution. They were the first generation of their family to read and write, and also the first generation to leave the village.

How to Lead AI Engineering Teams

Have you ever wondered why some teams seem to effortlessly deliver value while others stay busy but make no real progress?

I recently had a conversation that completely changed how I think about leading teams. While discussing team performance with a VP of Engineering who was frustrated with their team's slow progress, I suggested focusing on better standups and more experiments.

That's when Skylar Payne dropped a truth bomb that made me completely rethink everything:

"Leaders are living and breathing the business strategy through their meetings and context, but the people on the ground don't have any fucking clue what that is. They're kind of trying to read the tea leaves to understand what it is."

That moment was a wake-up call.

I had been so focused on the mechanics of execution that I'd missed something fundamental: The best processes in the world won't help if your team doesn't understand how their work drives real value.

In less than an hour, I learned more about effective leadership than I had in the past year. Let me share what I discovered.

The Process Trap

For years, I believed the answer to team performance was better processes. More standups, better ticket tracking, clearer KPIs.

I was dead wrong.

Here's the truth that surprised me: The most effective teams have very little process. What they do have is:

  • Crystal clear alignment on what matters
  • A shared understanding of how the business works
  • The ability to make independent decisions
  • A systematic way to learn and improve

Let me break down how to build this kind of team.

The "North Star" Framework

Instead of more process, teams need a clear way to connect their daily work to real business value. This is where the North Star Framework comes in.

Here's how it works:

  1. Define One Key Metric: Choose a single metric that summarizes the value you deliver to customers. For example, Amplitude uses "insights shared and read by at least three people."

  2. Break It Down: Identify the key drivers that teams can actually impact. These become your focus areas.

  3. Create a Rhythm (a sketch of the quarterly check follows this list):
     • Weekly: Review input metrics
     • Quarterly: Check relationships between inputs and your North Star
     • Yearly: Validate that your North Star predicts revenue

  4. Make It Visible: Run weekly business reviews where leadership shares these metrics with everyone. Start manual before building dashboards - trustworthy data matters more than automation.
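For the quarterly check, even a simple correlation pass will do. Here's a minimal sketch using pandas, with hypothetical metric names and numbers:

```python
import pandas as pd

# Weekly metrics: two hypothetical input drivers alongside the North Star metric.
df = pd.DataFrame({
    "activated_users": [120, 135, 150, 160, 180, 200],
    "reports_created": [300, 310, 360, 380, 430, 470],
    "insights_shared": [40, 44, 52, 55, 63, 70],  # the North Star
})

# How strongly does each input metric track the North Star?
print(df.corr()["insights_shared"].drop("insights_shared"))
```

If an input metric stops correlating with your North Star, that's the signal to rethink whether it belongs on the list of drivers.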

This framework does something powerful: it helps every team member understand how their work drives real value.

The Weekly Business Review

One of the most powerful tools in this framework is the weekly business review. But this isn't your typical metrics meeting.

Here's how to make it work:

  • Make it a leadership-level meeting that ICs can attend
  • Focus on building business intuition, not just sharing numbers
  • Take notes on anomalies and patterns
  • Share readouts with the entire team
  • Use it to develop a shared mental model of how the business works

Rethinking Team Structure

Here's another counterintuitive insight: how you organize your teams might be creating unnecessary friction.

Instead of dividing responsibilities by project, try dividing them by metrics. Here's why:

  • Project-based teams require precise communication boundaries
  • Metric-based teams can work more fluidly
  • It reduces communication overhead
  • Teams naturally align around outcomes instead of outputs

Think about it: When teams own metrics instead of projects, they have the freedom to find the best way to move those metrics.

Early Stage? Even More Important

I know what you're thinking: "This sounds great for big companies, but we're too early for this."

That's what I thought too. But here's what I learned: Being early stage isn't an excuse for throwing spaghetti at the wall.

You can still be systematic, just differently:

  1. Start Qualitative:
     • Draft clear goals and hypotheses
     • Generate specific questions to validate them
     • Talk to customers systematically
     • Document and learn methodically

  2. Focus on Learning:
     • Treat tickets as experiments, not features
     • Make outcomes about learning, not just shipping
     • Accept that progress is nonlinear
     • Build systematic ways to capture insights

  3. Build Foundations:
     • Document your strategy clearly
     • Make metrics and goals transparent
     • Share regular updates on progress
     • Create systems for capturing and sharing learnings

The Experiment Mindset

One crucial shift is thinking about work differently:

  • The ticket is not the feature
  • The ticket is the experiment
  • The outcome is learning

This mindset change helps teams focus on value and learning rather than just shipping features.

Put It Into Practice

Here are five things you can do today to start implementing these ideas:

  1. Define Your North Star: What's the one metric that best captures the value you deliver to customers?

  2. Start Weekly Business Reviews: Schedule a weekly meeting to review key metrics with your entire team. Start simple - even a manual spreadsheet is fine.

  3. Audit Your Process: Look at every process you have. Ask: "Is this helping people make better decisions?" If not, consider dropping it.

  4. Document Your Strategy: Write down how you think the business works. Share it widely and iterate based on feedback.

  5. Shift to Experiments: Start treating work as experiments to test hypotheses rather than features to ship.

The Real Test

The real test of whether this is working isn't in your processes or even your metrics. It's in whether every team member can confidently answer these questions:

  • "What should I be spending my time on today?"
  • "How does my work drive value for our business?"
  • "What am I learning that could change our direction?"

When your team can answer these without hesitation, you've built something special.

Remember: Your team members are smart, capable people. They don't need more process - they need context and clarity to make good decisions.

Give them that, and you'll be amazed at what they can achieve.

P.S. What would you say is your team's biggest obstacle to working this way? Leave a comment below.

SWE vs AI Engineering Standups

When I talk to engineering leaders struggling with their AI teams, I often hear the same frustration: "Why is everything taking so long? Why can't we just ship features like our other teams?"

This frustration stems from a fundamental misunderstanding: AI development isn't just engineering - it's applied research. And this changes everything about how we need to think about progress, goals, and team management. In a previous article I wrote about communication for AI teams. Today I want to talk about standups specifically.

The ticket is not the feature, the ticket is the experiment, the outcome is learning.