I hosted a session with Simon, CEO of TurboPuffer, to explore how vector search works at scale for RAG applications. We discussed the economics and architecture of object storage-based vector databases, performance considerations, and real-world implementations from companies like Notion, Linear, and Cursor.
I hosted a session with Vitor from Zapier to discuss how they dramatically improved their feedback collection systems for AI products. This conversation reveals practical strategies for gathering, analyzing, and implementing user feedback to create a continuous improvement cycle for RAG systems and AI applications.
Here's the thing about RAG (Retrieval-Augmented Generation): everyone's obsessed with fancy embeddings and vector search, but they're missing something crucial – authority matters just as much as relevance.
My students constantly ask about a classic problem: "What happens when new documents supersede old ones?" A technical guide from 2023 might be completely outdated by a 2025 update, but pure semantic search doesn't know that. It might retrieve the old version simply because its embedding is marginally closer to the query.
This highlights a bigger truth: relevance, freshness, and authority are all critical signals that traditional information retrieval systems juggled effectively. Somehow we've forgotten these lessons in our rush to build RAG systems. The newest and shiniest AI technique isn't always the complete solution.
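One way to bring those signals back is to re-rank vector search candidates with a score that blends similarity, freshness, and authority. Here's a minimal sketch; the `Doc` fields, the blend weights, and the half-life are illustrative assumptions, not values from any particular system:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical shape for a retrieved document; field names are assumptions.
@dataclass
class Doc:
    text: str
    similarity: float    # score from vector search, assumed in [0, 1]
    published: datetime  # last update time (timezone-aware)
    authority: float     # trust score for the source, assumed in [0, 1]

def rerank_score(doc: Doc, half_life_days: float = 180.0) -> float:
    """Blend semantic relevance with freshness and source authority.

    Freshness decays exponentially: a document loses half its freshness
    weight every `half_life_days`. The blend weights are illustrative.
    """
    age_days = (datetime.now(timezone.utc) - doc.published).days
    freshness = 0.5 ** (age_days / half_life_days)
    return 0.6 * doc.similarity + 0.25 * freshness + 0.15 * doc.authority

def rerank(hits: list[Doc]) -> list[Doc]:
    # A stale-but-similar 2023 guide can now lose to a slightly
    # less similar 2025 update.
    return sorted(hits, key=rerank_score, reverse=True)
```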
I've spent years working with ML systems, and I've seen this pattern repeatedly. We get excited about semantic search, but forget the hard-won lessons from decades of information retrieval: not all sources deserve equal trust.
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by integrating them with external knowledge sources. In essence, RAG combines the generative power of LLMs with the vast information stored in databases, documents, and other repositories. This approach enables LLMs to generate more accurate, relevant, and contextually grounded responses.
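At its core, the pattern is retrieve-then-generate. Here's a minimal sketch of that loop, where `search` and `llm` are hypothetical stand-ins for whatever retriever and model client you actually use, and each retrieved passage is assumed to expose a `.text` attribute:

```python
def answer(query: str, search, llm, k: int = 5) -> str:
    """Minimal retrieve-then-generate loop (a sketch, not a framework).

    `search(query, k)` returns the k most relevant passages;
    `llm(prompt)` returns the model's completion as a string.
    """
    passages = search(query, k=k)  # retrieval step
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return llm(prompt)  # generation step
```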
This article explains six proven strategies to improve Retrieval-Augmented Generation (RAG) systems. It builds on my previous articles and consulting experience helping companies enhance their RAG applications.
By the end of this post, you'll understand six key strategies I've found effective when improving RAG applications:
This FAQ is generated by NotebookLM and Gemini and addresses common questions from the "Systematically Improving RAG Applications" course. The course is a comprehensive, six-week program that guides you through:
This is based on a conversation that came up during office hours in my RAG course for engineering leaders. Another cohort is coming up soon, so if you're interested, you can sign up here.
When it comes to Retrieval-Augmented Generation (RAG) systems, one of the key challenges is deciding how to select and use tools effectively. Having spent countless hours optimizing these systems, I'm often asked whether retrieval should be used to choose which tools go into the prompt. That question is really about precision and recall trade-offs, and I've found the key lies in balancing the two. Let me break down my approach and share some insights that could help you improve your own RAG implementations (a short sketch follows the list below).
In this article, we'll cover:
The challenge of tool selection in RAG systems
Understanding the recall vs. precision tradeoff
The "Evergreen Tools" strategy for optimizing tool selection
When it comes to building and improving Retrieval-Augmented Generation (RAG) systems, too many teams focus on the wrong things. They obsess over generation before nailing search, implement RAG without understanding user needs, or get lost in complex improvements without clear metrics. I've seen this pattern repeat across startups of all sizes and industries.
But it doesn't have to be this way. After years of building recommendation systems, instrumenting them, and more recently consulting on RAG applications, I've developed a systematic approach that works. It's not just about what to do, but understanding why each step matters in the broader context of your business.
Here's the flywheel I use to continually iterate on and improve RAG systems:
In the next 6 to 8 months, RAG will be used primarily for report generation. We'll see a shift from using RAG agents as question-answering systems to using them as report-generation systems, because a report can deliver far more value than the answers today's RAG systems produce. I'll explain this by sharing what I've learned as a consultant about understanding value, and then how I think companies should describe the value they deliver through RAG.
This article explains how to make Retrieval-Augmented Generation (RAG) systems better. It's based on a conversation I had with Hamel and builds on other articles I've written about RAG. For a comprehensive understanding of RAG fundamentals, see my guide on what RAG is.
If you want to learn about how complex RAG systems can be, check out Levels of RAG Complexity. This article breaks down RAG into smaller parts, making it easier to understand. For quick tips on making your RAG system better, read Low Hanging Fruit in RAG.
I've also written about where I think RAG is headed in Predictions for the Future of RAG, which discusses how RAG might be used to create reports in the future.
All these articles work together to give you a full guide on how to make RAG systems better. They offer useful tips for developers and companies who want to improve their systems. For additional improvement strategies, check out my six tips for improving RAG and insights on RAG anti-patterns. If you're interested in AI engineering in general, you might enjoy my talk at the AI Engineer Summit. In this talk, I explain how tools like Pydantic can help with prompt engineering, which is useful for building RAG systems.
Through all these articles, I try to give you a complete view of RAG systems. I cover everything from basic ideas to advanced uses and future predictions. This should help you understand and succeed in this fast-changing field.
By the end of this post, you'll understand my step-by-step approach to making RAG applications better for the companies I work with. We'll look at important areas like:
Generating synthetic questions and answers to quickly evaluate how well your system works
Combining full-text search and vector search for the best results (see the sketch after this list)
Setting up the right feedback mechanisms to capture what you want to study from users
Using clustering to find groups of problematic questions, organized by topic and capability
Building targeted systems to improve specific capabilities
Continuously monitoring and testing as you gather more real-world data
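To give a taste of the hybrid search step, here's a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a full-text (BM25) result list with a vector search result list. The input lists can come from whatever search backends you already run; `k = 60` is the conventional default from the original RRF paper.

```python
def rrf(fulltext_ids: list[str], vector_ids: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked result lists with reciprocal rank fusion.

    Each document scores 1 / (k + rank) in every list it appears in,
    so items ranked highly by either retriever float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in (fulltext_ids, vector_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: fuse BM25 and embedding hits, then keep the top few.
# merged = rrf(bm25_ids, vector_ids)[:10]
```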
Through this step-by-step runbook, you'll gain practical knowledge on how to incrementally enhance the performance and utility of your RAG applications, unlocking their full potential to deliver exceptional user experiences and drive business value. Let's dive in and explore how to systematically improve your RAG systems together!