This is based on a conversation that came up during office hours for my RAG course for engineering leaders. Another cohort is coming up soon, so if you're interested, you can sign up here.
When it comes to Retrieval-Augmented Generation (RAG) systems, one of the key challenges is deciding how to select and use tools effectively. As someone who's spent countless hours optimizing these systems, I'm often asked whether retrieval should be used to choose which tools go into the prompt. What that question is really about is the trade-off between precision and recall, and I've found that the key lies in balancing the two. Let me break down my approach and share some insights that could help you improve your own RAG implementations.
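To make the idea concrete, here's a minimal sketch, with a placeholder `embed` function and made-up tool descriptions standing in for a real embedding model and tool registry: embed each tool's description once, then at query time keep only the top-k most similar tools in the prompt. Raising k buys recall (less chance of dropping the tool the model needs) at the cost of precision (more irrelevant tools competing for the model's attention).

```python
import numpy as np

# Hypothetical tool registry: name -> natural-language description.
TOOLS = {
    "search_orders": "Look up a customer's order history and shipping status.",
    "refund_item": "Issue a refund for a purchased item.",
    "kb_search": "Search the internal knowledge base for policy documents.",
}

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; swap in your embedding model of choice."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

# Embed every tool description once, ahead of time.
tool_vectors = {name: embed(desc) for name, desc in TOOLS.items()}

def select_tools(query: str, k: int = 2) -> list[str]:
    """Return the k tools whose descriptions are most similar to the query.

    A larger k trades precision (more irrelevant tools in the prompt)
    for recall (less chance of omitting the tool the model needs)."""
    q = embed(query)
    scored = sorted(tool_vectors.items(), key=lambda kv: float(q @ kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

print(select_tools("Where is my package?"))
```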
In this article, we'll cover:
The challenge of tool selection in RAG systems
Understanding the recall vs. precision tradeoff
The "Evergreen Tools" strategy for optimizing tool selection
When it comes to building and improving Retrieval-Augmented Generation (RAG) systems, too many teams focus on the wrong things. They obsess over generation before nailing search, implement RAG without understanding user needs, or get lost in complex improvements without clear metrics. I've seen this pattern repeat across startups of all sizes and industries.
But it doesn't have to be this way. After years of building recommendation systems, instrumenting them, and more recently consulting on RAG applications, I've developed a systematic approach that works. It's not just about what to do, but understanding why each step matters in the broader context of your business.
Here's the flywheel I use to continually improve RAG systems:
In the past year, I've done a lot of consulting helping companies improve their RAG applications. One of the biggest things I want to call out is the idea of topics and capabilities.
I use this distinction to train teams to look at the data they already have and figure out what to build next.
In the next 6 to 8 months, RAG will be used primarily for report generation. We'll see a shift from using RAG agents as question-answering systems to using them more as report-generation systems. This is because the value of a generated report is much greater than what current question-answering RAG systems deliver. I'll explain this by discussing what I've learned as a consultant about understanding value, and then how I think companies should describe the value they deliver through RAG.
This article explains how to make Retrieval-Augmented Generation (RAG) systems better. It's based on a conversation I had with Hamel and builds on other articles I've written about RAG.
If you want to learn about how complex RAG systems can be, check out Levels of RAG Complexity. This article breaks down RAG into smaller parts, making it easier to understand. For quick tips on making your RAG system better, read Low Hanging Fruit in RAG.
I also wrote about what I think will happen with RAG in the future in Predictions for the Future of RAG. This article talks about how RAG might be used to create reports in the future.
All these articles work together to give you a full guide on how to make RAG systems better. They offer useful tips for developers and companies who want to improve their systems. If you're interested in AI engineering in general, you might enjoy my talk at the AI Engineer Summit. In this talk, I explain how tools like Pydantic can help with prompt engineering, which is useful for building RAG systems.
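As a small taste of that idea, here's a sketch (my own illustration, not code from the talk) of how Pydantic can pin down the shape of a RAG answer, so the model's output is validated rather than parsed by hand:

```python
from pydantic import BaseModel, Field

class Citation(BaseModel):
    chunk_id: int = Field(description="ID of the retrieved chunk this claim relies on")
    quote: str = Field(description="Exact supporting text from that chunk")

class RAGAnswer(BaseModel):
    answer: str
    citations: list[Citation]

# The JSON schema below can go into the prompt so the LLM knows exactly
# what structure to emit; the raw response is then checked with
# RAGAnswer.model_validate_json(raw_output) instead of ad hoc parsing.
print(RAGAnswer.model_json_schema())
```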
Through all these articles, I try to give you a complete view of RAG systems. I cover everything from basic ideas to advanced uses and future predictions. This should help you understand and do well in this fast-changing field.
By the end of this post, you'll understand my step-by-step approach to making RAG applications better for the companies I work with. We'll look at important areas like:
Creating synthetic questions and answers to quickly test how well your system works
Using both full-text search and vector search together for the best results (see the sketch after this list)
Setting up the right ways to collect user feedback on the outcomes you want to study
Using clustering to find groups of problematic questions, organized by topics and capabilities
Building specific systems to improve capabilities
Continuously monitoring and testing as you gather more real-world data
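For the hybrid-search item above, one common way to combine the two result lists is reciprocal rank fusion. The sketch below uses made-up document IDs; `full_text` and `vector` stand in for whatever your BM25 and vector backends actually return:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs.

    Each document scores 1 / (k + rank) in every list it appears in, so
    documents ranked highly by both full-text and vector search rise to
    the top. k=60 is a conventional smoothing constant."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

full_text = ["doc3", "doc1", "doc7"]   # e.g. BM25 results
vector = ["doc1", "doc9", "doc3"]      # e.g. nearest-neighbor results
print(reciprocal_rank_fusion([full_text, vector]))  # doc1 and doc3 lead
```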
Through this step-by-step runbook, you'll gain practical knowledge on how to incrementally enhance the performance and utility of your RAG applications, unlocking their full potential to deliver exceptional user experiences and drive business value. Let's dive in and explore how to systematically improve your RAG systems together!
If you're looking to deepen your understanding of RAG systems and learn how to systematically improve them, consider enrolling in the Systematically Improving RAG Applications course. This 4-week program covers everything from evaluation techniques to advanced retrieval methods, helping you build a data flywheel for continuous improvement.
RAG (Retrieval-Augmented Generation) is a powerful technique that combines information retrieval with LLMs to provide relevant and accurate responses to user queries. By searching through a large corpus of text and retrieving the most relevant chunks, RAG systems can generate answers that are grounded in factual information.
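The core loop is short enough to show in full. Here's a deliberately minimal sketch, with `search` and `llm` standing in for whichever retrieval backend and model client you actually use:

```python
def answer(query: str, search, llm, k: int = 5) -> str:
    """Minimal RAG loop: retrieve relevant chunks, then generate with them inline."""
    chunks = search(query, limit=k)                    # most relevant passages
    context = "\n\n".join(c["text"] for c in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)
```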
In this post, we'll explore six key areas where you can focus your efforts to improve your RAG search system. These include using synthetic data for baseline metrics, adding date filters, improving user feedback copy, tracking average cosine distance and Cohere reranking score, incorporating full-text search, and efficiently generating synthetic data for testing.
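As a preview of the first of those areas, here's a minimal sketch of the synthetic-data idea: generate a question from each chunk, then check whether retrieval brings that source chunk back. `generate_question` and `search` are stand-ins for your own LLM call and search backend:

```python
def synthetic_recall_at_k(chunks, generate_question, search, k: int = 5) -> float:
    """Fraction of synthetic questions whose source chunk comes back in the top k.

    Each question is generated *from* a known chunk, so that chunk is the
    ground-truth answer and no hand labeling is needed."""
    hits = 0
    for chunk in chunks:
        question = generate_question(chunk["text"])    # an LLM call in practice
        retrieved = search(question, limit=k)          # your retrieval backend
        hits += chunk["id"] in {c["id"] for c in retrieved}
    return hits / len(chunks)
```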
This guide explores different levels of complexity in Retrieval-Augmented Generation (RAG) applications. We'll cover everything from basic ideas to advanced methods, making it useful for beginners and experienced developers alike.
We'll start with the basics, like breaking text into chunks, creating embeddings, and storing data. Then, we'll move on to more complex topics such as improved search methods, creating structured responses, and making systems work better. By the end, you'll know how to build strong RAG systems that can answer tricky questions accurately.
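To preview that first level, here's a minimal sketch of the chunk-embed-store step. The fixed-size character splitter, the random `embed` placeholder, and the in-memory list are all simplifications you'd swap out in practice:

```python
import numpy as np

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with some overlap,
    so sentences cut at a boundary still appear intact in one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: replace with a real embedding model."""
    rng = np.random.default_rng(0)
    vecs = rng.standard_normal((len(texts), 256))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

document = "your corpus here " * 1000
chunks = chunk(document)
store = list(zip(chunks, embed(chunks)))  # in practice: a vector database
```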
As we explore these topics, we'll use ideas from other resources, like our articles on data flywheels and improving tool retrieval in RAG systems. These ideas will help you understand how to create systems that keep improving themselves, making your product better and keeping users more engaged.
Key topics we'll explore include:
Basic text processing and embedding techniques
Efficient data storage and retrieval methods
Advanced search and ranking algorithms
Asynchronous programming for improved performance
Observability and logging for system monitoring
Evaluation strategies using synthetic and real-world data
Query enhancement and summarization techniques
This guide aligns with the insights from our RAG flywheel article, which emphasizes the importance of continuous improvement in RAG systems through data-driven iterations and user feedback integration.
I work with a few seed and Series A startups that are ramping up their retrieval augmented generation systems. I've noticed a lot of unclear thinking around what metrics to use and when to use them. I've seen a lot of people use "LGTM@Few" as a metric, and I think it's a terrible idea. I'm going to explain why and what you should use instead.
If you want to learn about my consulting practice, check out my services page. If you're interested in working together, please reach out to me via email.
When giving advice to developers on improving their retrieval augmented generation, I usually say two things:
Look at the Data
Don't just look at the Data
Wise men speak in paradoxes because we are afraid of half-truths. This blog post will try to capture when to look at data and when to stop looking at data in the context of retrieval augmented generation.
I'll cover the different relevancy and ranking metrics, some stories to help you understand them, their trade-offs, and some general advice on how to think.
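To ground that discussion, here's a minimal sketch of two of the most common ranking metrics, recall@k and reciprocal rank (averaged over queries, the latter gives MRR), assuming each evaluation query comes with a known set of relevant document IDs:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result; 0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# MRR is simply the mean of reciprocal_rank over all evaluation queries.
```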
If you've followed my work on RAG systems, you'll know I emphasize treating them as recommendation systems at their core. In this post, we'll explore the concept of inverted thinking to tackle the challenge of building an exceptional RAG system.
What is inverted thinking?
Inverted thinking is a problem-solving approach that flips the perspective. Instead of asking, "How can I build a great RAG system?", we ask, "How could I create the worst possible RAG system?" By identifying potential pitfalls, we can more effectively avoid them and build towards excellence.
With the advent of large language models (LLMs), retrieval augmented generation (RAG) has become a hot topic. However, throughout the past year of helping startups integrate LLMs into their stack, I've noticed that the pattern of taking user queries, embedding them, and directly searching a vector store is effectively demoware.
What is RAG?
Retrieval augmented generation (RAG) is a technique that uses an LLM to generate responses, but uses a search backend to augment the generation. In the past year, using text embeddings with a vector database has been the most popular approach I've seen socialized.
So let's kick things off by examining what I like to call the 'Dumb' RAG Model—a basic setup that's more common than you'd think.
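Stripped to a sketch, the 'Dumb' model is a single embedding lookup, with `embed`, `index`, and `llm` left abstract here: the raw user query is embedded verbatim and the nearest chunks are stuffed into the prompt, with no query understanding, filtering, or reranking in between.

```python
def dumb_rag(query: str, embed, index, llm, k: int = 3) -> str:
    """The demoware pattern: one embedding lookup, straight into the prompt."""
    neighbors = index.search(embed(query), k)   # nearest chunks by cosine similarity
    context = "\n".join(n.text for n in neighbors)
    return llm(f"Context:\n{context}\n\nAnswer: {query}")
```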