
RAG

RAG Course

If you're looking to deepen your understanding of RAG systems and learn how to systematically improve them, consider enrolling in the Systematically Improving RAG Applications course. This 4-week program covers everything from evaluation techniques to advanced retrieval methods, helping you build a data flywheel for continuous improvement.

RAG (Retrieval-Augmented Generation) is a powerful technique that combines information retrieval with LLMs to provide relevant and accurate responses to user queries. By searching through a large corpus of text and retrieving the most relevant chunks, RAG systems can generate answers that are grounded in factual information.
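
To make the pattern concrete, here is a minimal sketch of the retrieve-then-generate loop. The `search_corpus` retriever and `llm_complete` call are hypothetical placeholders for whatever search backend and LLM client you actually use; the point is only that retrieved chunks are placed into the prompt so the answer stays grounded in them.

```python
# Minimal retrieve-then-generate sketch. `search_corpus` and `llm_complete`
# are hypothetical stand-ins for your search backend and LLM client.

def answer_question(query: str, k: int = 5) -> str:
    # 1. Retrieve: pull the k most relevant chunks for the query.
    chunks = search_corpus(query, k=k)  # assumed: returns a list of text chunks

    # 2. Augment: ground the prompt in the retrieved text.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

    # 3. Generate: the LLM produces an answer grounded in the retrieved context.
    return llm_complete(prompt)
```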

In this post, we'll explore six key areas where you can focus your efforts to improve your RAG search system. These include using synthetic data for baseline metrics, adding date filters, improving user feedback copy, tracking average cosine distance and Cohere reranking score, incorporating full-text search, and efficiently generating synthetic data for testing.

Levels of Complexity: RAG Applications

This guide explores different levels of complexity in Retrieval-Augmented Generation (RAG) applications. We'll cover everything from basic ideas to advanced methods, making it useful for beginners and experienced developers alike.

We'll start with the basics, like breaking text into chunks, creating embeddings, and storing data. Then, we'll move on to more complex topics such as improved search methods, creating structured responses, and making systems work better. By the end, you'll know how to build strong RAG systems that can answer tricky questions accurately.
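
To ground those basics, the sketch below chunks documents, embeds each chunk, and keeps the vectors in a simple in-memory matrix. The `embed` function is an assumed placeholder for your embedding model, and a production system would typically use a real vector store rather than a NumPy array.

```python
# Minimal ingestion sketch: chunk, embed, and store.
# `embed(text) -> list[float]` is a hypothetical embedding function.
import numpy as np


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


def build_index(documents: list[str]) -> tuple[list[str], np.ndarray]:
    """Chunk and embed every document; return the chunks and their vectors."""
    chunks = [chunk for doc in documents for chunk in chunk_text(doc)]
    vectors = np.array([embed(chunk) for chunk in chunks])
    return chunks, vectors
```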

As we explore these topics, we'll use ideas from other resources, like our articles on data flywheels and improving tool retrieval in RAG systems. These ideas will help you understand how to create systems that keep improving themselves, making your product better and keeping users more engaged.

Key topics we'll explore include:

  1. Basic text processing and embedding techniques
  2. Efficient data storage and retrieval methods
  3. Advanced search and ranking algorithms
  4. Asynchronous programming for improved performance
  5. Observability and logging for system monitoring
  6. Evaluation strategies using synthetic and real-world data
  7. Query enhancement and summarization techniques

This guide aligns with the insights from our RAG flywheel article, which emphasizes the importance of continuous improvement in RAG systems through data-driven iterations and user feedback integration.

Stop using LGTM@Few as a metric (Better RAG)

I work with a few seed and Series A startups that are ramping up their retrieval augmented generation systems. I've noticed a lot of unclear thinking around what metrics to use and when to use them. I've seen a lot of people use "LGTM@Few" as a metric, and I think it's a terrible idea. I'm going to explain why and what you should use instead.

If you want to learn about my consulting practice, check out my services page. If you're interested in working together, please reach out to me via email.


When giving advice to developers on improving their retrieval augmented generation, I usually say two things:

  1. Look at the Data
  2. Don't just look at the Data

Wise men speak in paradoxes because we are afraid of half-truths. This blog post will try to capture when to look at data and when to stop looking at data in the context of retrieval augmented generation.

I'll cover the different relevancy and ranking metrics, some stories to help you understand them, their trade-offs, and some general advice on how to think about them.

How to build a terrible RAG system

If you've followed my work on RAG systems, you'll know I emphasize treating them as recommendation systems at their core. In this post, we'll explore the concept of inverted thinking to tackle the challenge of building an exceptional RAG system.

What is inverted thinking?

Inverted thinking is a problem-solving approach that flips the perspective. Instead of asking, "How can I build a great RAG system?", we ask, "How could I create the worst possible RAG system?" By identifying potential pitfalls, we can more effectively avoid them and build towards excellence.

This approach aligns with our broader discussion on RAG systems, which you can explore further in our RAG flywheel article and our comprehensive guide on Levels of Complexity in RAG Applications.

With the advent of large language models (LLMs), retrieval augmented generation (RAG) has become a hot topic. However, throughout the past year of helping startups integrate LLMs into their stack, I've noticed that the pattern of taking user queries, embedding them, and directly searching a vector store is effectively demoware.

What is RAG?

Retrieval augmented generation (RAG) is a technique that uses an LLM to generate responses, but uses a search backend to augment the generation. In the past year, using text embeddings with a vector database has been the most popular approach I've seen socialized.

Simple RAG that embeds the user query and makes a search.
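
As a reference point, that flow might look something like the sketch below: embed the query, rank stored chunk vectors by cosine similarity, and return the top hits. The `embed` function and the `chunks`/`vectors` index are assumed placeholders; this is the demoware pattern described above, not a recommendation.

```python
# Naive "embed the query and search" sketch.
# `embed(text) -> list[float]` is a hypothetical embedding function;
# `chunks` and `vectors` are the stored chunk texts and their embeddings.
import numpy as np


def search(query: str, chunks: list[str], vectors: np.ndarray, k: int = 5) -> list[str]:
    q = np.array(embed(query))
    # Cosine similarity between the query vector and every stored chunk vector.
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q) + 1e-10)
    top_k = np.argsort(-sims)[:k]
    return [chunks[i] for i in top_k]
```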

So let's kick things off by examining what I like to call the 'Dumb' RAG Model—a basic setup that's more common than you'd think.