How to build a terrible RAG system

If you've followed my work on RAG systems, you'll know I emphasize treating them as recommendation systems at their core. In this post, we'll explore the concept of inverted thinking to tackle the challenge of building an exceptional RAG system.

What is inverted thinking?

Inverted thinking is a problem-solving approach that flips the perspective. Instead of asking, "How can I build a great RAG system?", we ask, "How could I create the worst possible RAG system?" By identifying potential pitfalls, we can more effectively avoid them and build towards excellence.

This approach aligns with our broader discussion on RAG systems, which you can explore further in our RAG flywheel article and our comprehensive guide on Levels of Complexity in RAG Applications.

With the advent of large language models (LLMs), retrieval augmented generation (RAG) has become a hot topic. However, throughout the past year of helping startups integrate LLMs into their stack, I've noticed that the pattern of taking user queries, embedding them, and directly searching a vector store is effectively demoware.

What is RAG?

Retrieval augmented generation (RAG) is a technique that uses an LLM to generate responses and a search backend to supply the context those responses are grounded in. In the past year, pairing text embeddings with a vector database has been the most popular approach I've seen promoted.

Simple RAG that embeds the user query and makes a search.
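To make the pattern concrete, here is a minimal sketch of that embed-and-search loop. Everything in it is a stand-in I made up for illustration: `embed` is a toy bag-of-words counter rather than a real embedding model, `ToyVectorStore` is an in-memory list rather than a real vector database, and `build_prompt` only assembles the text you would hand to an LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; a real system would use a vector database."""

    def __init__(self, documents: list[str]) -> None:
        # Embed every document once, up front.
        self.rows = [(doc, embed(doc)) for doc in documents]

    def search(self, query: str, k: int = 2) -> list[str]:
        # Embed the raw user query and rank documents by similarity to it.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, store: ToyVectorStore, k: int = 2) -> str:
    """Stuff the top-k results into a prompt; the LLM call itself is omitted."""
    context = "\n".join(f"- {doc}" for doc in store.search(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


store = ToyVectorStore([
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders over fifty dollars.",
])
prompt = build_prompt("how long do refunds take", store, k=1)
```

Note that the user's query goes into the search untouched: no query understanding, no filtering, no feedback on whether the retrieved text was any good. That is the whole pipeline.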

So let's kick things off by examining what I like to call the 'Dumb' RAG Model—a basic setup that's more common than you'd think.