2025

How to Systematically Improve RAG Applications

Retrieval-Augmented Generation (RAG) is a simple, powerful idea: attach a large language model (LLM) to external data and get better, domain-specific outputs. Yet behind that simplicity lurks a maze of hidden pitfalls: no metrics, no data instrumentation, not even clarity about what exactly we’re trying to improve.

In this mega-long post, I’ll lay out everything I know about systematically improving RAG apps—from fundamental retrieval metrics, to segmentation and classification, to structured extraction, multimodality, fine-tuned embeddings, query routing, and closing the loop with real user feedback. It’s the end-to-end blueprint for building and iterating a RAG system that actually works in production.

I’ve spent years consulting on applied AI—spanning recommendation systems, spam detection, generative search, and RAG. That includes building ML pipelines for large-scale recommendation frameworks, doing vision-based detection, curation of specialized datasets, and more. In short, I’ve seen many “AI fails” up close. Over time, I’ve realized that gluing an LLM to your data is just the first step. The real magic is how you measure, iterate, and keep your system from sliding backward.

We’ll break everything down in a systematic, user-centric way. If you’re tired of random prompt hacks and single-number “accuracy” illusions, you’re in the right place.

10 “Foot Guns” for Fine-Tuning and Few-Shots

Let me share a story that might sound familiar.

A few months back, I was helping a Series A startup with their LLM deployment. Their CTO pulled me aside and said, "Jason, we're burning through our OpenAI credits like crazy, and our responses are still inconsistent. We thought fine-tuning would solve everything, but now we're knee-deep in training data issues."

Fast forward to today, and I’ve been diving deep into these challenges as an advisor to Zenbase, a production-level version of DSPy. We’re on a mission to help companies get the most out of their AI investments. Think of us as your AI optimization guides: we’ve been through the trenches, made the mistakes, and now we’re here to help you avoid them.

In this post, I’ll walk you through some of the biggest pitfalls. I’ll share real stories, practical solutions, and lessons learned from working with dozens of companies.

Making Money is Negative Margin

In 2020 I had a hand injury that put my career on hold for 2-3 years. I've only managed to bounce back into being an indie consultant and educator. On the way back to being a productive member of society, I've learned a few things:

  1. I have what it takes to be successful, whether that's the feeling of never wanting to be poor again, or some internal motivation, or the 'cares a lot' or the 'chip on the shoulder' - whatever it is, I believe I will be successful
  2. The gift of being enough is the greatest gift I can give myself
  3. I will likely make too many sacrifices by default, not too few, and it will reflect in my regrets later in life