Stop using LGTM@Few as a metric (Better RAG)
I work with a few seed and Series A startups that are ramping up their retrieval augmented generation systems. I've noticed a lot of unclear thinking about which metrics to use and when to use them. I've seen many people use "LGTM@Few" as a metric, and I think it's a terrible idea. I'm going to explain why and what you should use instead.
For more writing on RAG and evaluation, start with the RAG series index.
When I give developers advice on improving their retrieval augmented generation systems, I usually say two things:
- Look at the Data
- Don't just look at the Data
Wise men speak in paradoxes because we are afraid of half-truths. This blog post will try to capture when to look at data and when to stop looking at data in the context of retrieval augmented generation.
I'll cover the different relevancy and ranking metrics, some stories to help you understand them, their trade-offs, and some general advice on how to think about them.
