Prompt optimization is the process of improving the quality of the prompts you use to generate content. Typically, you start by putting a few examples of the desired output into the prompt (few-shot context), generate candidate outputs, and then refine the prompt until it reliably produces more of what you want.
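For instance, a minimal sketch of that loop might look like this; the classification task, model name, and `build_prompt` helper are illustrative assumptions on my part, not a prescribed setup:

```python
from openai import OpenAI

client = OpenAI()

# A handful of hand-written examples of the desired output (the "few shots").
examples = [
    {"input": "Refund status for order #123?", "label": "billing"},
    {"input": "The app crashes when I upload a file", "label": "bug"},
]

def build_prompt(examples: list[dict], query: str) -> str:
    """Format the few-shot examples and the new query into a single prompt."""
    shots = "\n".join(f"Input: {e['input']}\nLabel: {e['label']}" for e in examples)
    return f"Classify the support ticket.\n\n{shots}\n\nInput: {query}\nLabel:"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": build_prompt(examples, "I was double charged")}],
)
print(response.choices[0].message.content)
```

From there you inspect the outputs, promote the good ones into the example set, and keep tweaking the instructions until quality stops improving.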
This article explains how to make Retrieval-Augmented Generation (RAG) systems better. It's based on a talk I had with Hamel and builds on other articles I've written about RAG. For a comprehensive understanding of RAG fundamentals, see my guide on what RAG is.
If you want to learn about how complex RAG systems can be, check out Levels of RAG Complexity. This article breaks down RAG into smaller parts, making it easier to understand. For quick tips on making your RAG system better, read Low Hanging Fruit in RAG.
I also wrote about what I think will happen with RAG in the future in Predictions for the Future of RAG. This article talks about how RAG might be used to create reports in the future.
All these articles work together to give you a full guide on how to make RAG systems better. They offer useful tips for developers and companies who want to improve their systems. For additional improvement strategies, check out my six tips for improving RAG and insights on RAG anti-patterns. If you're interested in AI engineering in general, you might enjoy my talk at the AI Engineer Summit. In this talk, I explain how tools like Pydantic can help with prompt engineering, which is useful for building RAG systems.
Through all these articles, I try to give you a complete view of RAG systems. I cover everything from basic ideas to advanced uses and future predictions. This should help you understand and do well in this fast-changing field.
By the end of this post, you'll understand my step-by-step approach to making RAG applications better for the companies I work with. We'll look at important areas like:
Making fake questions and answers to quickly test how well your system works
Using both full-text search and vector search together for the best results
Setting up the right ways to collect user feedback about the things you actually want to study
Using clustering to find groups of problematic questions, organized by topic and capability
Building targeted systems to improve specific capabilities
Continuously monitoring and testing as you collect more real-world data
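To make the first of those areas concrete, here's a minimal sketch of the synthetic-question idea: have an LLM invent a question for each chunk you've indexed, then check whether your retriever brings that chunk back for its own question. The model name and the `search` callable are placeholders for whatever you actually run.

```python
from openai import OpenAI

client = OpenAI()

def synthetic_question(chunk: str) -> str:
    """Ask the model to write a question that this chunk should answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write one question a user might ask that this text answers:\n\n{chunk}",
        }],
    )
    return response.choices[0].message.content

def recall_at_k(chunks: list[str], search, k: int = 5) -> float:
    """Fraction of chunks recovered when we search with their own synthetic question."""
    hits = 0
    for chunk in chunks:
        question = synthetic_question(chunk)
        hits += chunk in search(question, k=k)  # `search` is your retriever (placeholder)
    return hits / len(chunks)
```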
This step-by-step runbook shows how to incrementally improve the performance and utility of your RAG applications. Let's dive in.
If you're looking to go deeper on RAG, start with the RAG series index.
RAG (Retrieval-Augmented Generation) is a powerful technique that combines information retrieval with LLMs to provide relevant and accurate responses to user queries. By searching through a large corpus of text and retrieving the most relevant chunks, RAG systems can generate answers that are grounded in factual information.
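As a rough sketch of that retrieve-then-generate loop (not the exact stack I use with clients), a bare-bones RAG call looks something like this; the embedding and chat models are just examples:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in response.data])

def answer(question: str, corpus: list[str], k: int = 3) -> str:
    # Retrieve: rank chunks by cosine similarity to the question.
    doc_vecs, query_vec = embed(corpus), embed([question])[0]
    scores = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    context = "\n\n".join(corpus[i] for i in np.argsort(scores)[::-1][:k])

    # Generate: ground the answer in the retrieved chunks.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return response.choices[0].message.content
```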
In this post, we'll explore six key areas where you can focus your efforts to improve your RAG search system. These include using synthetic data for baseline metrics, adding date filters, improving user feedback copy, tracking average cosine distance and Cohere reranking score, incorporating full-text search, and efficiently generating synthetic data for testing.
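On the full-text plus vector point, one simple way to combine the two result lists is reciprocal rank fusion; this sketch assumes you already have a keyword search and a vector search that each return ranked document ids:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc ids; docs ranked highly anywhere float up."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage (placeholders): merged = reciprocal_rank_fusion([bm25_search(q), vector_search(q)])
```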
The world was ending, and I couldn't even put my pants on. My hands had cramped up so badly that I couldn't grip a water bottle or type and could barely dress myself. A few weeks earlier, I had been riding the greatest decade-high anyone could have dreamed of. I was moving to New York, making 500k, working for an amazing company, and was engaged in what might be the most lucrative field on the planet. I was doing what I loved, getting paid well, and feeling like I was making a difference. Life was good. Well, as good as it could get during a once-in-a-lifetime pandemic. My name is Jason. I'm a machine learning engineer. And this is how I almost lost my hands.
I think people suck at picking metrics and setting goals. Why? Because they tend to pick metrics they can't actually impact and set goals that leave them feeling empty once they've achieved them. So, let's define some key terms and explore how we can do better.
Based on this YouTube video.
Check out this video; it's the audio source this post was generated from.
Build fast, hire slow! I hate seeing companies make dumb mistakes, especially when it comes to hiring. I'm not against full-time employment, but as a consultant, part-time engagements are often more beneficial to me, which no doubt influences my perspective on hiring. That said, I've observed two notable patterns in startup hiring practices: hiring too early and not hiring for dedicated research. Unfortunately, these patterns lead to startups hiring machine learning engineers to bolster their generative AI strengths, only to have them do janitorial work for their first six months on the job. It makes me wonder whether startups are making easy-to-correct mistakes out of a sense of insecurity as they try to capture the current wave of AI optimism.
Companies hire machine learning engineers too early in their life cycle.
Many startups hire machine learning engineers too early in the development process, when the primary focus should still be app development and integration work. At that stage, a full-stack AI engineer provides much greater value, since the role is really that of a full-stack developer rather than a specialized machine learning engineer. Consequently, these misplaced machine learning engineers often end up assisting with app development or DevOps tasks instead of focusing on their core competencies of training models and building ML solutions.
After all, my background is in mathematics and physics, not engineering. I would rather spend my days looking at data than spend two or three hours debugging TypeScript build errors.
You need to take advantage of your users wherever possible. It's become a bit of a cliché that customers are your most important stakeholders. In the past, this meant that customers bought the product the company sold and thus kept it solvent. However, as AI seemingly conquers everything, businesses must find replicable processes for creating products that meet their users' needs and are flexible enough to be continually improved and updated over time. This means your users are your most important asset for improving your product, so take advantage of that and use them to build a better one!
Technological advancements have always been met with a mix of skepticism and fear. From the telephone disrupting face-to-face communication to calculators diminishing mental arithmetic skills, each new technology has faced resistance. Even the written word was once believed to weaken human memory.
This guide explores different levels of complexity in Retrieval-Augmented Generation (RAG) applications. We'll cover everything from basic ideas to advanced methods, making it useful for beginners and experienced developers alike.
We'll start with the basics, like breaking text into chunks, creating embeddings, and storing data. Then, we'll move on to more complex topics such as improved search methods, creating structured responses, and making systems work better. By the end, you'll know how to build strong RAG systems that can answer tricky questions accurately.
As we explore these topics, we'll use ideas from other resources, like our articles on data flywheels and improving tool retrieval in RAG systems. These ideas will help you understand how to create systems that keep improving themselves, making your product better and keeping users more engaged.
Key topics we'll explore include:
Basic text processing and embedding techniques
Efficient data storage and retrieval methods
Advanced search and ranking algorithms
Asynchronous programming for improved performance
Observability and logging for system monitoring
Evaluation strategies using synthetic and real-world data
Query enhancement and summarization techniques
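As a preview of the first item on that list, chunking is often just a few lines of code; here's a minimal sketch, where the window size and overlap are arbitrary starting points rather than recommendations:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows so content isn't lost at boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + size])
        start += size - overlap
    return chunks
```

Each chunk then gets embedded and written to whatever store you prefer, which is where the later topics on retrieval and ranking pick up.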
This guide aligns with the insights from our RAG flywheel article, which emphasizes the importance of continuous improvement in RAG systems through data-driven iterations and user feedback integration.
I think too many LLM libraries try to format your strings in weird ways that don't make sense. In an OpenAI call, what the API accepts is, for the most part, an array of messages.
But so many libraries want you to submit a string block, offering syntactic sugar to dress it up. They also tend to map the docstring to the prompt, so instead of accessing a string variable, I have to access the docstring via __doc__.
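For reference, here's the shape the OpenAI API itself expects, contrasted with a sketch of the docstring pattern I'm complaining about; the `summarize` function is a made-up illustration, not any particular library's API:

```python
from openai import OpenAI

client = OpenAI()

# What the API actually wants: a plain list of role/content messages.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Summarize the attached report."},
    ],
)

# The pattern I dislike: the prompt lives in a docstring and the library
# pulls it out via __doc__ behind your back (illustrative, not a real library).
def summarize(report: str):
    """Summarize the attached report: {report}"""

prompt = summarize.__doc__.format(report="...")
```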