
Writing and mumblings

RAG Course

If you're looking to deepen your understanding of RAG systems and learn how to systematically improve them, consider enrolling in the Systematically Improving RAG Applications course. This 4-week program covers everything from evaluation techniques to advanced retrieval methods, helping you build a data flywheel for continuous improvement.

RAG (Retrieval-Augmented Generation) is a powerful technique that combines information retrieval with LLMs to provide relevant and accurate responses to user queries. By searching through a large corpus of text and retrieving the most relevant chunks, RAG systems can generate answers that are grounded in factual information.
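To make that loop concrete, here's a minimal sketch of retrieve-then-generate. The search and llm callables are stand-ins for your vector store and model client, not any specific library's API:

def rag_answer(question: str, search, llm, k: int = 3) -> str:
    # Retrieve the k most relevant chunks for the query.
    chunks = search(question, limit=k)
    context = "\n\n".join(chunks)
    # Ground the generation in the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)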

In this post, we'll explore six key areas where you can focus your efforts to improve your RAG search system. These include using synthetic data for baseline metrics, adding date filters, improving user feedback copy, tracking average cosine distance and Cohere reranking score, incorporating full-text search, and efficiently generating synthetic data for testing.
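As a taste of the metrics work, here's one way you might track average cosine distance per query. This helper is an illustration rather than a prescribed implementation, and the logging is left out:

import numpy as np

def avg_cosine_distance(query_vec: np.ndarray, chunk_vecs: list[np.ndarray]) -> float:
    # Cosine distance = 1 - cosine similarity; higher means the
    # retrieved chunks sit farther from the query embedding.
    sims = [
        float(np.dot(query_vec, v) / (np.linalg.norm(query_vec) * np.linalg.norm(v)))
        for v in chunk_vecs
    ]
    return 1.0 - sum(sims) / len(sims)

Logged per request, a drifting average can flag query segments that your corpus covers poorly.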

Losing My Hands

The world was ending, and I couldn't even put my pants on. My hands had cramped up so badly that I couldn't grip a water bottle or type and could barely dress myself. A few weeks earlier, I had been riding the greatest decade-high anyone could have dreamed of. I was moving to New York, making 500k, working for an amazing company, and was engaged in what might be the most lucrative field on the planet. I was doing what I loved, getting paid well, and feeling like I was making a difference. Life was good. Well, as good as it could get during a once-in-a-lifetime pandemic. My name is Jason. I'm a machine learning engineer. And this is how I almost lost my hands.

Picking Metrics and Setting Goals

I think people suck at picking metrics and setting goals. Why? Because they tend to pick metrics they can't actually impact and set goals that leave them feeling empty once they've achieved them. So, let's define some key terms and explore how we can do better.

Based on this YouTube video.

Check out the video for the audio source that this post was generated from.

Hiring MLEs at early stage companies

Build fast, hire slow! I hate seeing companies make dumb mistakes, especially regarding hiring. I'm not against full-time employment, but as a consultant, part-time engagements are often more beneficial to me, which shapes my perspective on hiring. That said, I've observed two notable patterns in startup hiring practices: hiring too early and not hiring for dedicated research. Unfortunately, these patterns lead to startups hiring machine learning engineers to bolster their generative AI strengths, only to have them perform janitorial work for their first six months. It makes me wonder if startups are making easy-to-correct mistakes out of insecurity about capturing the current wave of AI optimism. Companies hire machine learning engineers too early in their life cycle.

Many startups need to stop hiring machine learning engineers too early in the development process, especially when the primary focus should be app development and integration work. A full-stack AI engineer can provide much greater value at this stage, since the role mostly calls for a full-stack developer rather than a specialized machine learning engineer. Consequently, these misplaced machine learning engineers often end up assisting with app development or DevOps tasks instead of focusing on their core competencies: training models and building ML solutions.

After all, my background is in mathematics and physics, not engineering. I would rather spend my days looking at data than spend two or three hours debugging TypeScript build errors.

Data Flywheel Go Brrr: Using Your Users to Build Better Products

You need to be taking advantage of your users wherever possible. It’s become a bit of a cliche that customers are your most important stakeholders. In the past, this meant that customers bought the product that the company sold and thus kept it solvent. However, as AI seemingly conquers everything, businesses must find replicable processes to create products that meet their users’ needs and are flexible enough to be continually improved and updated over time. This means your users are your most important asset in improving your product. Take advantage of that and use your users to build a better product!

Unraveling the History of Technological Skepticism

Technological advancements have always been met with a mix of skepticism and fear. From the telephone disrupting face-to-face communication to calculators diminishing mental arithmetic skills, each new technology has faced resistance. Even the written word was once believed to weaken human memory.

| Technology | Perceived Threat |
| --- | --- |
| Telephone | Disrupting face-to-face communication |
| Calculators | Diminishing mental arithmetic skills |
| Typewriter | Degrading writing quality |
| Printing Press | Threatening manual script work |
| Written Word | Weakening human memory |

Levels of Complexity: RAG Applications

This guide explores different levels of complexity in Retrieval-Augmented Generation (RAG) applications. We'll cover everything from basic ideas to advanced methods, making it useful for beginners and experienced developers alike.

We'll start with the basics, like breaking text into chunks, creating embeddings, and storing data. Then, we'll move on to more complex topics such as improved search methods, creating structured responses, and making systems work better. By the end, you'll know how to build strong RAG systems that can answer tricky questions accurately.
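As a rough illustration of those basics, here's a toy fixed-size chunker with overlap, plus the indexing step. The embed callable is a placeholder for whatever embedding model you use:

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a window of `size` characters, overlapping by `overlap`
    # so content at chunk boundaries isn't lost.
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]

def index_document(text: str, embed) -> list[tuple[str, list[float]]]:
    # Store (chunk, vector) pairs; a real system would write these
    # to a vector database instead of returning a list.
    return [(chunk, embed(chunk)) for chunk in chunk_text(text)]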

As we explore these topics, we'll use ideas from other resources, like our articles on data flywheels and improving tool retrieval in RAG systems. These ideas will help you understand how to create systems that keep improving themselves, making your product better and keeping users more engaged.

Key topics we'll explore include:

  1. Basic text processing and embedding techniques
  2. Efficient data storage and retrieval methods
  3. Advanced search and ranking algorithms
  4. Asynchronous programming for improved performance
  5. Observability and logging for system monitoring
  6. Evaluation strategies using synthetic and real-world data
  7. Query enhancement and summarization techniques

This guide aligns with the insights from our RAG flywheel article, which emphasizes the importance of continuous improvement in RAG systems through data-driven iterations and user feedback integration.

Format your own prompts

This is mostly to add on to Hamel's great post, "Fuck You, Show Me the Prompt."

I think too many LLM libraries are trying to format your strings in weird ways that don't make sense. For the most part, an OpenAI call accepts an array of messages.

from typing import Literal

from pydantic import BaseModel

class Message(BaseModel):
    # One message in the array that an OpenAI chat call accepts.
    content: str
    role: Literal["user", "system", "assistant"]

But so many libraries want you to submit a string block and offer some syntactic sugar on top. They also tend to map the docstring to the prompt, so instead of accessing a string variable, I have to access the docstring via __doc__.
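For contrast, here's the plain approach I'd rather write: just build the messages array yourself and pass it to the client. The model name here is illustrative:

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; use whatever model you're on
    messages=[
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Summarize RAG in one sentence."},
    ],
)
print(response.choices[0].message.content)

No magic, no docstrings, and the prompt is right there when you need to debug it.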

A feat of strength MVP for AI Apps

A minimum viable product (MVP) is a version of a product with just enough features to be usable by early customers, who can then provide feedback for future product development.

Today I want to focus on what that looks like for shipping AI applications. To do that, we only need to understand 4 things.

  1. What does 80% actually mean?

  2. What segments can we serve well?

  3. Can we double down?

  4. Can we educate the user about the segments we don’t serve well?

The Pareto principle, also known as the 80/20 rule, still applies but in a different way than you might think.