Writing and mumblings

2025/03/18
7 min read

Version Control for the Vibe Coder (Part 2)

In Part 1, you learned the basics of safely using Git with Cursor agents. Now, let's level up your workflow by diving into advanced Git practices and explicitly instructing Cursor to handle these for you.

2025/03/06
14 min read

Fine-Tuning Embedding Models for Enterprise RAG: Lessons from Glean

Systematically improving RAG systems

This transcript is based on a guest lecture given in my course, Systematically Improving RAG Applications.

Retrieval-Augmented Generation (RAG) systems have become essential tools for enterprises looking to harness their vast repositories of internal knowledge. While the theoretical foundations of RAG are well understood, implementing these systems effectively in enterprise environments presents unique challenges that aren't addressed in academic literature or consumer applications. This article covers advanced techniques for fine-tuning embedding models in enterprise RAG systems, based on insights from Manav Rathod, a software engineer at Glean who specializes in semantic search and ML systems for search ranking and assistant quality.

The discussion focuses on a critical yet often overlooked component of RAG systems: custom-trained embedding models that understand company-specific language, terminology, and document relationships. As Jason Liu aptly noted during the session, "If you're not fine-tuning your embeddings, you're more like a Blockbuster than a Netflix." This perspective highlights how critical embedding fine-tuning has become for competitive enterprise AI systems.

2025/03/05
in AI
6 min read

Hard Truths From the AI Trenches

I never planned to become a consultant. But somewhere between building machine learning systems and growing my Twitter following, companies started sliding into my DMs with the same message: "Help, our AI isn't working."

So I started charging to join their stand-ups. Sometimes I didn't even code. I just asked uncomfortable questions.

Here's what I've learned watching companies burn millions on AI.

2025/01/24
11 min read

How to Systematically Improve RAG Applications

Retrieval-Augmented Generation (RAG) is a simple, powerful idea: attach a large language model (LLM) to external data, and harness better, domain-specific outputs. Yet behind that simplicity lurks a maze of hidden pitfalls: no metrics, no data instrumentation, not even clarity about what exactly we’re trying to improve.

In this mega-long post, I’ll lay out everything I know about systematically improving RAG apps—from fundamental retrieval metrics, to segmentation and classification, to structured extraction, multimodality, fine-tuned embeddings, query routing, and closing the loop with real user feedback. It’s the end-to-end blueprint for building and iterating a RAG system that actually works in production.

I’ve spent years consulting on applied AI—spanning recommendation systems, spam detection, generative search, and RAG. That includes building ML pipelines for large-scale recommendation frameworks, doing vision-based detection, curation of specialized datasets, and more. In short, I’ve seen many “AI fails” up close. Over time, I’ve realized that gluing an LLM to your data is just the first step. The real magic is how you measure, iterate, and keep your system from sliding backward.

We’ll break everything down in a systematic, user-centric way. If you’re tired of random prompt hacks and single-number “accuracy” illusions, you’re in the right place.
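To make "fundamental retrieval metrics" concrete, here is a minimal sketch of precision@k and recall@k. The function names, document IDs, and relevance labels are illustrative assumptions, not taken from the post itself.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant documents.

    Assumes document IDs in `retrieved` are unique and ordered best-first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # nothing to find, so report zero rather than divide by zero
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical query: the retriever's ranking vs. hand-labeled relevant docs.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

print(precision_at_k(retrieved, relevant, k=3))
print(recall_at_k(retrieved, relevant, k=5))
```

Tracking both numbers per query segment, rather than one aggregate "accuracy" figure, is one way to see whether a change helped retrieval or only shuffled the ranking.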

2025/01/22
in Prompting
7 min read

10 “Foot Guns” for Fine-Tuning and Few-Shots

Let me share a story that might sound familiar.

A few months back, I was helping a Series A startup with their LLM deployment. Their CTO pulled me aside and said, "Jason, we're burning through our OpenAI credits like crazy, and our responses are still inconsistent. We thought fine-tuning would solve everything, but now we're knee-deep in training data issues."

Fast forward to today, and I’ve been diving deep into these challenges as an advisor to Zenbase, a production-level version of DSPy. We’re on a mission to help companies get the most out of their AI investments. Think of us as your AI optimization guides: we’ve been through the trenches, made the mistakes, and now we’re here to help you avoid them.

In this post, I’ll walk you through some of the biggest pitfalls. I’ll share real stories, practical solutions, and lessons learned from working with dozens of companies.

2025/01/07
4 min read

Making Money is Negative Margin

In 2020 I had a hand injury that ended my career for 2-3 years. I've only managed to bounce back into being an indie consultant and educator. On the way back to being a productive member of society I've learned a few things:

I have what it takes to be successful, whether that's the feeling of never wanting to be poor again, or some internal motivation, or the 'cares a lot' or the 'chip on the shoulder' - whatever it is, I believe I will be successful
The gift of being enough is the greatest gift I can give myself
I will likely make too many sacrifices by default, not too few, and it will reflect in my regrets later in life

2025/01/06
in Writing and Communication, Software Engineering
4 min read

I Used AI Agents to Add 50+ Cross-Links to My Blog (And You Can Too)

I just had an AI agent read through 100+ of my blog posts and tell me exactly where to add internal links. In 30 minutes, it found connections I'd missed for years. Here's how I did it.

2025/01/01
in AI, Engineering, MCP
2 min read

Prompt Template Resource System Specification

Overview

A token-efficient system for referencing external resources in LLM prompts without including their full content, designed to optimize token usage when LLMs generate template calls.
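As a rough illustration of the idea, the sketch below lets a prompt carry a short reference in place of a resource's full text, and expands that reference only when the final prompt is rendered. The `{{resource:id}}` placeholder syntax, the `render_prompt` helper, and the sample resource are all hypothetical; the excerpt above does not define the specification's actual syntax.

```python
import re
from typing import Dict

# Hypothetical placeholder syntax: {{resource:some_id}}
RESOURCE_REF = re.compile(r"\{\{resource:([A-Za-z0-9_\-]+)\}\}")


def render_prompt(template: str, resources: Dict[str, str]) -> str:
    """Replace each {{resource:id}} placeholder with the stored content.

    The template itself stays short (the model only has to emit the ID);
    the full text is substituted on the host side at render time.
    """

    def substitute(match):
        resource_id = match.group(1)
        if resource_id not in resources:
            raise KeyError(f"Unknown resource id: {resource_id}")
        return resources[resource_id]

    return RESOURCE_REF.sub(substitute, template)


# Illustrative registry; in practice this could be files, URLs, or a database.
resources = {"q3_report": "Revenue grew 12% quarter over quarter."}
template = "Summarize this: {{resource:q3_report}}"

print(render_prompt(template, resources))
```

The token saving comes from the model generating only the short ID; the expansion cost is paid once, outside the model's output.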

2024/12/27
4 min read

No One Has Potential But Yourself

I had a conversation with my friend today that shook something loose in my head: no one has potential. Like most of the lies I tell myself, this is obviously false - and yet, sometimes we need these extreme statements to see a deeper truth.

We often combat excess pessimism with excess optimism. We see potential in others and believe they can change. But this is just a projection of our own potential and values and beliefs.

Let me explain.

2024/12/19
18 min read

Legal Office Hours for AI Consultants

I want to invite my lawyer, Luke, to talk a little bit about the legal side of consulting. If you're new, you should also check out our consulting stack post.

In August, Luke officially launched Virgil. Their goal at Virgil is to be a one-stop shop for a startup’s back office, combining legal with related services that founders often prefer to outsource, such as bookkeeping, compliance, tax, and people operations. They primarily operate on flat monthly subscriptions, allowing startups to focus on what truly moves the needle.

He launched Virgil with Eric Ries, author of The Lean Startup, and Jeremy Howard, CEO of Answer AI. He's able to rely on the Answer AI team to build tools and help him stay informed about AI. He's licensed to practice in Illinois, and they have a national presence. That's his background and the essence of what they're building at Virgil.