Prompt Template Resource System Specification
Overview
A token-efficient system for referencing external resources in LLM prompts without including their full content, designed to optimize token usage when LLMs generate template calls.
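As a rough illustration of the idea, here is a minimal sketch of how such a reference might resolve at render time. The @resource: syntax, the in-memory resource store, and the render function below are illustrative assumptions, not part of this specification.

```python
# Minimal sketch: a template refers to a resource by a short handle
# (here "@resource:<name>") and the full content is resolved only at render
# time, so the LLM that generates the template call never has to emit
# or re-read the resource body.
import re

RESOURCES = {
    # In practice these would be files, URLs, or database rows.
    "style_guide": "Write in active voice. Keep sentences under 25 words.",
    "api_docs": "GET /v1/users returns a paginated list of user objects.",
}

REF_PATTERN = re.compile(r"@resource:(\w+)")

def render(template: str) -> str:
    """Replace each @resource:<name> reference with the resource's full text."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in RESOURCES:
            raise KeyError(f"Unknown resource reference: {name}")
        return RESOURCES[name]
    return REF_PATTERN.sub(substitute, template)

# The LLM only ever emits the short reference (a handful of tokens)...
template = "Rewrite the following answer using @resource:style_guide"
# ...and the full content is injected just before the final prompt is sent.
print(render(template))
```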
I had a conversation with my friend today that shook something loose in my head: no one has potential. Like most of the lies I tell myself, this is obviously false - and yet, sometimes we need these extreme statements to see a deeper truth.
We often combat excess pessimism with excess optimism. We see potential in others and believe they can change. But this is just a projection of our own potential, values, and beliefs.
Let me explain.
I want to invite my lawyer, Luke, to talk a little bit about the legal side of consulting. If you're new, you should also check out our consulting stack post.
In August, Luke officially launched Virgil. Their goal at Virgil is to be a one-stop shop for a startup's back office, combining legal with related services that founders often prefer to outsource, such as bookkeeping, compliance, tax, and people operations. They primarily operate on flat monthly subscriptions, allowing startups to focus on what truly moves the needle.
He launched Virgil with Eric Ries, author of The Lean Startup, and Jeremy Howard, CEO of Answer AI. He's able to rely on the Answer AI team to build tools and help him stay informed about AI. He's licensed to practice in Illinois, and they have a national presence. That's his background and the essence of what they're building at Virgil.
This section contains talks and presentations from the Systematically Improving RAG Applications series, featuring insights from industry experts and practitioners. Each talk provides specific learning outcomes, actionable techniques, and often surprising insights that challenge conventional RAG wisdom.
📚 Get the Complete Course - 20% Off
This content is from the Systematically Improving RAG Applications course on Maven.
Readers can enroll for 20% off with code: EBOOK
Join 500+ engineers who've transformed their RAG systems from demos to production-ready applications.
Establishing evaluation frameworks and building feedback systems.
Building Feedback Systems for AI Products - Vitor (Zapier)
Simple UX changes increased feedback collection from 10 to 40+ submissions per day (4x improvement). Game-changing insight: specific feedback questions like "Did this run do what you expected?" dramatically outperform generic "How did we do?" prompts. The team discovered they were missing positive feedback entirely due to poor collection mechanisms.
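As a rough sketch of what run-scoped, question-specific feedback capture can look like, here is a small example. The schema, field names, and collect_feedback helper are hypothetical illustrations, not Zapier's implementation.

```python
# Sketch of the "ask a specific question" idea: feedback is tied to a single
# run and to one concrete question, rather than a generic "How did we do?"
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class RunFeedback:
    run_id: str
    question: str          # the specific question shown to the user
    answer: bool           # thumbs up / down on that question
    comment: str | None = None
    created_at: str = ""

def collect_feedback(run_id: str, answer: bool, comment: str | None = None) -> RunFeedback:
    feedback = RunFeedback(
        run_id=run_id,
        question="Did this run do what you expected?",
        answer=answer,
        comment=comment,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    print(json.dumps(asdict(feedback)))  # stand-in for writing to your analytics store
    return feedback

collect_feedback("run_123", answer=True, comment="Exactly what I wanted")
```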
Text Chunking Strategies - Anton (ChromaDB)
Why chunking remains critical even with infinite context windows due to embedding model limitations and retrieval performance. Surprising discovery: default chunking strategies in popular libraries often produce terrible results for specific datasets. Essential practice: always manually examine your chunks.
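A minimal sketch of that practice follows, assuming a naive fixed-size splitter as a stand-in for whatever your chunking library produces; the file name and sizes are placeholders.

```python
# "Always look at your chunks": chunk a document and print each chunk so you
# can eyeball boundaries, truncation, and junk before anything gets embedded.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

document = open("my_document.txt").read()  # point this at any document from your corpus
for i, chunk in enumerate(chunk_text(document)):
    print(f"--- chunk {i} ({len(chunk)} chars) ---")
    print(chunk)
```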
Understanding Embedding Performance through Generative Evals - Kelly Hong
Generative benchmarking for creating custom evaluation sets from your own data. Surprising finding: model rankings on custom benchmarks often contradict MTEB rankings, showing that public benchmark performance doesn't guarantee real-world success. Method: filter document chunks for relevance → generate realistic queries with context and examples → evaluate retrieval performance.
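The method can be sketched as three placeholder functions plus a scoring loop. judge_relevance, generate_query, and search below are assumptions standing in for your own LLM calls and retriever; only the overall shape follows the talk.

```python
# Structural sketch of the generative-benchmarking loop: filter chunks,
# generate one realistic query per chunk, then measure recall@k.

def judge_relevance(chunk: str) -> bool:
    """Ask an LLM whether this chunk contains content a real user might query."""
    raise NotImplementedError  # e.g. a yes/no prompt against your chosen model

def generate_query(chunk: str) -> str:
    """Ask an LLM to write a realistic query answered by this chunk,
    giving it context about your users and a few example queries."""
    raise NotImplementedError

def search(query: str, k: int = 10) -> list[str]:
    """Your existing retrieval pipeline; returns chunk ids."""
    raise NotImplementedError

def build_and_run_benchmark(chunks: dict[str, str], k: int = 10) -> float:
    # 1. Filter: keep only chunks worth asking about.
    eligible = {cid: text for cid, text in chunks.items() if judge_relevance(text)}
    # 2. Generate: one synthetic query per eligible chunk.
    pairs = [(cid, generate_query(text)) for cid, text in eligible.items()]
    # 3. Evaluate: recall@k of the source chunk for its own query.
    hits = sum(1 for cid, query in pairs if cid in search(query, k))
    return hits / max(len(pairs), 1)
```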
Creating custom embedding models and fine-tuning for specific domains.
Enterprise Search and Fine-tuning Embedding Models - Manav (Glean)
Custom embedding models for each customer achieve 20% performance improvements over 6 months through continuous learning. Counter-intuitive insight: smaller, fine-tuned models often outperform larger general-purpose models for company-specific terminology. Each customer gets their own model that learns from user feedback.
Fine-tuning Re-rankers and Embedding Models for Better RAG Performance - Ayush (LanceDB)
Re-rankers provide 12-20% retrieval improvement with minimal latency penalty, making them "low-hanging fruit" for RAG optimization. Even small 6M parameter models show significant improvements. ColBERT architecture offers effective middle ground between bi-encoders and cross-encoders.
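As a rough illustration of how cheap this is to try, here is a re-ranking pass with a small open-source cross-encoder via sentence-transformers. The model choice and candidate passages are arbitrary examples, not a recommendation from the talk.

```python
# Minimal re-ranking sketch: score (query, passage) pairs with a cross-encoder
# after first-stage retrieval, then sort by score.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # small, CPU-friendly

query = "How do I rotate my API key?"
candidates = [  # e.g. top-20 chunks from your bi-encoder / BM25 retriever
    "API keys can be rotated from the security settings page.",
    "Our pricing plans include a free tier for hobby projects.",
    "To rotate a key, revoke the old one and generate a replacement.",
]

scores = reranker.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in reranked:
    print(f"{score:.3f}  {doc}")
```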
Deployment strategies and production monitoring for RAG systems.
Online Evals and Production Monitoring - Ben & Sidhant
Trellis framework for managing AI systems with millions of users. Critical discovery: traditional error monitoring (like Sentry) doesn't work for AI since there's no exception when models produce bad outputs. Their approach: discretize infinite outputs → prioritize by impact → recursively refine. Key insight: "vibe checks" often beat complex automated evaluation.
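One way to picture the first two steps is sketched below; the bucket names and the classifier are placeholders, not anything from the Trellis framework itself.

```python
# Sketch of "discretize -> prioritize": map free-form outputs into a small set
# of failure buckets, then rank the buckets by frequency (or by user impact).
from collections import Counter

FAILURE_BUCKETS = ["wrong_tool_called", "hallucinated_field", "refused_valid_request", "ok"]

def classify_output(output: str) -> str:
    """Assign an output to one bucket, e.g. with a cheap LLM judge or rules."""
    raise NotImplementedError

def prioritize(production_outputs: list[str]) -> list[tuple[str, int]]:
    counts = Counter(classify_output(o) for o in production_outputs)
    counts.pop("ok", None)  # only failures need fixing
    return counts.most_common()  # most frequent failure modes first
```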
RAG Anti-patterns in the Wild - Skylar Payne
90% of teams adding complexity to RAG systems see worse performance when properly evaluated. Major discovery: silent failures in document processing can eliminate 20%+ of corpus without detection. Golden rule: teams who iterate fastest on data examination consistently outperform those focused on algorithmic sophistication.
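A small guard against that silent-failure mode is sketched below: compare extracted text length against source file size and flag anything suspiciously empty. The threshold is a made-up starting point, not a tuned value.

```python
# Flag documents whose extracted text is empty or far too short for the size
# of the source file - the kind of silent drop that never raises an exception.
from pathlib import Path

def audit_extraction(extracted: dict[str, str], source_dir: str,
                     min_chars_per_kb: float = 50.0) -> list[str]:
    """Return ids of documents that probably failed extraction."""
    suspicious = []
    for doc_id, text in extracted.items():
        size_kb = Path(source_dir, doc_id).stat().st_size / 1024
        if not text.strip() or len(text) < min_chars_per_kb * size_kb:
            suspicious.append(doc_id)
    return suspicious

# extracted = {"report.pdf": parse_pdf("docs/report.pdf"), ...}
# print(audit_extraction(extracted, "docs"))  # anything listed here silently lost content
```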
Domain Experts: The Lever for Vertical AI - Chris Lovejoy (Anterior)
How to make LLMs work in specialized industries: build domain-expert review loops that generate failure-mode datasets, prioritize fixes by impact, and dynamically augment prompts with expert knowledge. Trust requires transparent production metrics, secure data handling, and defenses against LLM-specific threats.
Understanding user queries and routing them effectively.
Query Routing for RAG Systems - Anton (ChromaDB)
Why the "big pile of records" approach reduces recall due to approximate nearest neighbor algorithms. When filtering large indexes, compute budget is wasted on irrelevant nodes. Solution: separate indexes per user/data source often outperform filtered large indexes because filtering inherently reduces recall.
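A toy sketch of the per-source approach using Chroma collections follows; the naming scheme and routing rule are illustrative, not a prescription from the talk.

```python
# One small collection per user/source pair, instead of one big filtered index,
# so ANN search never wastes its budget on vectors a filter would discard anyway.
import chromadb

client = chromadb.Client()

def collection_for(user_id: str, source: str):
    return client.get_or_create_collection(f"{user_id}_{source}")

def add_document(user_id: str, source: str, doc_id: str, text: str) -> None:
    collection_for(user_id, source).add(ids=[doc_id], documents=[text])

def query(user_id: str, source: str, text: str, k: int = 5):
    return collection_for(user_id, source).query(query_texts=[text], n_results=k)

add_document("alice", "email", "e1", "Quarterly planning meeting moved to Friday.")
print(query("alice", "email", "when is the planning meeting?"))
```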
Building specialized capabilities for different content types and use cases.
Autonomous Coding Agents - Nik Pash (Cline)
Why leading coding agent companies are abandoning embedding-based RAG in favor of direct code exploration. Surprising insight: even massive enterprise codebases work better with agentic exploration than vector search. Key finding: "narrative integrity" - agents need coherent thought processes, not disconnected code snippets from similarity search.
Agentic RAG - Colin Flaherty
Surprising findings from top SWE-Bench performance: simple tools like grep and find outperformed sophisticated embedding models due to agent persistence and course-correction capabilities. Key recommendation: expose existing retrieval systems as tools to agents rather than replacing them.
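A sketch of that recommendation using an OpenAI-style function-calling schema is shown below. The tool name, JSON schema, and search_docs stub are assumptions; the actual retriever is whatever system you already run.

```python
# Wrap an existing retrieval pipeline as a tool the agent can call repeatedly,
# instead of replacing it with a new retrieval stack.
import json

def search_docs(query: str, k: int = 5) -> list[str]:
    """Your existing retrieval system (BM25, vector search, hybrid, ...)."""
    raise NotImplementedError

SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the internal knowledge base and return the top matching passages.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language search query."},
                "k": {"type": "integer", "description": "Number of passages to return.", "default": 5},
            },
            "required": ["query"],
        },
    },
}

def handle_tool_call(name: str, arguments: str) -> str:
    # Dispatch a tool call emitted by the agent; it can re-query as many times
    # as it needs, course-correcting instead of relying on one-shot retrieval.
    if name == "search_docs":
        return json.dumps(search_docs(**json.loads(arguments)))
    raise ValueError(f"Unknown tool: {name}")
```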
Better RAG Through Better Data - Adit (Reducto)
Hybrid computer vision + VLM pipelines outperform pure approaches for document parsing. Critical finding: even 1-2 degree document skews can dramatically impact extraction quality. Essential insight: invest heavily in domain-specific evaluation rather than generic benchmarks.
Encoder Stacking and Multi-Modal Retrieval - Daniel (Superlinked)
LLMs as "pilots that see the world as strings" fundamentally can't understand numerical relationships. Solution: mixture of specialized encoders for different data types (text, numerical, location, graph) rather than forcing everything through text embeddings. This approach eliminates over-reliance on re-ranking.
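A toy sketch of the idea follows, with deliberately trivial encoders for text, numbers, and location concatenated into one vector; none of this reflects Superlinked's actual encoders.

```python
# Mixture of encoders: each field gets an encoder suited to its type, and the
# vectors are concatenated (optionally weighted) into one embedding.
import numpy as np

def encode_text(text: str, dim: int = 16) -> np.ndarray:
    # Placeholder: hash words into a tiny bag-of-words vector.
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def encode_number(value: float, lo: float, hi: float) -> np.ndarray:
    # Numbers keep their ordering instead of being treated as strings.
    return np.array([(value - lo) / (hi - lo)])

def encode_location(lat: float, lon: float) -> np.ndarray:
    # Cyclic encoding so nearby points get nearby vectors.
    return np.array([np.sin(np.radians(lat)), np.cos(np.radians(lat)),
                     np.sin(np.radians(lon)), np.cos(np.radians(lon))])

def encode_listing(description: str, price: float, lat: float, lon: float,
                   weights=(1.0, 1.0, 1.0)) -> np.ndarray:
    return np.concatenate([
        weights[0] * encode_text(description),
        weights[1] * encode_number(price, lo=0, hi=5000),
        weights[2] * encode_location(lat, lon),
    ])

print(encode_listing("sunny 2-bedroom near the park", price=2400, lat=40.73, lon=-73.99).shape)
```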
Lexical Search in RAG Applications - John Berryman
Why semantic search struggles with exact matching, product IDs, and specialized terminology. Lexical search provides efficient simultaneous filtering and rich metadata that helps LLMs make better decisions. Recommended approach: use lexical search for filtering, semantic search for understanding meaning.
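A sketch of that split is shown below: a lexical pass handles exact product-ID matching and a placeholder semantic scorer ranks the survivors. The SKU pattern and data shapes are invented for illustration.

```python
# Lexical search for precise filtering (IDs, exact terms), semantic search for
# ranking by meaning over whatever survives the filter.
import re

def lexical_filter(query: str, docs: list[dict]) -> list[dict]:
    # Exact matching that embeddings handle poorly, e.g. product IDs like "SKU-10423".
    ids_in_query = set(re.findall(r"SKU-\d+", query))
    if not ids_in_query:
        return docs
    return [d for d in docs if ids_in_query & set(re.findall(r"SKU-\d+", d["text"]))]

def semantic_score(query: str, doc: dict) -> float:
    """Placeholder for cosine similarity between query and doc embeddings."""
    raise NotImplementedError

def hybrid_search(query: str, docs: list[dict], k: int = 5) -> list[dict]:
    candidates = lexical_filter(query, docs)  # cheap, precise filtering
    ranked = sorted(candidates, key=lambda d: semantic_score(query, d), reverse=True)
    return ranked[:k]                          # meaning-based ordering
```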
Cutting-edge approaches and innovative techniques.
RAG is Dead - Long Live Agentic Code Exploration - Nik Pash (Cline)
Why leading coding agent companies are abandoning embedding-based RAG in favor of direct code exploration. Surprising insight: even massive enterprise codebases work better with agentic exploration than vector search. Key finding: "narrative integrity" - agents need coherent thought processes, not disconnected code snippets from similarity search.
Semantic Search Over the Web with Exa - Will Bryk (Exa)
Why AI systems need fundamentally different search engines than humans. Vision for "perfect search" includes test-time compute where complex queries may take hours or days. Prediction: search market will fragment into specialized providers rather than one-size-fits-all solutions.
RAG Without APIs: Browser-Based Retrieval - Michael (OpenBB)
Browser-as-data-layer for secure financial data access without traditional API redistribution. Innovation: stateless agent protocol enables remote function execution in browser, solving compliance and security issues. Philosophy: anything humans can do, AI must be able to do.
Most Critical Learning: Data quality examination beats algorithmic sophistication - teams that iterate fastest on understanding their data consistently build better RAG systems
Most Underutilized Technique: Fine-tuning embeddings and re-rankers - both are more accessible and impactful than most teams realize
Biggest Gap: Most teams focus on model selection and prompting but underinvest in document processing, evaluation frameworks, and understanding their specific data distribution
The series reveals that successful RAG systems require a portfolio of techniques rather than silver bullets, with data understanding and systematic evaluation being the foundational capabilities that enable everything else.
For more information about the broader curriculum, see the main index.
There's a reason Google has separate interfaces for Maps, Images, News, and Shopping. The same reason explains why many RAG systems today are hitting a performance ceiling. After working with dozens of companies implementing RAG, I've discovered that most teams focus on optimizing embeddings while missing two fundamental dimensions that matter far more: Topics and Capabilities.
"Those who can't do, teach" is wrong. Here's proof: I taught at the Data Science Club while learning myself. If I help bring a room of 60 people even 1 week ahead, in an hour, that's 60 weeks of learning value creation. That's more than a year of value from one hour. Teaching isn't what you do when you can't perform. It's how you multiply your impact.
It's a duty.
Retrieval augmented generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by integrating them with external knowledge sources. In essence, RAG combines the generative power of LLMs with the vast information stored in databases, documents, and other repositories. This approach enables LLMs to generate more accurate, relevant, and contextually grounded responses.
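A minimal sketch of that retrieve-then-generate flow is shown below, with a toy keyword retriever and a placeholder LLM call; only the overall shape is the point.

```python
# Retrieve the most relevant documents, stuff them into the prompt as context,
# then ask the model to answer using only that context.
def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    terms = set(query.lower().split())
    scored = sorted(documents, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return scored[:k]

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # e.g. your provider's chat-completion call

def answer(query: str, documents: list[str]) -> str:
    context = "\n\n".join(retrieve(query, documents))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```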
This article explains six proven strategies to improve Retrieval-Augmented Generation (RAG) systems. It builds on my previous articles and consulting experience helping companies enhance their RAG applications.
By the end of this post, you'll understand six key strategies I've found effective when improving RAG applications:
Picture this: You're sitting at your desk, contemplating the leap into AI consulting. Maybe you're a seasoned ML engineer looking to transition from contractor to consultant, or perhaps you've been building AI products and want to branch out independently. Whatever brought you here, you're wondering how to transform your technical expertise into a thriving consulting practice.
I want to share something that completely changed my consulting business: writing consistently.
Last month, a founder reached out saying, "I don't know who you are, but your blog posts keep showing up in our team's Slack. Are you available to help us?"
Two days later, we closed a $140,000 deal for a three-month project. It took only three sales calls.
This wasn't luck – it was the compound effect of putting words on the page every single day.