
Writing and Communication

The right way to do AI engineering updates

Helping software engineers enhance their AI engineering processes through rigorous and insightful updates.


After working with over a dozen startups building out their AI engineering teams and helping them transition their software engineering practices to applied AI, I noticed a recurring shortcoming: there's a pressing need to adapt our communication methods to better reflect the complexities and uncertainties inherent in these systems.

In this post, we'll explore how adopting a more rigorous approach to updates—focusing on hypotheses, interventions, results, and trade-offs—can significantly improve project outcomes. We'll delve into real-world examples, highlighting successes, failures, and the invaluable lessons learned along the way. Whether you're a software engineer new to AI, a junior AI engineer, or a VP of engineering overseeing AI initiatives, this guide aims to enhance your understanding of effective communication in the realm of AI engineering.


What is a bad update?

Hey guys, I tried some of the suggestions we had last week, and the results look a lot better.

This is a bad update. It's vague. It's not helpful. It doesn't communicate what worked and what didn't.

It's a description of an activity, not a description of an experiment.

  1. Adjectives mean you're hiding something. Quantify or don't even mention it.
  2. Not having a clear hypothesis makes it impossible to interpret the results.
  3. Subjective metrics are meaningless when 1% could be massive or microscopic.

What is a good update?

I tried lexical search, semantic search, and hybrid indexing. We were able to get 85% recall at 5 and 93% recall at 10, which is about a 16% relative improvement over what's currently deployed. It's only a few lines of code, so it should be pretty cheap to roll out.

| Metric | Baseline | Hybrid Search | Re-ranking |
|---|---|---|---|
| Recall @ 5 | 73% | 85% (+16.4%) | 88% (+20.5%) |
| Recall @ 10 | 80% | 93% (+16.3%) | 95% (+18.8%) |

This is a good update. It's clear what was done, the results are quantifiable, and the trade-offs are acknowledged. It even came with a table to show the results; no adjectives needed.

I tried adding a re-ranking layer. It improves results by about 3% but adds 70ms to 700ms of latency to the application. Based on other things I've looked up, it might not be worth it. That said, if any of these re-ranking models get faster in the next couple of months, I'd definitely think we should revisit.

This is also a great update. Even though the gains are smaller, the trade-off is clearly understood and communicated. We even have a plan to revisit if certain conditions are met, like faster or smarter re-ranking models.

The Challenge of Communicating

Imagine you're part of a team building an AI agent designed to provide accurate and relevant search results. Unlike traditional software systems, AI models don't always produce deterministic outcomes. They're probabilistic by nature, meaning their outputs can vary even when given the same input. This inherent uncertainty presents a unique challenge: How do we effectively communicate progress, setbacks, and insights in such an environment?

Traditional update formats—like stating what you did last week or identifying blockers—aren't sufficient. Instead, we need to shift our focus towards:

  • Hypotheses: What do we believe will happen if we make a certain change?
  • Interventions: What specific actions are we taking to test our hypotheses?
  • Results: What are the quantitative outcomes of these interventions?
  • Trade-offs: What are the benefits and costs associated with these outcomes?

A New Approach (Old for Many of Us)

To illustrate the power of this approach, let's dive into a series of examples centered around RAG—a crucial aspect of building effective AI agents.

Scenario Setup

Our team is enhancing a search engine's performance. We're experimenting with different search techniques:

  • Lexical Search (BM25): A traditional term-frequency method.
  • Semantic Search: Leveraging AI to understand the context and meaning behind queries.
  • Hybrid Indexing: Combining both lexical and semantic searches.
  • Re-ranking Models: Using advanced models like Cohere and RankFusion to reorder search results based on relevance.

Our primary metric for success is Recall at 5 and at 10—the percentage of relevant results found in the top 5 or 10 search results.
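For concreteness, Recall at k can be computed per query and averaged across the evaluation set. A minimal sketch (the document IDs are made up for illustration):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

# One query: documents as ranked by the search engine vs. the labeled relevant set.
retrieved = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = ["d1", "d2", "d5", "d7"]

print(recall_at_k(retrieved, relevant, 5))   # 2 of 4 relevant docs in top 5 -> 0.5
print(recall_at_k(retrieved, relevant, 10))  # 3 of 4 -> 0.75
```

In practice you would average this over every query in your evaluation set to get the numbers reported in the tables below.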


Example 1: A High-Impact Intervention

We implemented a hybrid search index combining BM25 and semantic search, along with a re-ranking model. Recall at 5 increased from 65% to 85%, and Recall at 10 improved from 72% to 93%. User engagement also increased by 15%. While there's an increase in system complexity and query processing time (from ~50ms to ~200ms per query), the substantial gains in performance justify these trade-offs.

| Metric | Semantic Search | Hybrid Search | Hybrid + Re-ranking |
|---|---|---|---|
| Recall @ 5 | 65% | 75% (+15.4%) | 85% (+30.8%) |
| Recall @ 10 | 72% | 83% (+15.3%) | 93% (+29.2%) |
| Latency | ~50ms | ~55ms (+10%) | ~200ms (+264%) |

Hypothesis

Integrating a hybrid search index combining BM25 and semantic search will significantly improve Recall at 5 and 10, since re-ranking after a hybrid search will provide better ordering.

Intervention

  • Action: Developed and implemented a hybrid search algorithm that merges BM25's lexical matching with semantic embeddings.
  • Tools Used: Employed Cohere's re-ranking model to refine the search results further.
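The intervention merges a lexical ranking with a semantic one. One simple, common way to fuse ranked lists is reciprocal rank fusion (the post mentions RankFusion as one option); whether the team used exactly this method is an assumption. A sketch with made-up document IDs, where `k=60` is the constant commonly used in the RRF literature:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists; documents ranked highly by any list score well."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["d1", "d4", "d2", "d7"]      # lexical (BM25) ranking
semantic_results = ["d2", "d1", "d9", "d4"]  # embedding-based ranking

# d1 and d2 lead because both lists rank them near the top.
print(reciprocal_rank_fusion([bm25_results, semantic_results]))
```

A re-ranking model (e.g. Cohere's) would then reorder the top of this fused list using a cross-encoder style relevance score.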

Results

  • Recall at 5: Increased from 65% to 85% (a 20% absolute improvement).
  • Recall at 10: Improved from 72% to 93% (a 21% absolute improvement).
  • User Engagement: Time spent on the site increased by 15%, indicating users found relevant information more quickly.
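Updates in this post quote both relative improvements (e.g. "+16%") and absolute, percentage-point improvements (e.g. "a 20% absolute improvement"), so it's worth being explicit about which you mean. A quick sketch using the Example 1 numbers:

```python
baseline, new = 0.65, 0.85

absolute = new - baseline               # percentage-point change
relative = (new - baseline) / baseline  # change as a fraction of the baseline

print(f"{absolute:.0%} absolute")   # 20% absolute
print(f"{relative:.1%} relative")   # 30.8% relative
```

Stating both in an update removes any ambiguity about how big the win actually is.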

Trade-offs

  • Complexity: Moderate increase in system complexity due to the integration of multiple search techniques.
  • Computational Cost: Increased processing time per query (from ~50ms to ~200ms).

Conclusion

The substantial improvement in recall metrics and positive user engagement justified the added complexity and computational costs. This intervention was definitely worth pursuing.


Example 2: When Small Gains Aren't Worth It

We experimented with a query expansion technique using a large language model to enhance search queries. While this approach showed promise in certain scenarios, the overall impact on recall metrics was mixed, and it introduced significant latency to our search system.

| Metric | Baseline | Query Expansion |
|---|---|---|
| Recall @ 5 | 85% | 87% (+2.4%) |
| Recall @ 10 | 93% | 94% (+1.1%) |
| Latency | ~200ms | ~1800ms (+800%) |

Hypothesis

Implementing query expansion using a large language model will enhance search queries and improve recall metrics, particularly for complex or ambiguous queries.

Intervention

  • Action: Implemented query expansion using a large language model to enhance search queries.
  • Objective: Improve recall metrics, particularly for complex or ambiguous queries.
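A sketch of the shape of this intervention. Everything here is illustrative: `complete` stands in for whatever LLM client the team actually used, and the stubbed response lets the sketch run offline.

```python
def expand_query(query, complete):
    """Expand a search query into several phrasings via an LLM.

    `complete` is any callable that takes a prompt string and returns text;
    in practice it would wrap a real LLM client call.
    """
    prompt = (
        "Rewrite the search query below as three alternative phrasings, "
        f"one per line, preserving the intent:\n{query}"
    )
    variants = [line.strip() for line in complete(prompt).splitlines() if line.strip()]
    # Retrieve with the original query plus its variants, then merge results.
    return [query] + variants

# A stubbed "LLM" so the sketch runs without network access.
def fake_llm(prompt):
    return (
        "best mirrorless camera high ISO\n"
        "low light camera recommendations\n"
        "full frame camera for night photography"
    )

print(expand_query("best camera for low light", fake_llm))
```

The extra model round-trip on every query is exactly where the ~1600ms of added latency in the table above comes from.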

Results

  • Recall at 5: Improved from 85% to 87% (2% absolute improvement).
  • Recall at 10: Improved from 93% to 94% (1% absolute improvement).
  • Processing Time: Increased latency from ~200ms to ~1800ms (800% increase).
  • System Complexity: Significant increase due to the integration of a large language model for query expansion.

Trade-offs

  • Marginal Gains: The slight improvement in recall did not justify the substantial increase in latency.
  • Performance Overhead: The significant increase in latency could severely impact user satisfaction.
  • Maintenance Burden: Higher complexity makes the system more difficult to maintain and scale.
  • Resource Consumption: Integrating a large language model requires additional computational resources.

Conclusion

Despite the modest improvements in recall metrics, the substantial increase in latency and system complexity made this intervention impractical. The potential negative impact on user experience due to increased response times outweighed the marginal gains in search accuracy. Therefore, we decided not to proceed with this intervention.

If smaller models become faster and more accurate, this could be revisited.


Embracing Failure as a Learning Tool

We should also embrace failure as a learning tool. It's not a waste of time: each failure helps you refine your approach, your knowledge, your systems, and your sense of where not to go.

I also like updates to include examples from before and after the intervention when possible to show the impact, as well as examples of failures and what was learned from them.

Example

We experimented with a query expansion technique using a large language model to enhance search queries. While this approach showed promise in certain scenarios, the overall impact on recall metrics was mixed, and it introduced significant latency to our search system. Here are some examples from before and after the intervention.

```python
print(expand_v1("Best camera for low light photography this year"))
{
   "category": "Camera",
   "query": "low light photography",
   "results": [
      "Sony Alpha a7 III",
      "Fujifilm X-T4"
   ]
}

print(expand_v2("Best camera for low light photography"))
{
   "query": "low light photography",
   "date_start": "2024-01-01",
   "date_end": "2024-12-31",
   "results": [
      "Sony Alpha a7 III",
      "Fujifilm X-T4"
   ]
}
```

We found that these expansions over dates did not work because we're missing metadata about when cameras were released. Since reviews are often published well after a product's release, this lack of information posed a challenge. For this to be a more fruitful experiment, we would need to improve our coverage: only 70% of our inventory has date or time metadata.

These examples and insights demonstrate the value of embracing failure as a learning tool in AI engineering. By documenting our failures, conducting regular reviews, and using setbacks as fuel for innovation, we can extract valuable lessons and improve our systems over time. To further illustrate how this approach can be implemented effectively, let's explore some practical strategies for incorporating failure analysis into your team's workflow.

  1. Document Your Failures:
     • Maintain a "Failure Log" to record each unsuccessful experiment or intervention.
     • Include the hypothesis, methodology, results, and most importantly, your analysis of why it didn't work.
     • This practice helps build a knowledge base for future reference and learning.

  2. Conduct Regular Failure Review Sessions:
     • Schedule monthly "Failure Retrospectives" for your team to discuss recent setbacks.
     • Focus these sessions on extracting actionable insights and brainstorming ways to prevent similar issues in future projects.
     • Encourage open and honest discussions to foster a culture of continuous improvement.

  3. Use Failure as Innovation Fuel:
     • Encourage your team to view failures as stepping stones to breakthrough innovations.
     • When an experiment fails, challenge your team to identify potential pivot points or new ideas that emerged from the failure.
     • For example, if an unsuccessful attempt at query expansion leads to insights about data preprocessing, explore how these insights can be applied to improve other areas of your system.
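As one way to make a "Failure Log" concrete, a structured record could look like this. The field names and the example entry are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class FailureLogEntry:
    hypothesis: str
    intervention: str
    results: dict         # metric name -> (baseline, observed)
    analysis: str         # why it didn't work
    revisit_if: str = ""  # condition under which the idea is worth retrying

entry = FailureLogEntry(
    hypothesis="LLM query expansion will improve recall on ambiguous queries",
    intervention="Expanded each query into three variants before retrieval",
    results={"recall@5": (0.85, 0.87), "latency_ms": (200, 1800)},
    analysis="Marginal recall gain did not justify a 9x latency increase",
    revisit_if="A smaller expansion model reaches <100ms per query",
)
print(entry.analysis)
```

Keeping entries in this shape means every failure carries its hypothesis, numbers, and revisit condition, which is exactly what the retrospectives above need.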

Effective Communication Strategies for Probabilistic Systems

Tips for Engineers and Leaders

  1. Emphasize Hypotheses:
     • Clearly state what you expect to happen and why.
     • Example: "We hypothesize that integrating semantic search will improve recall metrics by better understanding query context."

  2. Detail Interventions:
     • Explain the specific actions taken.
     • Example: "We implemented Cohere's re-ranking model to refine search results after the initial query processing."

  3. Present Quantitative Results:
     • Use data to showcase outcomes.
     • Example: "Recall at 5 improved from 65% to 85%."

  4. Discuss Trade-offs:
     • Acknowledge any downsides or costs.
     • Example: "While we saw performance gains, processing time increased by 50ms per query."

  5. Be Honest About Failures:
     • Share what didn't work and potential reasons.
     • Example: "Our attempt at personalization didn't yield results due to insufficient user data."

  6. Recommend Next Steps:
     • Provide guidance on future actions.
     • Example: "We recommend revisiting personalization once we have more user data."

  7. Use Visual Aids:
     • Use before-and-after comparisons to illustrate points.
     • Include charts or tables where appropriate.

Conclusion

Building and improving AI systems is an iterative journey filled with uncertainties and learning opportunities. By adopting a rigorous approach to updates—focusing on hypotheses, interventions, results, and trade-offs—we can enhance communication, make better-informed decisions, and ultimately build more effective AI agents.

For software engineers transitioning into AI roles, junior AI engineers honing their skills, and VPs overseeing these projects, embracing this communication style is key to navigating the complexities of probabilistic systems. It fosters transparency, encourages collaboration, and drives continuous improvement.

Content Creation Mastery: 9 Strategies to 10x Your Impact

Look, creating content that actually matters is hard. Here's how to do it without the bullshit:

  1. Titles That Demand Attention: Your title is the gatekeeper. Make it count or no one will read your shit.

  2. Hook with a Powerful Intro: You've got 15 seconds. Don't waste them.

  3. Use Evidence, Not Adjectives: "Our platform is blazing fast" means nothing. "3ms average response time" does.

  4. Foreshadow Value: Tell them exactly what they'll get. No vague promises.

  5. Structure for Scanners: People skim. Deal with it. Use headers, bullet points, and short paragraphs.

  6. Make It About Them, Not You: No one cares about your journey. They care about their problems.

  7. Be an Oracle: Predict future challenges. Be right more often than not.

  8. One Clear Call-to-Action: What do you want them to do? Ask for it. Once.

  9. Iterate Based on Data: If it's not working, change it. Ego has no place here.

1. Craft Titles That Demand Attention

Your title is make-or-break. Here's how to not fuck it up:

  • Evoke emotion: "The Writing Hack That Tripled My Audience Overnight"
  • Address pain points: "End 'Writer's Block' Forever: A Foolproof 3-Step System"
  • Offer clear value: "5 Persuasion Techniques That Boosted Our Sales by 287%"
  • Use numbers: "7 Unconventional Marketing Tactics Used by Top Brands"
  • Create urgency: "Limited Time: Learn the SEO Secret That's Transforming Businesses"
  • Ask intriguing questions: "Is Your Content Strategy Secretly Sabotaging Your Growth?"

A/B test your titles. Use tools for keyword research. Keep it under 60 characters for search engines.

2. Hook with a Powerful Intro

You've got their click. Now keep them. Here's how:

  1. Validate their challenge
  2. Hint at your solution
  3. Establish why they should listen to you

Example: "Struggling to stand out? You're not alone. After helping 100+ creators grow their audiences by 500%+, I've cracked the code. Here's how to turn readers into raving fans."

Use shocking stats, the PAS formula, or a relatable story. Keep it under 5 sentences.

3. Use Evidence, Not Adjectives

Vague claims are worthless. Be specific:

❌ "Our platform is blazing fast"
✅ "Our platform delivers 3ms average response time with 99.99% uptime last quarter"

Use:

  • Data and statistics
  • Case studies
  • Expert quotes
  • Before and after comparisons
  • Social proof

Always cite sources. Use visuals to make data digestible.

4. Foreshadow Value

Tell them exactly what they'll get:

"By the end of this guide, you'll know how to:

  • Boost email open rates by 203%
  • Craft headlines that convert 43% better than average
  • Create 10 high-engaging pieces from a single idea
  • Cut content creation time in half while doubling output
  • Land features in Forbes, Entrepreneur, and TechCrunch"

Be specific. Align with their pain points.

5. Structure for Scanners

People skim. Make it easy for them:

  • Short paragraphs (2-3 sentences max)
  • Bullet points and numbered lists
  • Descriptive subheadings
  • Bold key phrases
  • Use white space
  • Include relevant images
  • Pull quotes for emphasis
  • Table of contents for longer pieces

Use the inverted pyramid: Most important info first.

6. Make It About Them, Not You

No one cares about your journey. They care about their problems. Focus on that:

❌ "I increased conversions by 50% using this method"
✅ "You can boost your conversions by 50% with this proven method"

  • Use "you" language
  • Address reader benefits directly
  • Ask questions
  • Use relatable scenarios
  • Provide actionable takeaways
  • Anticipate and address objections

Always ask: "So what? How does this benefit my reader?"

7. Be an Oracle: Predict Future Challenges

Show them you're ahead of the curve:

  1. Analyze industry trends
  2. Predict audience evolution
  3. Look for cross-industry insights

Example: "While everyone's mastering short-form video, prepare for immersive, interactive content. By 2026, 30% of content will have an AR/VR component. Here's how to get ahead."

Back predictions with data. Offer actionable steps for each prediction.

8. One Clear, Compelling Call-to-Action

Tell them exactly what to do next. Once.

  • Make it stand out visually
  • Use action-oriented language
  • Clearly state the benefit
  • Create urgency when appropriate
  • Ensure it's relevant to the content

Example: "Join 50,000+ content pros getting weekly insider tips. Sign up now!"

A/B test your CTAs. Optimize for mobile.

9. Iterate and Improve Based on Data

If it's not working, change it. Track:

  • Engagement metrics (time on page, scroll depth, shares, comments)
  • Conversion metrics (sign-ups, downloads, purchases)
  • SEO metrics (organic traffic, keyword rankings, backlinks)
  • Content-specific metrics (video watch time, podcast listen-through rate)

Analyze top performers. A/B test everything. Update high-performing older content.

Remember: Content creation is both art and science. Creativity matters, but data drives results.

Now go create something worth reading.

How I want you to write

I'm gonna write something technical.

When I read something technical, it's often less about the nitty-gritty details of the tech itself and more about learning something new or getting a solution handed to me on a silver platter.

Look, when I read, I want something out of it. So when I write, I gotta remember that my readers want something too. This whole piece? It's about cluing in anyone who writes for me, or wants me to write for them, on how I see this whole writing product thing.

I'm gonna lay out a checklist of stuff I'd like to have. It'll make the whole writing gig a bit smoother, you know?

Crafting Compelling Titles

I often come across titles like "How to do X with Y, Z technology." These don't excite me because X or Y are usually unfamiliar unless they're already well-known. It's rarely the dream to use X unless X is the dream.

My dream isn't to use instructor; it's to do something valuable with the data it extracts.

An effective title should:

  • Evoke an emotional response
  • Highlight someone's goal
  • Offer a dream or aspiration
  • Challenge or comment on a belief
  • Address someone's problems

I believe it's more impactful to write about specific problems. If this approach works, you can replicate it across various scenarios rather than staying too general.

  • Time management for everyone can be a $15 ebook
  • Time management for executives is a $2,000 workshop

Aim for titles that answer questions you think everyone is asking, or address thoughts people have but can't quite articulate.

Instead of "How I do something" or "How to do something," frame it from the reader's perspective with "How you can do something." This makes the title more engaging. Just make sure the difference is advisory if the content is subjective. “How I made a million dollars” might be more reasonable than “How to make a million dollars” since you are the subject and the goal might be to share your story in hopes of helping others.

This approach ultimately trains the reader to have a stronger emotional connection to your content.

  • "How I do X"
  • "How You Can do X"

Between these two titles, it's obvious which one resonates more emotionally.

You can take it further by adding specific conditions. For instance, you could target a particular audience or set a timeframe:

  • How to set up Braintrust
  • How to set up Braintrust in 5 minutes

No Adjectives

I want you to almost always avoid adjectives and use evidence instead. Instead of saying "production ready," write something like "scaling this to 100 servers or 1 million documents per second." Numbers like that tell the reader exactly what your product can do. If you have to use adjectives rather than evidence, you are probably making something up.

There's no reason to say something like "blazingly fast" unless those things are already known phrases.

Instead, say "200 times faster" or "30% faster." A 30% improvement in recommendation system speed is insane.

A 200 times performance improvement because you went from one programming language to another is the kind of claim that's expected and understandable.

Another test that I really like using recently is tracking whether or not the statements you make can be:

  • Visualized
  • Proven false
  • Said only by you

If you can nail all three, the claim you make will be more likely to resonate with an audience because only you can say it.

Earlier this year, I had an example where I embedded all of Wikipedia in 17 minutes with 20 bucks, and it got half a million views. All we posted was a video of me kicking off the job, and then you can see all the log lines go through. You see the number of containers go from 1 out of 50 to 50 out of 50.

It was easy to visualize and could have been proven false by being unreproducible. Lastly, Modal is the only company that could do that in such an effortless way, which made it unique.

Strong Introduction

So, if you end up doing any kind of sales, you'll realize this:

What you actually need to understand is not what you have to offer as the product, but the size of the pain that the prospect is going through.

  • There are going to be readers that are just kind of curious and bored. They're not really going to be the ones that care about the product itself unless you can contextualize the pain for them.
  • It's really important to have an introduction that contextualizes the pain and foreshadows the solution.

If we can build that trust and I can correctly describe the pain that you are going through, then you will believe me when I am predicting the pain that you may also go through in the future. Ultimately, that is how you become a leader in the space—by demonstrating your ability to be right consistently.

The next time you publish or write something, they will believe it, and they will believe that they get value from it.

Strong Hooks

In the same sense that a title should often try to change the "how I" to a "how you" by eliciting an emotional response, the introduction can also help select the reader into a group that is feeling the pain.

This is the same reason why a plumber will have an introduction that says, "Do you have a leaky faucet? Call 1-800-PLUMBERS." That's a much more selective hook than just "I'm the best plumber in town." You can say that to everybody, whereas if someone answers the question of whether they have a leaky faucet, it automatically selects them to be a part of their readership.

I truly believe if you try to build a product too soon for everybody, you're gonna end up in a bad place.

Foreshadow Content

Once you hook them, you still have to retain them. You can do that by foreshadowing the content you'll cover and even hinting at the reward.

For example, an introduction could look like the following:

  • If you're making $10,000 a month consulting right now, my goal at the end of this blog post is to help you increase your prices by:
    • Asking the right questions so you understand the value of the solution you're offering.
    • Providing tips on writing proposals and offering different options so your customer can pay you more.
    • Lastly, sharing some anecdotes on how I became more comfortable charging two or three times more than I did when I started.

Here, I've pre-qualified the reader for a certain range and told them what their goal is by the end of the post.

The first two items are the tips and questions I'm going to suggest; the final reward is something a little more personal. Ideally, they read the first two knowing that my personal stories come after. That intro itself outlines the entire post.

Use Lists

Once you've hooked your audience, you have to retain them and reward them.

You'll also see that as part of the foreshadowing content, I've listed three items that I want them to take away. I can also be specific with the number of questions and the number of tips I'm providing.

By using lists and counting things, I can give them a notion of progress towards the final conclusion. If I'm on the second part, I know where I am in the story. The list itself allows us to break down the body and give the reader a sense of position. Since they know where they are, they know where they're going to be.

Demonstrate being an Oracle

One thing that's also really valuable to call out is the fact that you want to be seen as a leader or an oracle to this audience.

For example, if we go back to this charging more case, it's one thing to demonstrate that you understand that the reader's dream is to be able to charge more. It's useful that you're giving them a couple of tips and stories. But what can be even more powerful is to simultaneously:

  • Demonstrate your knowledge of the current problem.
  • Predict future problems as they come.
  • Foreshadow or reference those future issues in later content.

For example, the things you do to go from $10,000 to $30,000 a month are very different from what you would need to do to get to $200,000 a month. They require things like hiring, prioritizing your services, and improving distribution. If you can foreshadow that and call that out, when your audience gets to those levels, there's a chance they will remember what you said.

They'll think, "Wow, Jason was not only right about where I was but also where I was going." This brings a tremendous sense of trust and value.

It's not just this idea of future work or future considerations of the current work, but actually being able to predict the problems they're going to have in the future and suggesting that you are also the solution to those problems. You can foreshadow that as part of a series or whatever, but the general idea remains.

Have a Strong CTA

You also have to think about what the reward is for yourself. Ultimately, you should be writing this because you think the message is important and your audience deserves to get it. The content of the post delivers the reward to those who stay and finish the article.

But at the end, you also have to get something in return. You should ask your reader to do something. If it's a tweet, a simple ask could be a repost, a like, or a share. It could even be a follow, or clicking through to a GitHub repo and giving it a star.

What I realized to be very important was to make sure you only ask for one thing and don't split the attention. If you do that, you can have solid metrics on how you phrase CTAs and how that converts to certain content. For example:

  • How many people from the tweet go to the blog post?
  • How many go from the blog post to a subscription to a newsletter?

The more I think about it, the more I believe that most people should be capturing information into a newsletter rather than just Twitter. A direct email is so much more powerful.

No matter what it is, make sure you only ask for one thing. Sometimes it's to sign up, sometimes it's to try a one-click deploy. But if there's no action that your users can take, you've definitely made a mistake.

It's also good to call out that taking the action should have some kind of outcome as well.

  • If you want to see more of this content, follow me because I post twice a week.
  • That would be an example that I can qualify as a user and set expectations on what the outcome is.
  • If you like it, then subscribe. You'll get two posts a week.

I also think in many situations we should have tiers of qualified CTAs.

If you want to split your traffic, you should have an obvious condition as to which one someone should take. For example, on Indexify:

  • If you're dealing with terabyte-scale datasets, contact us.
  • If you want to try the Open Source library

It sets a prequalifier.

The terabyte-scale dataset is an evidence-based pre-filter. It could have said "in production," but something as specific as a terabyte, or a terabyte a day, really qualifies who should contact Indexify.

Anatomy of a Tweet

The goal of this post is basically to share what I have learned about writing a tweet, how to think about writing a hook, and a few comments on how the body and the CTA need to retain and reward the user. It's not much; I've only been on Twitter for about six months.