Written by Jacob Daddario
Thanks to the AI boom, there's been a recent deluge of marketing terms and buzzwords. One acronym worth knowing? RAG, short for Retrieval-Augmented Generation.
Let's break down what it is, how it works, and why it's more than just another buzzword.
Large language models like ChatGPT don't have up-to-the-minute information, nor do they know about your business. They don't have access to your latest CRM data, your internal docs, or call transcripts with clients.
But with RAG, we can bridge that gap by supplying relevant context when asking the model to generate a response. Here's a basic example of what that might look like in action:
User:
How long do I have to close the deal with Barber Supply Co.?
The following are call transcripts that may be relevant to your discussion: client_name: Barber Supply Co. transcript[...]
Assistant:
Barber Supply Co. wants you to start the project by March, so closing likely needs to happen before that.
By supplying existing call transcripts alongside the question, the assistant can pull information from those transcripts to answer it. Seems easy enough, but there's an important part of the technique that we're still missing.
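To make the augmentation step concrete, here's a minimal sketch of assembling such a prompt in Python. The message format mirrors OpenAI-style chat APIs, and the transcript data and field names are hypothetical:

```python
def build_messages(question, transcripts):
    """Attach retrieved call transcripts to the user's question as context."""
    context = "\n\n".join(
        f"client_name: {t['client_name']}\ntranscript: {t['transcript']}"
        for t in transcripts
    )
    return [
        {"role": "system", "content": "You are a helpful sales assistant."},
        {
            "role": "user",
            "content": (
                f"{question}\n\n"
                "The following are call transcripts that may be relevant "
                f"to your discussion:\n{context}"
            ),
        },
    ]

# Hypothetical transcript pulled from a CRM or call-recording tool
transcripts = [{
    "client_name": "Barber Supply Co.",
    "transcript": "...they want the project kicked off by March...",
}]

messages = build_messages(
    "How long do I have to close the deal with Barber Supply Co.?",
    transcripts,
)
```

The assistant sees the transcript inline with the question, which is all "augmentation" really means here.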
How do we automate the process of fetching the relevant documents for the assistant?
This is where vector embeddings come in.
When we create a vector embedding of a document or a chunk of a document, we're translating its meaning into a numerical format — literally a list of numbers. This lets us search documents by meaning, not just keywords. We store those embeddings (along with the original content) in a vector database.
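In practice you'd call a real embedding model (OpenAI's embedding endpoints, for example) to get these vectors. As a self-contained illustration, here's a toy "embedding" function and an in-memory store; the bag-of-words counting below is just a stand-in for a real model:

```python
from collections import Counter

# Tiny fixed vocabulary for demonstration; real models don't work this way
VOCAB = ["barber", "supply", "deal", "march", "invoice", "pricing"]

def toy_embed(text):
    """Stand-in for a real embedding model: word counts over a tiny vocabulary.
    Real embeddings capture meaning and have hundreds or thousands of dimensions."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]

# A minimal in-memory "vector database": (embedding, original content) pairs
vector_store = []

def add_document(text):
    vector_store.append((toy_embed(text), text))

add_document("Barber Supply deal closes before March")
add_document("Invoice and pricing questions from another client")
```

The key idea is only that each document is stored alongside its numerical representation, so it can later be found by comparing vectors rather than matching keywords.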
Once you've stored your data in a vector database, you're now able to find relevant sections of those documents using similarity search. Similarity search is actually pretty simple. You can think of the number representations of the documents as arrows that point somewhere in space. They exist in high-dimensional space, but for ease of explanation, we can consider these arrows as if they were two-dimensional.
[Figure: Hypothetical vector representations of call transcripts with different companies]
In the graphic above, you'll see how documents that talk about similar things, like the same customer, tend to cluster together. For example, anything related to Barber Supply Co. ends up in that top-right corner.
Now imagine that same clustering happening in other directions, too—maybe one axis for deal health, another for customer sentiment, and so on. It's a simplified view of how it works behind the scenes, but it captures the big idea. Similar content lives close together, making it easy to find the right context when you need it.
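The "arrows pointing in similar directions" idea maps directly onto cosine similarity, the measure most vector databases use under the hood. Here's a sketch with two-dimensional vectors like those in the illustration; the coordinates are made up for demonstration:

```python
import math

def cosine_similarity(a, b):
    """Close to 1.0 means the arrows point the same way; near or below 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2-D positions for three call transcripts
barber_call_1 = [0.9, 0.8]   # Barber Supply Co., top-right corner
barber_call_2 = [0.85, 0.9]  # Barber Supply Co., also top-right
other_client = [-0.7, 0.1]   # unrelated client, pointing elsewhere
```

The two Barber Supply Co. calls score close to 1.0 against each other, while the unrelated transcript scores near zero, which is exactly the clustering described above.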
Then, when a user asks a question, we:

1. Create an embedding of the question itself.
2. Run a similarity search against the vector database to find the most relevant documents.
3. Attach those documents to the prompt as context and ask the model to generate a response.
[Figure: The general flow of a RAG pipeline]
Just like that, the model has the right context to generate a meaningful, accurate response.
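Putting the pieces together, a bare-bones retrieval step might look like the sketch below. The `embed` function is a placeholder for a real embedding model, and the documents are hypothetical:

```python
import math

def embed(text):
    """Placeholder for a real embedding model (normally an API call).
    Here: crude word counts over a tiny fixed vocabulary."""
    vocab = ["barber", "supply", "deal", "close", "march", "invoice"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def retrieve(question, store, k=1):
    """Embed the question, then rank stored documents by similarity."""
    q = embed(question)
    ranked = sorted(store, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

store = [
    "Barber Supply wants the deal to close before March.",
    "Invoice sent to a different client last week.",
]

question = "When do I need to close the Barber Supply deal?"
context = retrieve(question, store)

# The retrieved document gets attached to the prompt, as in the earlier example
prompt = f"{context[0]}\n\nQuestion: {question}"
```

A production pipeline swaps the toy pieces for real ones (an embedding model and a vector database), but the flow of embed, search, and augment stays the same.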
Setting up a RAG pipeline like the one described here can be challenging. Thankfully, AI providers like OpenAI already provide great tools like document search right on their platform. But to use those tools, you need an integration between their platform and your data sources.
That's where Venn Technology comes in.
We help businesses like yours integrate and automate your systems so your AI tools actually have the context they need. Whatever systems you're connecting, we've got you covered.
Need help with the heavy lifting? Contact us and let's build something smart together.
As a developer at Venn, Jacob believes that programming is like digital craftsmanship—using skills and many different tools to build an easy-to-use product for others. Jacob is a tech enthusiast and finds humor in the quote “We might not know how the pyramids were built, but we do know how to connect a Linux machine to the internet.” On his bucket list, he wants to see Texas A&M play University of Texas in football. Outside of work, you can find Jacob cooking up new recipes and cuisines to try.