What is Retrieval-Augmented Generation?

Generative AI (GenAI) systems like ChatGPT have taken the world by storm, and they're largely deserving of the hype. They've been trained on much of the world's public knowledge and use it to produce useful responses to an amazing variety of questions. Through that training, they have even gained an uncanny ability to understand language and grammar, giving users an easy, natural way to interact with them. While GenAI systems are useful for many scenarios, they can only produce answers relating to data they've already been trained on. As general models, they (thankfully) haven't been trained on private data such as the details of a person's finances, health, or insurance policies. There are use cases, however, such as customer service, where combining the chat ability of GenAI with private knowledge can be quite beneficial: a person can quickly find answers about their account, status, or products without involving a support agent. This is only workable, though, if the user's private data remains private. That's where Retrieval-Augmented Generation, or RAG for short, comes in. In this blog post, we'll dig in to learn more about RAG, see what makes it tick, and show where it's being applied effectively.

RAG retrieves knowledge from custom data sources to anchor and enrich GenAI responses. Implementing RAG in a chat system using large language models (LLMs) offers two key advantages: it ensures access to the latest reliable facts based on a controlled dataset, and it makes the model's sources transparent, enabling users to verify its claims.

Andrew McKenna, VP of Product Management at DataMotion, explained that the goal is to allow users to rapidly find answers in a self-service manner while keeping the human connection only a click away.

Since RAG produces relevant answers from data outside the LLM's training dataset, it reduces the risk of data leakage and of generating inaccurate or misleading responses. Because RAG-generated answers rely on customer data rather than the LLM's internal knowledge, implementing RAG can significantly reduce the need for model training. This leads to cost savings when employing the technique to power virtual assistants in enterprise use cases.

Enhancing Large Language Models with Retrieval Augmented Generation (RAG)

To understand where RAG fits into the overall picture of GenAI, it's important to take a step back and get some perspective. At the core of foundation models, the extensively trained AI models (including large language models, or LLMs) that can be adapted to various applications, lies an AI architecture known as the transformer. The transformer condenses raw data into a compact representation of its fundamental structure and then uses this representation to generate output in the same format as the input data. A foundation model can be customized for different tasks by fine-tuning it with specific, labeled domain knowledge.

However, fine-tuning alone cannot give the model the depth of knowledge required to answer highly specific questions in ever-changing contexts. In a 2020 paper, Meta introduced a framework called retrieval-augmented generation (RAG) to grant LLMs access to information beyond their initial training data. RAG empowers LLMs to leverage specialized knowledge sources for more accurate responses.

As the name suggests, RAG consists of two phases: data retrieval and response generation augmented by data found during retrieval. During the retrieval phase, search algorithms retrieve relevant information snippets based on the user’s prompt or question. These facts can originate from documents and activities unique to the user. In enterprise use cases, this private selection of sources is essential to meet stringent security and data integrity requirements.
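To make the retrieval phase concrete, here's a minimal sketch in Python using the open-source sentence-transformers library. Everything here is an illustrative assumption rather than DataMotion's implementation: the model name, the sample snippets, and the retrieve() helper simply show the pattern of embedding private documents once and ranking them against each question by semantic similarity.

```python
# Minimal sketch of the retrieval phase: embed private snippets once,
# then rank them against the user's question by semantic similarity.
# Model name and snippets are illustrative, not a real deployment.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A user's private knowledge base: account documents, policies, activity.
snippets = [
    "The annual IRA contribution limit for this account type is $7,000.",
    "Early withdrawals before age 59 1/2 may incur a 10% penalty.",
    "This account was opened on 2021-03-15 and holds a Roth IRA.",
]
snippet_embeddings = model.encode(snippets, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k snippets most similar to the question."""
    query_embedding = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, snippet_embeddings, top_k=top_k)[0]
    return [snippets[hit["corpus_id"]] for hit in hits]

print(retrieve("How much can I contribute each year?"))
```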

These snippets of facts, known as context, are added to the user’s question or prompt and provided to the language model. In the generative phase, the LLM uses the augmented prompt and its internal understanding of language to generate a tailored response for the user. This response can be delivered to a chatbot along with links to its information sources.
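The generative phase can then be sketched as straightforward prompt augmentation: the retrieved context is placed ahead of the user's question, together with an instruction to answer only from that context. This sketch assumes an OpenAI-style chat API with an API key in the environment; the model name and the instruction wording are placeholders, not a prescribed configuration.

```python
# Sketch of the generative phase: retrieved snippets ("context") are
# prepended to the user's question before the LLM is called.
# Assumes an OpenAI-style chat API; model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, context: list[str]) -> str:
    # The retrieved facts become context lines ahead of the question.
    context_block = "\n".join(f"- {fact}" for fact in context)
    augmented_prompt = (
        "Answer using only the context below. If the answer is not in the "
        "context, say you don't have that information.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": augmented_prompt}],
    )
    return response.choices[0].message.content

print(answer("How much can I contribute each year?",
             ["The annual IRA contribution limit for this account type is $7,000."]))
```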

Evolution of Virtual Assistants: From Manual Scripts to Dynamic Responses

Prior to LLMs, digital chat agents adhered to a pre-defined dialogue process. They verified the customer’s intent, retrieved the requested information, and provided responses based on scripts designed to fit anticipated situations. This manual decision-tree approach functioned effectively for simple queries.

However, this approach had its drawbacks. Creating scripts for every potential question was time-consuming, and if a scenario was overlooked, the chatbot had no way to adapt on the fly. Keeping scripts up to date as policies and situations evolved was impractical, if not impossible.

Presently, virtual assistants powered by LLMs give customers more individualized responses, eliminating the need for brittle, admin-generated scripts. RAG takes this further by eliminating the need to supply new examples for model training. Instead, you simply add or update documents, policies, or activity records, and the model retrieves the necessary information in its search phase before responding to queries.

Below, we highlight two scenarios demonstrating the power of Retrieval-Augmented Generation (RAG) in the financial sector; however, the use cases span industries.

The virtual assistant built into the DataMotion customer engagement platform uses RAG to ground its responses in content that is relevant to the user and can be verified and trusted.

Consider a client, John, who is interested in exploring investment options for his retirement. He wants to know how much he can contribute annually.

To formulate its response, the system first accesses John’s financial account information to determine the contribution limits. These details are incorporated into John’s initial inquiry and conveyed to the LLM, which generates a personalized answer from this data. The virtual assistant then displays the response.
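Tying the two illustrative helpers above together, the whole exchange might look like a few lines of glue code; John's question and the account facts are hypothetical.

```python
# End-to-end sketch: retrieval grounds the prompt, generation personalizes it.
# Builds on the illustrative retrieve() and answer() helpers defined above.
question = "How much can I contribute to my retirement account each year?"
context = retrieve(question)        # pull John's account facts and limits
reply = answer(question, context)   # LLM answers from the augmented prompt
print(reply)                        # displayed by the virtual assistant
```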

Challenges in Financial Customer Queries and Model Responses

Customer queries in the financial services sector are sometimes complicated. They can be intricately worded, multifaceted, or require private knowledge the model doesn’t have access to. These situations can lead to LLM-only solutions providing inaccurate responses.

In a more challenging real-life scenario, consider John wanting to understand the implications of withdrawing funds early from his retirement account for a home purchase. A virtual assistant that doesn't utilize RAG may respond with: “Go ahead, you have our full support!” In reality, early withdrawal policies vary depending on the specific type of retirement account, its tax implications, and the applicable penalties. When the LLM can't locate a specific answer, the ideal response would be, “I apologize, but I don't possess that information,” or it could keep asking clarifying questions until it identifies one it can confidently answer. Instead, it generates a response using language from its training set, potentially leading to misinformation.
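One common way to approximate that ideal “I don't know” behavior without fine-tuning is to gate generation on retrieval quality: if no snippet matches the question strongly enough, the assistant declines rather than guesses. This sketch reuses the illustrative helpers from earlier; the threshold value is an arbitrary assumption that would need tuning on real data.

```python
# Sketch of a retrieval-quality gate: decline rather than guess when the
# best-matching snippet is too weak. Reuses model, snippets,
# snippet_embeddings, and answer() from the earlier sketches.
from sentence_transformers import util

SCORE_THRESHOLD = 0.4  # illustrative cutoff; tune on real data

def guarded_answer(question: str) -> str:
    query_embedding = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, snippet_embeddings, top_k=2)[0]
    if not hits or hits[0]["score"] < SCORE_THRESHOLD:
        return "I apologize, but I don't possess that information."
    context = [snippets[hit["corpus_id"]] for hit in hits]
    return answer(question, context)
```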

With extensive fine-tuning, it is possible to train an LLM to recognize when it's unsure and pause instead of providing an inaccurate answer. However, this training often requires exposure to numerous examples of answerable and unanswerable questions; in many cases, the model needs thousands of such examples to become proficient at identifying unanswerable questions and seeking more detail until it finds one it can respond to. This means acquiring a high-quality dataset that accurately describes the desired output and then paying the computational cost of the fine-tuning, which may not be feasible for everyone.

RAG stands out as the leading tool for grounding LLMs in the most up-to-date and verifiable information, all while reducing the need for constant retraining and updates. At DataMotion, our focus is on collaborating with our customers and partners to drive innovation throughout the entire process. With our extensive enterprise experience and a customer-focused approach, we employ cutting-edge technology like RAG to provide enhanced experiences for the modern digital world.

Stay informed on the latest in GenAI and RAG by subscribing to the DataMotion Newsletter. Dive into a world of insightful updates and expert commentary, and be part of our community that’s shaping the future of technology. Or, request a demo with one of our experts to see how DataMotion can revolutionize your customer experience journey.

Learn more about our AI solution, JenAI Assist™, on our new microsite at ai.datamotion.com.