Retrieval-augmented generation (RAG) offers the chance of answers from trusted sources in seconds, and its promise is closer than we think, explain Patrick Meyer, artificial intelligence senior architect and technical director at Sopra Steria UK, and Clément Benesse, senior AI researcher at opsci.ai.
Picture a doctor facing a complex, urgent diagnosis. He has access to thousands of scientific studies, medical reports, patient files and personal notes. Yet sorting through this deluge of information in time is impossible.
Now, imagine an assistant that instantly filters the most relevant facts and delivers a precise, sourced and concise answer in seconds. Far from science fiction, this is the promise of retrieval-augmented generation (RAG).
From guesswork to grounded answers: How RAG enhances LLMs
Large language models (LLMs) often face criticism for inconsistency. Sometimes they provide razor-sharp answers; at other times, they simply invent content. You may have felt that ChatGPT responds without truly understanding the topic. This happens because LLMs do not comprehend meaning directly. They identify statistical relationships between words and guess the “right” sequence of terms with varying levels of success.
RAG addresses this by grounding the model’s output in factual, external sources. The LLM still generates the answer but must rely on the references at hand. “RAG combines two essential elements: search and response generation,” says Patrick Meyer, artificial intelligence senior architect & technical director at Sopra Steria UK. “It brings a company’s internal knowledge into a language model’s general understanding to answer questions.”
In practice, RAG retrieves information from a company’s databases, then uses a language model to generate the answer. As Meyer puts it: “RAG models are like advanced search engines.”
“When you browse a website, you can hunt for information by exploring pages or by asking a direct question. With RAG – the ‘G’ stands for generation – you can get a summary or even a direct answer, which saves a lot of time,” he explains.
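To make that retrieve-then-generate loop concrete, here is a minimal sketch in Python. The documents, the query and the generate_answer stub are invented for illustration; a real deployment would typically use dense embeddings, a vector store and an actual LLM call, but the shape of the pipeline is the same: rank the company's documents against the question, then hand the best matches, together with their sources, to a generator.

```python
# Minimal retrieve-then-generate sketch. The documents and the generate_answer()
# stub are illustrative stand-ins, not a production RAG implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    {"source": "hr_policy.pdf",      "text": "Employees accrue 25 days of annual leave per year."},
    {"source": "it_handbook.pdf",    "text": "VPN access requires multi-factor authentication."},
    {"source": "finance_notes.docx", "text": "Expense claims must be submitted within 30 days."},
]

def retrieve(query: str, docs: list[dict], k: int = 2) -> list[dict]:
    """Rank documents against the query with TF-IDF cosine similarity."""
    texts = [d["text"] for d in docs]
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(texts)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).flatten()
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

def generate_answer(query: str, passages: list[dict]) -> str:
    """Placeholder for the generation step: a real system would send this
    prompt to a language model and return its completion."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

question = "How many days of annual leave do I get?"
print(generate_answer(question, retrieve(question, documents)))
```

Keeping the source field attached to every retrieved passage is what allows the generated answer to cite exactly where its information originated.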
Precision, versatility and trust: The core advantages of RAG
RAG is simple and effective, producing concise, contextualised, and traceable responses that allow swift decision-making. “One advantage is that RAG doesn’t require extraordinary computing power, unlike fine-tuning,” notes Clément Benesse, senior AI researcher at opsci.ai. “In addition, RAG retains all the features of LLMs – prompt engineering, answer formatting, style – while adding modular knowledge.”
RAG is also remarkably versatile. It can operate in any business domain, from customer support and finance to human resources, and beyond. Sopra Steria already has around fifty clients worldwide experimenting with it, and the company itself uses RAG internally to handle queries drawn from its own reference materials.
Trustworthiness emerges from the method’s ability to link answers directly to their sources. Meyer explains, “Unlike a language model trained on all documentation, which can’t specify where its information comes from, RAG includes a retrieval component that preserves the source. You know exactly where the information originated.”
And by grounding its output in factual information, RAG mitigates hallucinations. “RAG can prevent hallucinations caused by outdated information,” he says. “If I say ‘tree,’ most people imagine a trunk with leaves, but a mathematician sees a decision tree, and a mechanic thinks of a camshaft. Such ambiguity often leads to hallucinations if the system selects the wrong interpretation.” (The wordplay comes from French, where arbre means both ‘tree’ and ‘shaft’, as in arbre à cames, a camshaft.)
Working within constraints: The user’s role and RAG’s limits
RAG’s success depends on how users phrase their queries and on how data is structured and labelled. Sometimes it cannot answer if the question falls outside its scope.
“The limitation comes from the user,” notes Meyer. “With ChatGPT, you can ask anything because it holds all information inside the model. RAG must retrieve data and requires a prompt aligned with how you want to answer. It cannot invent information, so it can’t address all questions.”
RAG also deliberately limits an LLM’s reach by steering the available information – essentially ‘bridling’ the model. “If RAG is poorly implemented, it may produce information that isn’t incorrect per se but isn’t truly helpful,” explains Benesse. “It’s like handing an analyst the wrong memo, damaging results or slowing the process. Fortunately, this pipeline is relatively well-defined, and problems usually stem from poor-quality source documents rather than flaws in the system.”
Embracing complexity: From multimodality to linguistic sovereignty
Integrating various document types – text, images, and video – into a single language model and making them coexist productively remains a major challenge. “Information today is scattered across text, images and video,” says Benesse. “The difficulty lies in bringing everything together into a unified representation space, so that the system can link a passage of text to a fragment of an image. Techniques such as composite embeddings and knowledge graphs offer considerable promise.”
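As a rough illustration of what a unified representation space means, the sketch below uses placeholder encoders and random projection matrices (all invented here) to map a text passage and an image fragment into the same vector space, where they can be compared directly. In practice the encoders would be pretrained models and the projections would be learned jointly, for example with contrastive training.

```python
# Illustrative sketch of a shared representation space, not a production recipe.
# embed_text() and embed_image() stand in for modality-specific encoders, and
# the random projection matrices stand in for learned alignment layers.
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, IMAGE_DIM, SHARED_DIM = 384, 512, 256

def embed_text(passage: str) -> np.ndarray:
    # Placeholder: a real system would call a text encoder here.
    return rng.standard_normal(TEXT_DIM)

def embed_image(fragment: bytes) -> np.ndarray:
    # Placeholder: a real system would call a vision encoder here.
    return rng.standard_normal(IMAGE_DIM)

# Linear projections into the shared space (random stand-ins for learned weights).
text_proj = rng.standard_normal((TEXT_DIM, SHARED_DIM))
image_proj = rng.standard_normal((IMAGE_DIM, SHARED_DIM))

def to_shared(vector: np.ndarray, projection: np.ndarray) -> np.ndarray:
    shared = vector @ projection
    return shared / np.linalg.norm(shared)

# Once both modalities live in the same space, a text passage and an image
# fragment can be matched directly with cosine similarity.
text_vec = to_shared(embed_text("cross-section of a camshaft"), text_proj)
image_vec = to_shared(embed_image(b"<raw image bytes>"), image_proj)
print("similarity:", float(text_vec @ image_vec))
```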
Ensuring that models represent diverse languages and cultures remains equally complex, particularly when addressing linguistic and cultural biases. “Models are trained with about 90% of their input in English, sourced mainly from the internet, which leads to bias,” warns Meyer. “Smaller countries are barely present, which poses real issues of sovereignty.”
A collaborative future: SMA systems and the next generation of AI
Looking ahead, our two experts envision a future in which multiple specialised models collaborate seamlessly, offering the speed, accuracy and efficiency that rival today’s best large language models.
“The future lies in systems that match the speed and accuracy of top LLMs like ChatGPT,” says Meyer. “I believe we’ll see this with SMA systems. The idea is to have several models working together – one agent analyses the request and breaks it down, another dispatches tasks to specialised agents, and so forth. In effect, a series of small, expert models collaborating.”
Benesse agrees: “Recently, the trend has been towards ever-larger models with enormous computational costs. But we needn’t rely on a single generalist model to handle every request. SMA systems mimic a company’s structure, with specialised teams – strategy, engineering, productisation, communications – to reduce resource use. It’s one of the best current options for more explainability and frugality.”
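As a very rough sketch of the orchestration pattern Meyer describes, the toy example below has one function decompose a request into sub-tasks and a dispatcher route each one to a specialised agent. The agent names, keywords and routing rules are invented; in a real SMA-style system both the decomposition step and the specialists would themselves be language model calls.

```python
# Toy orchestration sketch: decompose a request, then route each sub-task to a
# small specialised agent. All names and routing keywords here are illustrative.
from typing import Callable

def hr_agent(task: str) -> str:
    return f"[HR agent] answer to: {task}"

def finance_agent(task: str) -> str:
    return f"[Finance agent] answer to: {task}"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "leave": hr_agent,
    "expense": finance_agent,
}

def decompose(request: str) -> list[str]:
    """Stand-in for an agent that breaks a request into sub-tasks."""
    return [part.strip() for part in request.split(" and ")]

def dispatch(task: str) -> str:
    """Route each sub-task to the specialist whose keyword it mentions."""
    for keyword, agent in SPECIALISTS.items():
        if keyword in task.lower():
            return agent(task)
    return f"[Generalist] answer to: {task}"

request = "How much annual leave do I have and how do I file an expense claim?"
print("\n".join(dispatch(task) for task in decompose(request)))
```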