Generative AI’s potential for companies is well-known, but the technology can create new risks if it is not powered by original and trustworthy data sources. In this blog, we explore those risks; highlight best practices around pulling data for generative AI using a Retrieval Augmented Generation (RAG) technique; and suggest the key questions to ask your data provider for a trustworthy and effective approach.
87% of companies plan to adopt generative AI technology (if they haven’t already), according to the LexisNexis® Future of Work Report 2024. But, in recent years, far too many corporate AI initiatives have ended in failure. A common cause of this is poor quality data – as the saying goes, “garbage in, garbage out”. The outputs from generative AI tools will only be as accurate and relevant as the data powering them.
The problem typically lies in companies inputting low-quality data from third parties into their generative AI models. This might be a third-party generative AI tool which a company uses to support its work, or a third-party data aggregator from which it pulls content to power its own generative AI solution. If these providers cannot clearly demonstrate where and how they sourced their data, this poses five main risks:
MORE: Exploring credible data for AI
Retrieval Augmented Generation (RAG) is a technique that enhances a generative AI tool to mitigate these risks. Traditionally, a model generates answers from its original training data, supplemented by its ongoing prompts and responses with users. RAG instead requires the model to pull information from an additional layer of contextual data, which takes precedence over what it learned in training. This data should be credible, authoritative and pulled directly from original sources, such as the data licensed for generative AI use by LexisNexis®. The model must then generate every answer using this retrieved data as context and cite the original source(s) used in each response.
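The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration only: the document store, the keyword-overlap scoring, and the prompt wording are all hypothetical stand-ins for a production retriever and a licensed data source, but the shape is the same — rank sources against the query, then build a prompt that forces the model to answer from that context and cite it.

```python
# Illustrative RAG sketch. The documents, scoring, and prompt text are
# invented for this example; a real system would use a licensed corpus
# and a proper retriever (e.g. vector search).

DOCUMENTS = [
    {"source": "Reuters, 2024-03-01",
     "text": "Regulators proposed stricter financial crime directives for banks."},
    {"source": "Company filing, 2024-02-15",
     "text": "The firm reported rising material costs across several jurisdictions."},
    {"source": "Trade journal, 2024-01-20",
     "text": "Generative AI adoption accelerated across the legal sector."},
]

def retrieve(query: str, docs: list[dict], top_k: int = 2) -> list[dict]:
    """Rank documents by simple keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_terms & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[dict]) -> str:
    """Assemble a prompt instructing the model to answer only from the
    retrieved context and to cite each numbered source it uses."""
    context_block = "\n".join(
        f"[{i + 1}] ({d['source']}) {d['text']}"
        for i, d in enumerate(context)
    )
    return (
        "Answer using ONLY the numbered sources below, and cite them "
        "by number in your answer.\n\n"
        f"Sources:\n{context_block}\n\nQuestion: {query}"
    )

query = "What financial crime rules affect banks?"
context = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, context)
print(prompt)  # this prompt, not the raw training data, is what the LLM sees
```

Because the model is handed the sources inside the prompt and told to cite them, its answer can be traced back to the original, licensed material — which is the core benefit a RAG approach provides.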
Retrieval Augmented Generation offers myriad benefits, for example:
Unlocking the benefits of a RAG approach to generative AI requires access to trustworthy data which is optimized for use in this specific technology. The LexisNexis® Future of Work Report 2024 found that nine in ten professionals cite the quality and accuracy of a generative AI tool's output as their main consideration when choosing one, while seven in ten said trusted, accurate data sources are the key to fostering trust in their use of generative AI. So how can companies use a RAG approach to pull this contextual data for their generative AI models from original sources?
MORE: The A to Z of understanding AI and big data
Pulling from original sources to power generative AI initiatives involves going to individual, reliable publishers and requesting to use their data. Companies operating worldwide may need to do this for sources across multiple jurisdictions and languages. This would be extremely time-consuming, both to negotiate acquiring the data and to ensure compliance with differing regulations over time.
Therefore, it is far more efficient to outsource the acquisition of data sources to a specialist third-party provider. Depending on your budget, there are two approaches you might take:
Whichever approach you take, it is critical that the third-party provider has ensured each data source it uses is licensed and approved for the specific use of generative AI and meets all relevant regulations and ethical standards around data protection and privacy. Your company will be held accountable for any failures in this respect. Questions to ask a potential provider include:
MORE: AI for business research unlocking new insights and opportunities
Applying Retrieval Augmented Generation in your generative AI development is only effective if the contextual data it brings in is accurate, trustworthy, and approved for use in generative AI tools. LexisNexis provides licensed content and optimized technology to support your generative AI and RAG ambitions:
Download our new toolkit to learn more about how your company can realize the potential of AI while staying ahead of evolving regulations.