Why Generative AI’s Potential Requires Quality Data

November 20, 2024 (5 min read)

To get the best results from your AI efforts, make sure you use quality data.

Generative AI is predicted to transform almost every industry, so it is hardly surprising that companies spent nearly $20 billion on the technology last year. But the technology alone is not enough to unlock generative AI’s transformative potential. In the latest blog in our ‘Harnessing Data for AI Innovation’ series, we explore how unreliable data can limit the power and potential of generative AI and how credible data and advanced technologies can set firms up for success.

How can generative AI help organizations overcome their challenges?

Generative AI has the potential to transform industries in very significant ways, including:

Significantly enhancing productivity
Streamlining workflows
Supporting customers
Finding more relevant insights from high volumes of data
Influencing how businesses deliver their key products and services

According to the Wall Street Journal, it is “the most buzzed-about new technology for businesses, promising to supercharge productivity while transforming the way white-collar work gets done”. Organizations as varied as automotive giant BMW, consultancy Accenture, the government of Portugal and financial services firm Mastercard are all talking about generative AI.

The excitement around generative AI is well founded, according to The LexisNexis Future of Work Report 2024. The report, produced with experts from Harvard University, found that generative AI has already caused a “pivotal shift” in how organizations operate and strategize, and that it will go on to shape the future of work altogether. It could make organizations more innovative, more efficient, and even more creative.

Much of the media and corporate focus on generative AI to date has been on the technology itself. Nearly $20 billion was spent on generative AI tools in the last year, and the number will exceed $151 billion by 2027, according to the International Data Corp. But the difficult truth is that if generative AI is not underpinned by high-quality data, this investment could simply be wasted. Credible data is the key ingredient for companies seeking to exploit the opportunities of generative AI.

Why data quality will determine the success or failure of generative AI projects

There are three main reasons why generative AI projects that are not founded on quality data can hold back a company’s use of the technology, or even expose it to new risks:

AI hallucinations

Generative AI solutions can sometimes generate responses that will sound plausible but have no basis in fact or the underlying data, which is known as a “hallucination”. A typical cause is the tool learning from outdated or incomplete data as well as from its ongoing interactions with users, which leads to outputs based on ‘made up’ data. The problem is compounded if the tool does not cite the original source(s) for all the information in its response, because this makes it difficult for a company to verify whether a response is a hallucination.

“Garbage in, garbage out”

Among the main advantages of generative AI is its ability to absorb high volumes of data to produce almost instant text responses and insights based on a user’s prompt. Many companies are using it to ‘chat’ to customers in real-time more accurately and efficiently, as well as for research and due diligence. But the technology cannot correct errors in the underlying data. If the AI tool is powered by inaccurate or unreliable data, then these problems will be replicated in the results.

Compliance risks

There have been recent legal cases brought by publishers against generative AI providers for allegedly using their data without permission or payment. Poor quality data risks breaching privacy, confidentiality and intellectual property regulations, which exposes companies to potential legal, financial, reputational and strategic harm.

Why credible data can uplift a company’s use of generative AI

Generative AI is most effective when it is built on high-quality, reliable and credible data. This data should come from original sources or from a third-party provider whose provenance and data collection methods are clear. The data must be licensed by its publishers for specific use in a generative AI tool.

An effective way to use data to improve the quality of generative AI results is Retrieval-Augmented Generation (RAG). With this technique, the generative AI tool first retrieves relevant passages from authoritative, original sources and then grounds each response in them, rather than relying solely on what it learned from training data and from earlier prompts and responses. By using high-quality, contextual data, the AI tool can deliver more accurate, trustworthy and relevant responses. Moreover, it can clearly cite the exact sources used in the process.
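
To make the RAG idea more concrete, here is a minimal sketch in Python of the retrieval and prompt-building steps. It is an illustration under stated assumptions, not a description of any particular product: the Document class, the licensed_for_ai flag, the sample sources and the simple keyword-overlap scoring are all hypothetical, and a production system would typically use a search index or embeddings and then pass the prompt to a language model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str            # hypothetical publisher name, used for citation
    text: str              # passage content
    licensed_for_ai: bool  # hypothetical flag: cleared for generative AI use


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[Document], top_k: int = 2) -> list[Document]:
    """Return up to top_k licensed documents sharing the most words with the query."""
    query_terms = tokenize(query)
    scored = []
    for doc in corpus:
        if not doc.licensed_for_ai:
            continue  # never retrieve content that is not licensed for AI use
        overlap = len(query_terms & tokenize(doc.text))
        if overlap > 0:
            scored.append((overlap, doc))
    # Stable sort: documents with equal scores keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(query: str, passages: list[Document]) -> str:
    """Build a prompt that restricts the model to numbered, citable sources."""
    numbered = "\n".join(
        f"[{i}] ({doc.source}) {doc.text}"
        for i, doc in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them by number.\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {query}"
    )


# Illustrative, made-up corpus; the sources and text are not real data.
CORPUS = [
    Document("Example Wire Service",
             "The central bank raised interest rates by a quarter point on Tuesday.",
             True),
    Document("Example Trade Journal",
             "Automakers reported higher electric vehicle sales this quarter.",
             True),
    Document("Unlicensed Blog",
             "Rumors say the central bank will cut interest rates.",
             False),
]

if __name__ == "__main__":
    question = "What did the central bank do to interest rates?"
    print(build_prompt(question, retrieve(question, CORPUS)))
```

In this sketch the unlicensed source is filtered out before scoring, so it can never reach the prompt, and every passage carries its source name so the final answer can cite it.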

High-quality, approved data for generative AI will be highly sought after by companies in the coming years. Firms will need to invest more in acquiring credible data for AI to ensure they are maximizing the potential of the technology to provide accurate and relevant outputs and insights with reduced risk of AI hallucinations, inaccuracies and biases. The market for data used in generative AI tools has already grown to $2.5 billion, according to Business Research Insights.

Unlock the power of generative AI with credible and approved data from LexisNexis®

Our extensive news coverage, enriched with robust metadata, is readily available for integration into your generative AI projects. Over the past year, we have worked diligently and transparently with our publishers to secure the rights to use their data with generative AI tools. Our portfolio covers over 20,000 licensed titles, with thousands of sources available for use with generative AI technology. The generative AI-enabled dataset includes content from industry giants like The Associated Press, McClatchy and more.

Our generative AI-approved news data set therefore provides you with licensed news content from credible sources worldwide, including:

Leading international, national, and business news sources
Prominent regional news and business sources
A wide range of news sources such as trade press, national news, government releases, and international organizations
Diverse news sources including local news, corporate press pages, wire services, and political websites
Non-news sources including message boards, consumer magazines, academic journals, and licensed content blogs

Our trusted news data helps your organization streamline its research into relevant topics, trends, and entities; optimize workflows; and ultimately achieve your business goals more efficiently.

Download our free ebook, Harnessing Data for AI Innovation, to learn more about how your company can exploit AI’s opportunities and manage its risks with high-quality data.
