Six Criteria for Sourcing Credible Data to Power Better AI Initiatives

November 18, 2024 (4 min read)

Be careful to source data from credible sources when powering AI.

From machine learning to generative AI, the latest developments in AI technology offer a major competitive advantage to companies that can best apply them to their business challenges. But those benefits will only be realized if a firm’s AI initiatives are founded on credible and accurate data.

In this blog, we explore how trusted data and technology can make your company more innovative.

How to ensure you’re using credible data for better AI initiatives

The best way for any company to use AI for innovation and business transformation is to bring in the highest-quality data whose origins it knows and trusts. It is usually more cost-effective to use a third-party provider of data and technology that can demonstrate strong relationships with reliable publishers that cite each original source. Such providers also keep their data sets up to date and add enrichments.

When selecting a provider, companies should seek credible and reliable data for AI that meets these six non-negotiable criteria:

Reputable sources.
Clear provenance with a link back to the original source.
A date stamp of when the article or content was published and (if possible) when it was called on by the API.
Data whose timespans range from near real-time updates to a historic archive of trusted sources.
Enrichments to help users quickly and easily access the information most relevant to their needs.
Evidence of how that data was collected in compliance with legal, privacy, and ethical standards.
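
As a rough illustration, the criteria above could be turned into automated checks on incoming data. The sketch below is hypothetical: the `Record` fields, the `REPUTABLE_SOURCES` allow-list, and the one-day and five-year thresholds are assumptions made for the example, not part of any provider's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical allow-list; a real one would come from your own vetting process.
REPUTABLE_SOURCES = {"Example Times", "Sample Business Journal"}


@dataclass
class Record:
    """One piece of content, with illustrative (assumed) field names."""
    source: str
    source_url: Optional[str]
    published_at: Optional[datetime]
    retrieved_at: Optional[datetime]  # when the API call was made, if known
    enrichments: Dict[str, str] = field(default_factory=dict)
    collection_evidence: Optional[str] = None  # e.g. a license reference


def record_issues(record: Record) -> List[str]:
    """Return the per-record criteria that this record fails."""
    issues = []
    if record.source not in REPUTABLE_SOURCES:
        issues.append("source is not on the reputable list")
    if not record.source_url:
        issues.append("no link back to the original source")
    if record.published_at is None:
        issues.append("no publication date stamp")
    # retrieved_at is only required "if possible", so it is not flagged here.
    if not record.enrichments:
        issues.append("no enrichments")
    if not record.collection_evidence:
        issues.append("no evidence of compliant collection")
    return issues


def covers_recent_and_archive(
    records: List[Record],
    now: datetime,
    recent: timedelta = timedelta(days=1),         # assumed "near real-time"
    archive: timedelta = timedelta(days=365 * 5),  # assumed "historic archive"
) -> bool:
    """Check that a data set spans both fresh and historic content."""
    dates = [r.published_at for r in records if r.published_at is not None]
    if not dates:
        return False
    return (now - max(dates)) <= recent and (now - min(dates)) >= archive
```

Five of the criteria can be checked record by record, while the timespan criterion only makes sense for a data set as a whole, which is why it is a separate function here.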

If a company is transparent about its use of technology and its commitment to ethical data collection and usage, underpinned by strong governance to oversee the outputs from AI tools, this will help to gain consumers’ trust. The LexisNexis Future of Work 2024 report revealed that over 70% of executives believe that using trusted and accurate data sources could improve the level of trust in the outputs from generative AI tools.

MORE: Powering AI innovation and initiatives with LexisNexis Data

The problem of (and solution to) unreliable data

Some early adopters of AI and big data technologies have gained a lead over their rivals by finding new innovations to improve their products and services. But many firms have also found that onboarding technology too quickly, without prioritizing data quality, has led to legal, financial, reputational, and strategic costs.

These costs range from losing the confidence of customers to actual litigation and fines. Powering AI with high-quality data can overcome these risks, and improve companies’ technological approach in the following areas:

1. Result accuracy

The magic of AI is its ability to form insights and predictions from high volumes of data. However, if that data is inaccurate, unprovenanced, biased, outdated, or partial, the AI’s outputs will reflect those same flaws. This could result in time and money wasted on “innovations” that are not backed by trusted data and evidence, leading to poor decision-making across the company. Prioritizing accurate and reliable data will therefore lead to better and more accurate outputs from AI.

2. Improved generative AI responses

Generative AI solutions are regarded as highly promising for companies, and the market for data used in these tools alone is already valued at $2.5 billion, according to Business Research Insights. But powering generative AI tools with unreliable data creates a significant risk of AI hallucinations, where a model generates a response that is not grounded in its training data or the given prompt.

A Retrieval-Augmented Generation approach using trusted data from original sources can help companies to overcome this problem. It can ensure that every response from a tool is grounded in these quality sources and cites them so users can verify for themselves that the output is not a hallucination.
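
To make the idea concrete, here is a minimal sketch of the retrieval and grounding steps. It rests on several assumptions: the `Passage` type and both function names are hypothetical, a toy word-overlap score stands in for a real search index, and no language model is actually called. The sketch only builds the prompt that would be sent to one.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Passage:
    """A trusted passage with its provenance (illustrative fields)."""
    text: str
    source: str
    url: str


def _words(text: str) -> Set[str]:
    # Lowercase and strip basic punctuation so "fined?" matches "fined".
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query: str, passages: List[Passage], k: int = 2) -> List[Passage]:
    """Rank passages by word overlap with the query (a toy retriever)."""
    query_words = _words(query)
    scored = [(len(query_words & _words(p.text)), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep the top k, dropping passages that share no words with the query.
    return [p for score, p in ranked[:k] if score > 0]


def build_grounded_prompt(query: str, retrieved: List[Passage]) -> str:
    """Build a prompt that tells the model to answer only from cited sources."""
    lines = [
        "Answer using only the numbered sources below and cite them as [n].",
        "If the sources do not contain the answer, say so.",
        "",
    ]
    for i, p in enumerate(retrieved, start=1):
        lines.append(f"[{i}] {p.text} ({p.source}, {p.url})")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)
```

Because each passage carries its source name and URL into the prompt, a citation in the model's answer can be traced back to the original publication. A production system would replace the word-overlap scoring with a proper search or embedding index.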

3. Clearly demonstrated compliance

Reliable data can help a company to demonstrate its compliance to its customers and, crucially, regulators. Several major technology companies have faced, or are currently facing, lawsuits for allegedly inappropriately scraping individuals’ data from the internet without proper consent, especially from social media accounts.

Others have faced regulatory scrutiny over alleged breaches involving copyright, intellectual property, data protection and cybersecurity regulations. The best way to overcome this risk is to power AI tools with data from trusted providers who clearly comply with all relevant regulations – both in the way they collect their data, and in the permissions from publishers for how that data can be used.

MORE: How to overcome the top 8 challenges in generative AI

LexisNexis® helps companies to innovate with credible data to power big data & AI initiatives

LexisNexis provides credible data to help your organization realize the innovative potential of AI. As an established data provider for over 50 years, LexisNexis has extensive, long-standing – and in some cases, exclusive – content licensing agreements with publishers worldwide. We supply data to enable you to advance your goals while recognizing and respecting the intellectual property rights of our licensed partners.

Our API solution, Nexis® Data+, enables you to integrate our enriched data into your existing tools and platforms. This provides an outstanding foundation for carrying out analysis and AI initiatives and supports an API-first approach to your projects and products. Nexis Data+ offers direct access to our extensive data universe, encompassing news, legal, company, financial, biographical sources, ESG ratings, academic journals, compliance data, and more.

Download our free ebook, Harnessing Data for AI Innovation, to learn more about how your company can exploit AI’s opportunities and manage its risks with high-quality data.
