
What is Unstructured Data?

January 17, 2023 (6 min read)
Unstructured data can provide great insight.

Futurist and data technology expert Bernard Marr notes that as much as 90% of the data generated daily is unstructured, and the volume is growing at a rate of 55% to 65% a year. It represents mountains of data that went largely unused until recently.

But as organizations expand data analysis, unstructured data can contribute valuable context, filling in gaps left by strictly quantitative data analysis.

In this article, we’ll break down everything you need to know about unstructured data, from what it is to how to get meaning out of it. We’ll also show you how Nexis® Data as a Service is your one-stop-shop for all of your unstructured data needs. Let’s dive in.

What are examples of unstructured data?

If you’ve ever used Excel spreadsheets (and who hasn’t?), then you’re already well acquainted with structured data. Whether it’s company financials or personal recipes, the structure of a spreadsheet makes it easier to parse the data to uncover earnings trends or find every recipe that uses broccoli.

Unstructured data is often overlooked, says Dave Hanson, resident expert on all things data and General Manager of the Data as a Service and Entity Due Diligence and Monitoring solutions at LexisNexis®.

He explains the value, noting, “Structured, quantitative data tells you the ‘what’. Unstructured data, on the other hand, offers critical context. It answers the ‘why’ fueling quantitative data.”

Unstructured data is created both internally and externally. Internally, unstructured data comes in the form of text from emails, invoices, corporate communications, and other text-based content generated while doing business.

Externally, unstructured data includes photos, as well as text-based sources like news, social media, press releases, and more.

Four features of unstructured data

Of course, before you weave unstructured data into your processes or AI-powered applications, you need to understand the pitfalls and promise of these datasets.

1.      Unstructured data is predominantly textual

As mentioned above, unstructured data often comes in the form of articles, social media, emails, or other text-based communication. There may be quantitative data reported in a news article, but those numbers are distributed throughout the text, so they aren’t as easy to extract and analyze as a spreadsheet. The volume of news being generated daily—in print, on the web, over airwaves—can also be pretty intimidating. But locked into all of that text are details that can help you make sense of quantitative data. 
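To illustrate why numbers buried in text are harder to work with than a spreadsheet column, here is a minimal Python sketch. The sentence, the company name, and the figures are all invented for illustration, and the pattern only handles the phrase "N units"; real news text would need far more robust extraction.

```python
import re

# Hypothetical news sentence; the company and figures are made up for illustration.
article = (
    "Acme Corp said on Tuesday that it sold 12,500 units in March, "
    "up from 9,800 units a year earlier, as holiday demand lingered."
)

# Match integers (with optional thousands separators) followed by the word "units".
UNITS_PATTERN = re.compile(r"(\d{1,3}(?:,\d{3})*|\d+)\s+units")


def extract_unit_counts(text: str) -> list[int]:
    """Return every 'N units' figure found in the text as an integer."""
    return [int(match.replace(",", "")) for match in UNITS_PATTERN.findall(text)]


unit_counts = extract_unit_counts(article)
print(unit_counts)
```

In a spreadsheet, those two figures would already sit in labeled cells. In text, you first have to find them and then work out what each one refers to.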

2.      Unstructured data is unwieldy

In its raw form, unstructured data can be difficult to process, and the volume alone often poses a problem. If you’re trying to glean insight, it can be like finding a particular needle in acres of haystacks. That’s why you need to look at how a data provider enhances unstructured data to make it more user-friendly. (More on that later.)

3.      Unstructured data is qualitative

Quantitative data is all about the numbers, and it can answer questions related to numbers. How many units sold last month? How does that compare to the same month the previous year? It’s data that can be easily validated and verified.

Unstructured data, however, is qualitative in nature. It describes or explains, capturing events, emotions, and perceptions.

Hanson notes, “Qualitative data is less about figures and more about text and contextual-based information, but with that comes a huge amount of potential to tell the story around what is happening.”

4.      Unstructured data is contextual

This is where the real value of unstructured data comes into play. Take the example of units sold. Analyzing news data from the same period can help explain why sales were up or down, for example when a quarter includes events that lift sales, like the holiday season, or events that slow them.
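As a rough sketch of the units-sold example, the Python snippet below pairs monthly sales figures with headlines published in the same month. The sales numbers, the headlines, and the function names are all hypothetical; the point is only to show quantitative and qualitative data lined up by time period.

```python
from collections import defaultdict

# Hypothetical monthly unit sales (structured, quantitative data).
monthly_units = {"2022-10": 9_800, "2022-11": 10_100, "2022-12": 15_400}

# Hypothetical dated headlines (unstructured, qualitative data).
headlines = [
    ("2022-11-25", "Retailers report record Black Friday foot traffic"),
    ("2022-12-12", "Holiday shipping deadlines push shoppers to buy early"),
    ("2022-12-20", "Last-minute gift demand lifts electronics sales"),
]


def headlines_by_month(items):
    """Group headline text under a YYYY-MM key taken from each ISO date."""
    grouped = defaultdict(list)
    for date, text in items:
        grouped[date[:7]].append(text)
    return grouped


def sales_with_context(units, items):
    """Pair each month's unit count with the headlines published that month."""
    grouped = headlines_by_month(items)
    return {month: (count, grouped.get(month, [])) for month, count in units.items()}


context = sales_with_context(monthly_units, headlines)
for month, (count, news) in context.items():
    print(month, count, news)
```

Here the December jump in units sits next to two holiday-related headlines, which is the kind of "why" that the sales figure alone cannot supply.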

Data without context can be misleading. Data informed by contextual insights enables better decisions.


Unlocking the value of unstructured data

When you decide to integrate unstructured, third-party data in your processes and applications, you should consider three crucial factors.

1.      Data curation can make—or break—the perceived value of data analytics

Fake news and the loss of trust in media go hand in hand. In the third quarter of 2020, for example, there were 1.8 billion fake news engagements, and fake news remains pervasive.

As a result, the 2023 Edelman Trust Barometer reveals that trust in media is still lagging behind trust in business, non-governmental organizations, and governments (which narrowly escaped last place by a percentage point).

With 50% of people mistrusting media, it’s critical that you curate data from reputable and varied sources. When sourcing unstructured datasets, look for well-provenanced data that captures diverse viewpoints so you can deliver insights that are relevant and unbiased.

2.      Data enrichments can make unstructured data more usable

The volume and unwieldy nature of unstructured data demands a solution. After all, you can have the best datasets at your fingertips, but piles of great unstructured data from reputable sources are still just piles of data.

“Enrichment is effectively data added to data, bringing structure to unstructured data. Data is simply not very usable, especially at volume, unless it has been enriched,” says Hanson.

How do enrichments help? They make huge datasets more searchable and allow you to slice and dice the data to uncover more insights. Say you’re searching for information about Apple. Enrichments allow you to easily exclude mentions not related to the entity. They effectively filter out unrelated mentions like candy apple red or apple recipes that might otherwise slow or skew your analysis.

Case in point: A financial insights company wanted to add topical news feeds that analysts could use to inform reports and data it curates for its own customers. When comparing third-party unstructured data providers, the enrichments proved to be a deciding factor in integrating data from Nexis Data as a Service or a similar provider. Enrichments vary by dataset but generally include:

  • Article subject/topic
  • Entities mentioned
  • Geographical coordinates
  • Language
  • Company data, e.g., revenue, stock ticker
  • Sentiment

These enrichments, along with relevancy scores, enable analysts to make targeted data calls and refine datasets down to what really matters. “Particularly in analytics or data integrations, enrichments allow analysts to draw much more insight, programmatically, across a high volume of data,” says Hanson. 

Ultimately, the financial insights company chose Nexis Data as a Service because the enrichments, especially entities mentioned and tags related to mergers and acquisitions, led to a more complete result set than the competitor’s. That’s why Dave Hanson says, “The volume and quality of our enrichments are the magic that brings unwieldy data to life.”
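To make the Apple example above more concrete, here is a minimal Python sketch of filtering on enrichment tags. The record shape, field names, entity labels, score scale, and thresholds are all assumptions made for illustration; they are not the schema of Nexis Data as a Service or any other provider.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedArticle:
    """Hypothetical shape of an enriched article; not an actual provider schema."""
    headline: str
    language: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)
    # Maps an entity name to an assumed 0-100 relevance score.
    entities: dict[str, int] = field(default_factory=dict)


# Invented sample records for illustration only.
articles = [
    EnrichedArticle("Apple unveils new chip for laptops", "en", 0.4, {"Apple Inc.": 95}),
    EnrichedArticle("Ten apple recipes for autumn", "en", 0.6, {}),
    EnrichedArticle("Candy apple red returns as a car color", "en", 0.1, {"Ford Motor Co.": 70}),
    EnrichedArticle("Suppliers brace for slower phone demand", "en", -0.3,
                    {"Apple Inc.": 55, "Foxconn": 90}),
]


def filter_by_entity(records, entity, min_relevance=80, language="en"):
    """Keep records tagged with the entity at or above the relevance threshold."""
    return [
        record for record in records
        if record.language == language
        and record.entities.get(entity, 0) >= min_relevance
    ]


matches = filter_by_entity(articles, "Apple Inc.")
for article in matches:
    print(article.headline)
```

A plain keyword search for "apple" would return the recipe and car-color stories too. Filtering on the entity tag and its relevance score drops them, along with the article that mentions the company only in passing.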

3.      Flexible data delivery options help you use unstructured data as needed

When choosing a data provider, make sure they can deliver data in different ways. While a Search & Retrieve data API may be ideal for ongoing trend analysis or scheduled data calls for PEPs and sanctions data, deep historical analysis may require bulk delivery of decades of news data.

In other instances, a Flat File may be the best option. By partnering with a data provider that offers a wide range of delivery options, you can build a long-term relationship that can adapt as your data needs evolve.


Access unparalleled unstructured API data

LexisNexis has long been a go-to source of news, company and legal information. As a result, we’ve spent decades honing, expanding, and working to improve the data we aggregate so it offers optimal flexibility and efficiency, whether it’s within our own platforms for business, academic and legal research or as unstructured datasets to power your own tools and applications.

All of the data we aggregate goes through an enrichment process, including:

  • 100K+ News Sources
  • 50K Legal intelligence sources
  • 262M+ company financial stability data
  • 150+ Biographical sources
  • 240 Countries covering PEPs, Sanctions, Watchlists, State-Owned Enterprises
  • 100M Beneficial Ownership Records
  • 400M+ Company intelligence records

Because we aggregate from global sources in 200 countries and across 37 languages, you can be confident that you’re capturing a multi-dimensional perspective. And with a 45+ year news archive and fresh data being ingested and enriched every day, Nexis® Data+ is a one-stop-shop for a wide variety of quality, enriched data suited to use cases spanning predictive modeling and risk management, trend or investment analysis, and other data-driven projects or processes.

Ready to explore the options? Learn more about unstructured data available with Nexis® Data+.