Many organizations are saddled with poor data and don’t realize they aren’t doing anything to fix the problem. Better data translates into more organizational value because it improves the information that appears in reports and is shared with employees and customers.
In this article, we’ll look at how organizations can decrease risks from bad data by increasing information integrity.
The Importance of Data Inputs, Logic, and Reporting
You’re probably familiar with the term “garbage in, garbage out.” While that doesn’t hold exactly in an enterprise environment, where some controls are in place, it is still relevant. Most enterprises have some form of data cleansing and structure in place, but often only the minimum needed. As a result, vast amounts of data remain unstructured.
According to IDC, 80% of the world’s data is unstructured, as reported by Data Management Solutions Review. The problem doesn’t seem to be getting any better. Gartner predicts that by 2021, more than 80% of data in organizations will be dark data. As defined by Gartner, dark data is “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”
Enterprises keep large amounts of data around “just in case.” That data remains unstructured and continues to grow, so any future attempt to structure it will likely be a monumental task.
Complexity will only increase as the amount of data increases.
Little value can be extracted from unstructured data without great effort. It must be converted into a structured format before it can be analyzed and its value extracted. Many enterprises use only a small fraction of their data, and that fraction is where their reports come from. If reports are thought of as visual representations of the value in data, then we can assume that little value is being pulled from the data within enterprises.
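As a minimal sketch of what that conversion can look like, the Python below parses free-form log lines into structured records that can then be filtered and counted. The line format, the `LogRecord` fields, and the `parse_line` and `structure` helpers are hypothetical, chosen only for illustration; real unstructured sources rarely follow one clean pattern.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical line format (an assumption for this sketch):
# "<date> <time> <LEVEL> <service> <free-text message>"
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)$"
)


@dataclass
class LogRecord:
    """One structured row extracted from a line of free-form text."""
    date: str
    time: str
    level: str
    service: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Return a LogRecord, or None if the line doesn't match the expected shape."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    # Named groups map one-to-one onto the dataclass fields.
    return LogRecord(**match.groupdict())


def structure(lines: Iterable[str]) -> Tuple[List[LogRecord], List[str]]:
    """Split raw lines into parsed records and lines that need manual review."""
    records: List[LogRecord] = []
    rejected: List[str] = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            rejected.append(line)
        else:
            records.append(record)
    return records, rejected


if __name__ == "__main__":
    raw = [
        "2021-03-04 12:00:01 ERROR payment-service Timeout after 30s",
        "2021-03-04 12:00:05 INFO web-frontend Checkout page rendered",
        "call Bob about the invoice",  # free text that fits no structure
    ]
    records, rejected = structure(raw)
    errors = [r for r in records if r.level == "ERROR"]
    print(f"{len(records)} structured, {len(rejected)} rejected, {len(errors)} errors")
```

Lines that don’t fit the pattern are set aside for review rather than silently dropped, so nothing is lost while the structured portion becomes queryable.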
Unstructured data is difficult to manage, but companies can try extracting value from it with tools such as Splunk. While not a complete solution, such tools can help reduce the effort involved.
Inaccurate Data and Misinformation
Trying to pull reports from unstructured data can lead to inaccurate information because the process involved is tedious and error-prone. Inaccurate data can also arise in other ways. Data entering a system can be inaccurate to begin with, and no form of restructuring can automatically fix that.
Once inaccurate data has entered the system, fixing it is at least a matter of finding each occurrence and correcting it individually. To keep inaccurate data out of the system in the first place, the methods for gathering data must be reliable. However, this task is not always straightforward.
Survey questions can lead to biased answers. Employees can misinterpret what kind of data they should enter into a form. An automated collection method can capture the wrong kind of data. Validating input and thoroughly testing input methods are two of the best ways to ensure the right data enters the system.
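Input validation can be as simple as checking each field against explicit rules before a record is stored. The sketch below assumes a hypothetical order-entry form with `email`, `quantity`, and `country` fields; the rules, the allowed country list, and the `validate_order` name are illustrative assumptions, not a standard.

```python
import re
from typing import Dict, List

# Hypothetical rules for a hypothetical order-entry form; real rules should
# come from the business definition of each field.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ALLOWED_COUNTRIES = {"US", "CA", "GB", "DE"}


def validate_order(form: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the input may be stored."""
    errors: List[str] = []

    # Rough shape check only: something@something.something, no spaces.
    email = form.get("email", "").strip()
    if not EMAIL_PATTERN.match(email):
        errors.append("email: not a valid address")

    # isdecimal() rejects signs, decimals, and empty strings before int() runs.
    quantity = form.get("quantity", "").strip()
    if not quantity.isdecimal() or int(quantity) < 1:
        errors.append("quantity: must be a whole number of at least 1")

    # Normalize case so "ca" and "CA" are treated the same.
    country = form.get("country", "").strip().upper()
    if country not in ALLOWED_COUNTRIES:
        errors.append("country: must be one of " + ", ".join(sorted(ALLOWED_COUNTRIES)))

    return errors


if __name__ == "__main__":
    good = {"email": "pat@example.com", "quantity": "2", "country": "ca"}
    bad = {"email": "pat at example", "quantity": "-1", "country": "Canada"}
    print(validate_order(good))  # []
    print(validate_order(bad))
```

Because the validator is a plain function, it is also easy to cover with automated tests, which supports the second half of the advice: testing input methods thoroughly before bad data can slip through.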
End-User Data Confidence Is Critical
Bad internal data practices can carry over to the company’s external surfaces, exposing the organization’s poor data quality to the world. Data quality is especially important to end users. End users who receive data laden with inaccuracies and invalid sources will begin to mistrust it. This can spell the beginning of the end for companies that are unwilling to put in the effort needed to change their data practices.
Once end users stop trusting a company’s data, years of built-up trust can crumble in a matter of hours. Rebuilding the same level of trust can be a lengthy process, assuming it is even possible.
Putting in the effort early and seeing it through pays off not only internally but externally as well.
Defining the Optimal Data Model
There are data models to suit nearly every data need, and discussing them individually is beyond the scope of this article. However, we can discuss the topic in general.
The type of data model an organization chooses depends on the data being generated, what the company will do with that data, budget, the skills of those managing the data, exposure of the data to the public (i.e., security), and more. It’s certainly not a quick discussion, but any time spent making the decision will be well worth it. Changing the data model later will be very costly, assuming it can be done at all.
Establish Data Stewardship
Employees need to be involved in entering quality data. Training employees and continually emphasizing the importance of quality data will help reduce the amount of bad data going into the system. Championing quality data requires a group of people who can speak on the subject with authority and show why quality data matters to the company.
If employees aren’t on board, all the systems in place for structuring data will be for naught. Even the best systems in the world struggle to fix inaccuracies and manage incorrect data.
Better data input will lead to better data output in reports and will allow the company to extract higher value from that data.
Creating additional value from data starts with better data input. Employees who understand how to properly use systems to enter data go a long way toward improving the value of that data.
The next step is structuring the captured data. A system that dumps data into a big pile without any structure effectively locks that data up, and the effort needed to extract any value from such chaos is simply too great. Structuring data through the right data model fixes this problem and opens the door to new company insights.