4 Ways to Fix Bad Data and Improve Your AI
As marketing analytics rapidly evolves into an AI-driven field, a major challenge threatens to derail progress: bad data. Although AI excels at transforming large amounts of information into actionable insights, its effectiveness depends on well-planned and well-managed data sets.
Bad data leads to poor predictions, bias, misinformation and unexpected results. To address these risks, businesses are investing heavily in data cleansing, validation and governance – a critical, time-consuming and complex process.
Analysts need to prioritize better metrics and understand the business context behind their data, which makes them the right people to lead efforts to optimize data for AI. Here are four strategies for extracting insights from broken data sets while improving data hygiene and planning.
1. Identify supporting data
Other data sources can often corroborate the metrics you are trying to measure. For example, I worked with a retailer that considered its inventory data unreliable, a major problem. However, its point-of-sale (POS) data identified fast-moving SKUs that had suddenly stopped recording sales.
Although the inventory system showed low inventory levels (but not depletion), the sales trends clearly pointed to an inventory problem that was costing revenue. Using this information, we adjusted replenishment thresholds and triggers to keep high-demand merchandise in stock, mitigating revenue loss.
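The POS cross-check described above can be sketched in Python. This is a minimal, hypothetical illustration, not the method used with the retailer. It assumes daily unit sales per SKU are available as lists. The function name `flag_possible_stockouts`, the 28-day baseline, the 3-day recent window and the 5-units-per-day velocity threshold are all assumed values that you would tune against your own data.

```python
from statistics import fmean


def flag_possible_stockouts(daily_sales, baseline_days=28, recent_days=3,
                            min_daily_velocity=5.0):
    """Return SKUs that sold steadily during the baseline window but recorded
    zero POS sales in the most recent window. That pattern suggests an empty
    shelf even when inventory records still show stock.

    daily_sales: dict mapping SKU -> list of daily unit sales, oldest first.
    All thresholds are hypothetical defaults, not recommended values.
    """
    flagged = []
    for sku, series in daily_sales.items():
        if len(series) < baseline_days + recent_days:
            continue  # not enough history to establish a baseline
        baseline = series[-(baseline_days + recent_days):-recent_days]
        recent = series[-recent_days:]
        # A fast mover that suddenly sells nothing is more likely out of
        # stock than out of demand.
        if fmean(baseline) >= min_daily_velocity and sum(recent) == 0:
            flagged.append(sku)
    return flagged


# Hypothetical POS history: 28 baseline days followed by 3 recent days.
sales = {
    "SKU-A": [12] * 28 + [0, 0, 0],   # fast mover that abruptly stopped selling
    "SKU-B": [1] * 28 + [0, 0, 0],    # slow mover; zero-sales days are normal
    "SKU-C": [10] * 28 + [9, 11, 0],  # still selling
}
print(flag_possible_stockouts(sales))  # ['SKU-A']
```

You can then compare the flagged SKUs against inventory records. Where the system still shows units on hand, the records are the likelier culprit. That kind of discrepancy is what made it worthwhile to revisit replenishment thresholds.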
2. Investigate the “bad reputation”
Sometimes a data set earns a bad reputation because of "noisy outliers" that draw disproportionate attention. Although noticeable, these errors often make up only a small share of otherwise accurate data.
For example, I worked on household insurance policy data for a personal insurer. In some cases, policies were wrongly grouped under the same household or incorrectly split across households. We found that a handful of issues caused most of the errors, such as incorrect or repeated addresses and policies sold by different agents. We wrote patch code to correct those issues, which turned the data set into a reliable resource.
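One of the error sources above is the same address entered differently by different agents. A simple normalize-then-group pass illustrates the fix. This is a hypothetical sketch, not the patch code from the project. The abbreviation map, the field names (`policy_id`, `address`, `zip`) and the choice of normalized address plus ZIP code as the household key are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny abbreviation map. A production version would
# rely on a full postal standard or an address-verification service.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd",
                 "apartment": "apt", "north": "n", "south": "s"}


def normalize_address(address):
    """Lowercase, replace punctuation with spaces and standardize abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def group_households(policies):
    """Group policy IDs by (normalized address, ZIP), ignoring which agent sold them."""
    households = defaultdict(list)
    for policy in policies:
        key = (normalize_address(policy["address"]), policy["zip"])
        households[key].append(policy["policy_id"])
    return dict(households)


# Hypothetical records: P1 and P2 are the same household typed two ways.
policies = [
    {"policy_id": "P1", "address": "12 Oak Street, Apt. 4", "zip": "02139", "agent": "A"},
    {"policy_id": "P2", "address": "12 oak st apt 4", "zip": "02139", "agent": "B"},
    {"policy_id": "P3", "address": "98 Elm Avenue", "zip": "02139", "agent": "A"},
]
for key, ids in group_households(policies).items():
    print(key, ids)
# ('12 oak st apt 4', '02139') ['P1', 'P2']
# ('98 elm ave', '02139') ['P3']
```

The sketch only handles formatting inconsistencies. Incorrect or repeated addresses would need their own rules.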
3. Differentiate between zero and null
Missing data can hamper decision-making, so the first step is to determine whether values are actually missing or simply recorded as zero. Understanding the logic behind how the data is generated is crucial, because "no activity" (zero) is not the same as "missing information" (null). If the data is truly missing, you have two options.
First, look for proxy values or variables that can estimate the missing values; this may involve experimenting with combined variables. Second, check whether the business question can still be answered with the available data.
In most cases, missing data is more of a hindrance than an insurmountable obstacle.
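The zero-versus-null distinction and the proxy option can be made concrete in Python. Here `None` marks a missing value and `0` marks recorded inactivity. The sketch is hypothetical: the daily-clicks data, the use of sessions as a proxy, and the simple ratio estimate are illustrative assumptions, not a prescribed method. The ratio estimate assumes clicks scale proportionally with sessions.

```python
from statistics import fmean


def summarize(values):
    """Count zeros and nulls separately and show how treating nulls as
    zeros drags the average down."""
    observed = [v for v in values if v is not None]
    return {
        "zeros": sum(1 for v in observed if v == 0),  # recorded "no activity"
        "nulls": len(values) - len(observed),         # information never captured
        "mean_observed": fmean(observed) if observed else None,
        "mean_nulls_as_zero": sum(observed) / len(values) if values else None,
    }


def impute_from_proxy(values, proxy):
    """Fill nulls from a proxy variable, assuming values are roughly
    proportional to the proxy (a simple ratio estimate)."""
    pairs = [(v, p) for v, p in zip(values, proxy) if v is not None]
    denominator = sum(p for _, p in pairs)
    if denominator == 0:
        raise ValueError("proxy carries no signal on the observed days")
    ratio = sum(v for v, _ in pairs) / denominator
    return [v if v is not None else p * ratio for v, p in zip(values, proxy)]


# Hypothetical daily clicks (None = tracking outage) and daily sessions.
# Day 2 is a genuine zero: no sessions, so no clicks.
clicks = [10, 0, None, 12, None, 8]
sessions = [100, 0, 50, 120, 90, 80]
print(summarize(clicks))
# {'zeros': 1, 'nulls': 2, 'mean_observed': 7.5, 'mean_nulls_as_zero': 5.0}
print(impute_from_proxy(clicks, sessions))
# [10, 0, 5.0, 12, 9.0, 8]
```

Treating the two outage days as zeros would understate average daily clicks (5.0 instead of 7.5). Whether a proxy ratio is reasonable depends on how the data is generated, which is why understanding that logic comes first.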
4. Use random errors to your advantage
Sometimes bad data takes too long to fix or cannot be repaired at all. However, if the errors are random, they tend to cancel each other out in aggregate. This makes it possible to measure significant differences between groups or periods.
For example, my team worked with web traffic data from two recently merged brands. Each brand had its own analytics platform. The platforms reported slightly different metrics, and both had trouble identifying visitors.
Since there was no reason to believe that one brand’s platform was significantly more flawed than the other, we assumed the errors were random. The segmentation factors were similar across both brands, so we could combine the data and analyze segment-level differences effectively. This segment-focused strategy saved the company millions.
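A small simulation shows why random errors can still support segment comparisons. Like the team, it assumes each platform's measurement error is random, meaning independent and zero-mean. The segment names, true values and noise levels are hypothetical.

```python
import random
from statistics import fmean


def observed_segment_means(true_means, platform_noise_sd, n_per_platform=10_000, seed=7):
    """Simulate per-visitor measurements of a metric, each distorted by a random,
    zero-mean error whose size depends on the recording platform, and return
    the observed mean for each segment."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    observed = {}
    for segment, true_value in true_means.items():
        measurements = []
        for noise_sd in platform_noise_sd.values():
            measurements.extend(
                true_value + rng.gauss(0.0, noise_sd) for _ in range(n_per_platform)
            )
        observed[segment] = fmean(measurements)
    return observed


# Hypothetical: true pages per session by segment, and two noisy platforms
# (one per brand) with different error sizes.
true_means = {"new_visitors": 3.0, "returning_visitors": 5.0}
platform_noise_sd = {"brand_a": 2.0, "brand_b": 3.0}

observed = observed_segment_means(true_means, platform_noise_sd)
gap = observed["returning_visitors"] - observed["new_visitors"]
print(f"true gap: 2.00, observed gap: {gap:.2f}")
```

Individual measurements here are off by roughly as much as the gap itself, yet the segment means recover it closely. Systematic errors would not average away like this. A bias shared equally by every segment shifts levels but leaves the gap intact. A bias that differs by segment would distort the comparison.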
Making the Most of Bad Data in an AI-Driven World
These strategies are not exhaustive, since every data challenge is unique. Too often, however, companies abandon flawed data sets prematurely and focus solely on the lengthy process of correcting them. These interim strategies show that valuable insights can still be extracted from imperfect data sets in the meantime.
At the same time, businesses should not feel constrained by their current data. In many cases, new and more relevant data can be generated quickly, especially in digital marketing. Analysts can unlock the value of flawed data sets by using corroborating data, investigating bad reputations, distinguishing between zeros and nulls, and using random errors strategically. In doing so, they help build a strong foundation for AI success.
Contributing authors are invited to create content for MarTech and are chosen for their expertise and contribution to the martech community. Our contributors work under the oversight of the editorial staff, and contributions are checked for quality and relevance to our readers. The opinions they express are their own.