Data Overload: Why Marketers Must Focus on Data Quality, Not Quantity
The great epiphany of the 2010s technology industry was that people are streams of data. Likes, dislikes, family, friends, hobbies, work: everything that makes up a full life is just data waiting to be captured, and that data now lends itself to all sorts of sophisticated techniques for maximizing the probability of a desired action. In 2024, it is rare for a company not to make a significant effort to gradually deepen its river of consumer data.
This unquenchable thirst for customer data is driven by the fundamental belief that more data leads to better models, which in turn drive efficiency and revenue. This belief is false. Not only does more data not always lead to better models; it can actually degrade a model’s power and explainability. The advertising industry suffers from data overload, which makes us less efficient and costs us the confidence of the very customers we market to.
The data snapshot
Even if all external restrictions were lifted and we could bring together every data source we wanted, a wise marketer recognizes that we should restrain ourselves for a more fundamental reason: much of our data is highly correlated, which makes it almost useless.
To understand this, imagine that you are a photographer standing a few steps from a skyscraper. You can’t step back and get the whole building in one image; instead, you take many photos from different positions and angles around the building to stitch them together and create a composite photograph of the entire building.
In this analogy, each photo is a new data source added to the model, and the composite of the complete building is the model itself. As long as each snapshot captures a different part of the building, stitching them together into a complete view is easy. With highly correlated data, however, our photos overlap, depicting the same part of the building again and again. Each overlapping shot adds little that is new, and building an accurate composite becomes much harder.
No matter how many photos you take, if the information content of each new one is low, your model cannot improve.
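This effect is easy to demonstrate. Below is a minimal sketch in Python with scikit-learn, on synthetic data of my own construction, that fits the same model twice: once on three genuinely distinct features, and once after adding ten near-duplicates of the first feature. The redundant columns leave the cross-validated R² essentially unchanged.

```python
# Illustrative sketch: adding near-duplicate (highly correlated) features
# does not improve a model; it only inflates its size and instability.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000

# Three genuinely distinct signals ("photos of different parts of the building").
X_distinct = rng.normal(size=(n, 3))
y = X_distinct @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

# Ten more "photos" that are almost copies of the first signal.
duplicates = X_distinct[:, [0]] + rng.normal(scale=0.01, size=(n, 10))
X_overlapping = np.hstack([X_distinct, duplicates])

for name, X in [("3 distinct features", X_distinct),
                ("3 distinct + 10 near-duplicates", X_overlapping)]:
    r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.3f}")
# Both runs land at essentially the same R^2: the extra columns carry
# almost no new information, so the model cannot improve.
```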
Think smaller, build smarter
So if digging ever deeper into our data sets is not the answer, and if collecting all available data can actually weaken our results, how can we build accurate, explainable, and ethical models to advertise better to our customers?
The answer is to think smaller. Resist the temptation to build “one big model” and instead build several smaller, purpose-built models that work together.
As AI becomes a larger part of the marketing technology stack and terms like “training data” and “fine-tuning” become part of the lingua franca, one term that should become just as familiar is “feature selection.” Feature selection sits in the important but often overlooked space between collecting all the data and starting to train a model on it. It is the name for a set of tools, techniques, and heuristics used to understand the data, and its value to the model, before training even begins.
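To make that pre-training pass concrete, here is one way it might look in Python with pandas and scikit-learn: flag redundant features, then rank the rest by how much they tell us about the target. The file name and columns are hypothetical placeholders, and the data is assumed to be numeric.

```python
# Sketch of a feature-selection pass run *before* any model training:
# inspect how features relate to each other and to the target, then
# keep only the most informative ones. All names are hypothetical.
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

df = pd.read_csv("campaign_data.csv")          # hypothetical dataset
target = df.pop("conversions")                 # hypothetical target column

# 1. Flag near-duplicate features: pairs with |correlation| > 0.9.
corr = df.corr().abs()
redundant = [(a, b) for a in corr.columns for b in corr.columns
             if a < b and corr.loc[a, b] > 0.9]
print("Highly correlated pairs:", redundant)

# 2. Rank features by how much information they carry about the target.
mi = mutual_info_regression(df, target, random_state=0)
ranking = pd.Series(mi, index=df.columns).sort_values(ascending=False)
print(ranking)
```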
Conversion attribution may be the fundamental problem in advertising. The flaws of last-click attribution are well known, and building a good multi-touch attribution model remains an art form that demands time, knowledge, and care. AI can help uncover the full impact of media on sales and other downstream metrics, and it is well understood that factors beyond advertising spend must be considered to properly quantify ROI. Overall economic health, brand awareness, local household income, and population density are just some of the data an AI can draw on to answer this question. The ambitious marketer may be tempted to go even further and examine an individual consumer’s credit card history, the interests revealed by their Internet activity, their age, gender, race, and so on. There is no shortage of possible factors that could influence a particular group of consumers’ decision to convert.
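A first-pass version of such an analysis can be as simple as regressing sales on media spend plus the external factors above, so that advertising is not credited for variation it did not cause. The sketch below uses statsmodels; the dataset and every column name are hypothetical.

```python
# Hypothetical sketch: quantifying media ROI while controlling for
# factors beyond ad spend. All column names are illustrative.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("weekly_sales.csv")           # hypothetical dataset
X = df[["ad_spend", "consumer_confidence_index",
        "brand_awareness", "median_household_income"]]
y = df["sales"]

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())
# The coefficient on ad_spend now estimates media impact *after*
# accounting for economic and brand factors, a rough but explainable
# starting point for multi-touch attribution work.
```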
Feature selection helps us sort through this overabundance of data, picking out exactly what matters most for the task at hand. Well-understood techniques such as principal component analysis and variable-importance analysis quantify how well our data explains observed sales and rank the contribution of each source. So instead of demanding all of this data from consumers, which can be difficult to acquire and costly to maintain, we build an equally powerful model on only the most impactful data sources identified during feature selection.
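Here is a brief sketch of both techniques just named, again in scikit-learn on a hypothetical dataset (numeric, comparably scaled columns assumed): principal component analysis to measure redundancy, and a tree-based variable-importance ranking to pick the top sources.

```python
# Sketch: PCA to see how much variance a few components explain, and a
# variable-importance ranking to select the most impactful features.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("consumer_features.csv")      # hypothetical dataset
y = df.pop("observed_sales")                   # hypothetical target column

# Principal component analysis: if a handful of components explain most
# of the variance, many of the raw columns are redundant.
pca = PCA().fit(df)
print("Variance explained by first 5 components:",
      pca.explained_variance_ratio_[:5].sum())

# Variable importance: rank each source's contribution to the model.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(df, y)
importance = pd.Series(forest.feature_importances_,
                       index=df.columns).sort_values(ascending=False)
top_features = importance.head(10).index.tolist()
print("Most impactful sources:", top_features)
# Retrain on df[top_features] alone: a smaller model built from only
# the selected, most impactful data sources.
```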
Marketers need to make better use of the readily available feature-selection toolkit if they want to use AI in an ethical and sustainable way. Consumers are increasingly aware of the value of their data and expect care and transparency in how it is used. Fortunately, the past decade of feature-selection research has produced many sophisticated tools beyond covariance-matrix inspection and principal component analysis, making it possible to build leaner, more elegant models that run on only the most relevant data. Just as post-hoc model-interpretation tools can offer consumers transparency, feature selection demonstrates the care taken to use the data collected responsibly.
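One example of these newer, model-agnostic tools is permutation importance, which measures how much the model’s score actually degrades when a feature is shuffled. A short sketch, once more on a hypothetical dataset with hypothetical column names:

```python
# Sketch: permutation importance as a model-agnostic check of how much
# each feature really matters. Dataset and columns are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("consumer_features.csv")      # hypothetical dataset
y = df.pop("observed_sales")                   # hypothetical target column
X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
scores = pd.Series(result.importances_mean,
                   index=X_test.columns).sort_values(ascending=False)
print(scores)
# Features whose shuffling barely moves the score can be dropped, so the
# model never needs to collect that consumer data in the first place.
```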
Data overload may exist in its current form for only a relatively short time, and the wave of data availability may already be receding. Ad-blocker use reached its highest rate ever in 2024, data privacy laws are being passed state by state in the United States, and consumers increasingly expect marketers to use their data responsibly, yet a staggering 60% of consumers believe companies are misusing it. It is therefore more important than ever to resist the trend toward data overload and to use feature-selection techniques to build intelligent, responsible, and efficient models for our customers.