In the Big Data era, data has been digitised at scale and organisations now have access to ever larger data sets. As technology has advanced, the focus has shifted from data volume, velocity and variety alone to empowerment: the fusion of big data with AI and machine learning has democratised data access and driven the growth of augmented analytics. The benefits are real. Predictive analytics can now run at unprecedented speed and scale, and business users can apply advanced analytics without deep technical expertise.
But more data does not mean better data. As companies collect ever larger volumes of data, the signal-to-noise ratio drops, and it becomes easier to find misleading patterns and false correlations, or to overfit models (that is, to build machine learning models that learn the training data too well, including its noise, outliers and random fluctuations, and therefore perform poorly on new data).
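To make the "false correlations" point concrete, here is a minimal Python sketch of our own (not taken from any particular analytics tool). It generates a purely random target metric and many purely random candidate metrics, then counts how many look "correlated" with the target above an arbitrary threshold. The function names, the 0.3 threshold and the sample sizes are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(n_rows=50, n_features=1000, threshold=0.3, seed=42):
    """Count pure-noise 'metrics' whose |correlation| with a pure-noise target exceeds threshold.

    Nothing here is genuinely related, so every hit is a false correlation.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    # Same seed, so each larger run contains every metric from the smaller runs.
    for k in (10, 100, 1000):
        print(f"{k:>5} random metrics -> {count_spurious(n_features=k)} apparent correlations")
```

Because every series is random noise, every hit is a misleading pattern, and with a fixed seed adding more candidate metrics can only add more of them, never fewer.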
Democratised data access and the rise of augmented analytics have also put companies under pressure to find and deliver actionable insights at speed. This can lead to:
- Confirmation bias, e.g., post-rationalising unproven assumptions
- Cherry-picking of successful metrics
- Missing key trends, data gems and data relationships
- Oversimplified dashboards that strip away key nuances
What is becoming evident is that automated tools are amplifying this problem. If flawed input data or assumptions go unchecked, entire recommendations and analytical models can easily go off course.
This leads to data distortion, where data, or the insight based on that data, deviates from its true or most accurate representation.
Why is this important?
Data distortion, and the resulting insight distortion, is a serious and growing challenge.
It can lead to misleading conclusions, poor decision-making, suboptimal supplier relationships, unhappy customers, weak strategic execution and even compliance breaches. Any one of these can hit your bottom line and increase your risk exposure (e.g., operational, reputational, strategic or credit risk).
Even when data is accurate, insights can still be distorted, leading to missed opportunities and misleading conclusions.
What can you do?
So how can you protect yourself from this hidden risk?
Here are five practical steps to reduce data distortion and safeguard your decision-making:
- Audit your data: Critically scrutinise and validate both the quality of your data and the sources of your data. You should also frequently take a step back and reassess the assumptions underpinning your models and metrics
- Implement bias detection: Identify skewed or incomplete data sets. Use both human insight and automated tools to detect bias in your data; this will help ensure completeness and balance
- Cross-verify insights: Where possible, move away from relying on a single source or model; triangulate data and insights across multiple sources and systems
- Embed governance: Make data integrity and interpretation part of your risk management framework. Build policies for data collection and model validation, review and refresh these regularly and run periodic audits
- Educate teams: Make your people the first line of defence. Ensure your data and strategy teams, alongside decision-makers, understand the limitations of data and analytics tools
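Three of these steps (auditing your data, detecting bias and cross-verifying insights) can be partly automated. Below is a minimal Python sketch under our own assumptions: records are plain dictionaries, a reference share for each group (for example from a customer master list) is available, and the same metric can be pulled from several systems. The function names, thresholds, source names and sample data are hypothetical and for illustration only; they are not any product's API, and they support rather than replace human review.

```python
import statistics
from collections import Counter


def audit_records(records, required_fields):
    """Data audit: count missing values per required field and exact duplicate records."""
    missing = Counter()
    for rec in records:
        for field in required_fields:
            if rec.get(field) in (None, ""):
                missing[field] += 1
    seen, duplicates = set(), 0
    for rec in records:
        key = tuple(sorted(rec.items()))  # assumes field values are hashable
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"missing": dict(missing), "duplicates": duplicates}


def representation_gaps(records, field, reference_shares, tolerance=0.05):
    """Bias check: flag groups whose share of the data differs from a reference share."""
    counts = Counter(rec.get(field) for rec in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps


def cross_verify(metric_by_source, max_rel_diff=0.05):
    """Triangulation: list sources whose value for the same metric strays from the cross-source median."""
    mid = statistics.median(metric_by_source.values())
    scale = abs(mid) or 1.0  # avoid dividing by zero when the median is 0
    return sorted(
        source
        for source, value in metric_by_source.items()
        if abs(value - mid) / scale > max_rel_diff
    )


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    records = [
        {"id": 1, "region": "North", "spend": 120},
        {"id": 2, "region": "North", "spend": 95},
        {"id": 3, "region": "North", "spend": None},
        {"id": 4, "region": "South", "spend": 80},
        {"id": 4, "region": "South", "spend": 80},  # duplicate row
    ]
    print(audit_records(records, ["id", "region", "spend"]))
    # -> {'missing': {'spend': 1}, 'duplicates': 1}
    print(representation_gaps(records, "region", {"North": 0.5, "South": 0.5}))
    # -> {'North': 0.1, 'South': -0.1}
    print(cross_verify({"crm": 1000, "erp": 1010, "dashboard": 1200}))
    # -> ['dashboard']
```

Comparing against the median rather than the mean is a deliberate choice here: a single badly distorted source cannot drag the benchmark towards itself and hide its own deviation.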
The bottom line
Distorted data, and the insight distortion that follows from it, poses risks you cannot afford to ignore. Get it wrong, and the consequences for your business can be significant: strategically, operationally, and financially.