A good data model is supposed to represent something in the real world.
However, many data models are based exclusively on data from the internet.
Just imagine the downstream consequences of that.
For example, a data model based upon social media user-generated content will be full of:
Bias.
Mistruths and half-truths.
Opinions (some of them dangerous).
Invalid data with no sources, no peer review...
If a data model is built on bad data, and that data is then used to train an AI, that AI will inherit the same bias, mistruths, dangerous opinions, and so on.
Getting clean data to drive good decisions, be they human or AI, is becoming increasingly difficult.
We are swamped in data, but the signal-to-noise ratio is low.
The garbage in/garbage out problem has never been greater, and thanks to AI, the stakes have never been higher.
The business opportunity here is great, however: a marketplace for high-quality training data models for AI is about to emerge, and it will be extremely lucrative.
Ironically, a return to offline data such as peer-reviewed papers and books may be the solution.
Such legacy silos of data will be the target of the new gold rush.
In such a marketplace, the quality of an AI will be judged by the quality of its training data.
What I am working on this week:
Designing an internet search indexer for the Alpha Framework.
Media I am enjoying this week:
Diaspora by Greg Egan.
File details: 7.6 MB MP3, 5 mins 16 secs duration.