Almost all organizations (in fact, I have yet to see one that doesn't) have problems with their data. Those problems really become apparent when data is collected and aggregated for reporting and analytics. Which company doesn't have a data warehouse full of inconsistent and undefined data? And which organization isn't trying to get cool predictive analytics models to work, only to find they produce unreliable outcomes or take forever to validate due to data quality issues?

But the fact that data quality issues surface in an organization's reporting and analytics capabilities doesn't mean those issues are best solved there. Worse, all those BI and big data tools claim to offer excellent features for cleansing or enriching data, so why not use them? Because that doesn't make the problem go away. Fix data quality issues at the root, the source system where the data is created, and you will start working your way out of the tunnel of misery. This is what data architecture is truly about. Unfortunately, I see too many organizations fall into the trap of focusing their data architecture efforts on building data-lake-style platforms with cool new tools, only to get stuck and never bring anything into production.
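To make "fix it at the root" concrete, here is a minimal sketch in Python. The record type, field names, and validation rules are all hypothetical illustrations; the point is that the source system rejects bad data at creation time, so the warehouse and every downstream consumer never see it and no cleansing step is needed later:

```python
# A minimal sketch of validating data where it is created, instead of
# cleansing it downstream in a BI tool. All names and rules are
# hypothetical illustrations of the principle.
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    email: str
    country_code: str  # ISO 3166-1 alpha-2, e.g. "NL"


def validate(record: Customer) -> list[str]:
    """Return a list of violations; an empty list means the record is clean."""
    errors = []
    if not record.customer_id:
        errors.append("customer_id is required")
    if "@" not in record.email:
        errors.append(f"email looks invalid: {record.email!r}")
    if len(record.country_code) != 2 or not record.country_code.isalpha():
        errors.append(f"country_code must be ISO alpha-2, got {record.country_code!r}")
    return errors


def create_customer(record: Customer) -> Customer:
    """Entry point of the source system: bad data is never persisted."""
    errors = validate(record)
    if errors:
        # Reject at creation time; downstream consumers never see this record.
        raise ValueError("; ".join(errors))
    return record  # in a real system: persist to the operational database


if __name__ == "__main__":
    create_customer(Customer("C-001", "jane@example.com", "NL"))  # accepted
    try:
        create_customer(Customer("", "not-an-email", "Netherlands"))
    except ValueError as e:
        print("rejected at source:", e)
```

The design choice this illustrates: the validation rules live in the system of record, next to where data enters, rather than being re-implemented (and re-debated) in every report, model, or pipeline that consumes the data.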