Finding Quality in the Data Lake

With the advent of Big Data and Data Lakes, we can gain greater insight and make better business decisions. Yet we spend a disproportionate amount of time obtaining and preparing data, and trust in the quality of that data often remains low. We have a set of standard data quality measurements built for known, canonical data models. But now we face both known and unknown sources with highly varied formats and disparate meanings and uses. When we add the human factor (the assumptions built into capturing or producing the data), new challenges emerge. How do we ensure trust not only in the original data, but also in the subsequent data we’re acting upon? To address this, we must consider how and where we evaluate data quality to ensure that the Big Data we use is not only relevant and fit for purpose, but also data we can trust and act upon with confidence.

Key Points:

Big Data drives better Business Decisions
But time to prepare is High, and trust is Low
What does Data Quality for Big Data mean and where should it be applied?
Four Key Steps to achieve Trust
New measurements of Data Quality must be considered
Application of Data Quality must include Elective and Selective approaches

Harald Smith is Director of Product Management at Trillium Software and co-author of "Patterns of Information Management" published by IBM Press. Harald has a diverse career specializing in information quality, integration, and governance products with a focus on accelerating customer value and delivering innovative solutions. He has written extensively on the integration, management, and use of information and has been issued four patents in this field.