S6: Designing, Managing and Operating a Distributed Data Lake

For many companies, the number of new data sources that the business wants to analyse is increasing rapidly. In addition, data integration is now happening almost everywhere in the organisation, whether for master data management, data warehousing, building data marts, data science projects, real-time analytics or self-service BI. As a result, the cost of data integration is rising rapidly, silos are emerging and the complexity of managing and governing data has the potential to spiral out of control. In response, many advocate creating a ‘data lake’: put all the data in one place where you can clean and integrate it for any purpose. But data is being collected in many different locations across the enterprise and in the cloud, and much of it is too big to move. So how do you manage and govern this distributed environment, and how do you accelerate delivery of trusted data that is ready for business use? This session looks at this problem and proposes a new collaborative information architecture to organise, govern, rapidly process and manage distributed big and small data and provision it wherever it is needed.

Data integration complexity
The siloed approach to managing and governing data
A new inclusive approach to governing and managing data
Introducing the distributed data reservoir and data refinery
Goals of a data reservoir
How do a data reservoir and data refinery work?
Tasks and services to manage and prepare data
The mission-critical importance of an information catalog in a distributed data landscape
Managing multiple data integration tools in a distributed data reservoir and data refinery
The publish and subscribe model for readying information
Mapping new data and insights into your shared business vocabulary
Enabling the dynamic data map: managing metadata in a graph database
Creating an Amazon for Data: ordering trusted data as a service

Mike Ferguson is the Managing Director of Intelligent Business Strategies. An independent IT analyst and consultant, he specialises in BI/Analytics, big data and data management. He has over 35 years of experience, with 27 years in BI/Analytics, 36 years in Data Management, 13 years in Smart Business and six years in Big Data Analytics on Hadoop and NoSQL. Mike works at board, senior IT and detailed technical IT levels on BI/Analytics strategy, technology selection, enterprise architecture, data strategy, MDM and Big Data. He has spoken at events all over the world and written numerous articles. Formerly a principal and co-founder of Codd and Date Europe Limited, the inventors of the Relational Model, he was also Chief Architect at Teradata on the Teradata DBMS and European Managing Director of Database Associates. He teaches popular master classes in Big Data Fundamentals, Big Data, New Technologies for DW and BI, Operational BI, Data Governance, Master Data Management and Big Data Management.