Dealing with Drift: Building an Enterprise Data Lake
Pat Patterson
Community Champion
StreamSets
 
Michael Gay
Lead Technical Architect - Big Data Engineering
Cox Automotive
 
Nathan Swetye
Sr. Manager, BI Development
Cox Automotive
 


 

Thursday, April 6, 2017
08:30 AM - 09:15 AM

Level:  Intermediate


Data drift, the gradual morphing of data structure and semantics, is a fact of life in enterprise IT. New requirements force schema changes, the meaning of database columns changes over time, and infrastructure upgrades add new fields to log files. Left unchecked, drift in data sources can cause applications and dataflows to fail, with costly downtime and, in the worst case, corruption in downstream data stores.

Cox Automotive comprises more than 25 companies dealing with different aspects of the car ownership lifecycle, with data as the common language they all share. The challenge for Cox was to create an efficient engine for the timely and trustworthy ingestion of an unknown but large number of data assets from practically any source. Discover how its big data engineering team overcame data drift and is now populating a data lake, giving analysts easy access to data from the subsidiary companies and producing new data assets unique to the industry.

Attendees will learn how Cox Automotive:

  • Took on the challenge of ingesting data at enterprise scale, and the efficiency and data consistency struggles they faced at the outset
  • Created a self-service data exchange for their companies based on an architecture that decoupled data acquisition from ingestion
  • Reduced the time to data availability from weeks to hours and cut development time by 90%


Pat Patterson has been working with Internet technologies since 1997, building software and working with communities at Sun Microsystems, Huawei, Salesforce and StreamSets. At Sun, Pat was the community lead for the OpenSSO open source project, while at Huawei he developed cloud storage infrastructure software. Part of the developer evangelism team at Salesforce, Pat focused on identity, integration and the Internet of Things. Now community champion at StreamSets, Pat is responsible for the care and feeding of StreamSets' open source community.

Michael Gay has been involved in the application development and Business Intelligence (BI) space for over 10 years. He previously held positions at Northridge Systems, TMX Finance and Amplify (formerly Wireless Generation), where he worked in both application development and data warehousing. Michael is currently the lead technical architect of Big Data Engineering on the Enterprise Data Platform team at Cox Automotive. Michael has spent over five years in the Hadoop ecosystem building data lakes, reporting applications and streaming applications, both on premises and in the cloud. He is currently designing and building the next generation of Cox Automotive’s data lake and data ingestion platform.

Nathan Swetye has been involved in web and application development since 1996, progressing through roles at HomeCom Communications, Intellimedia Commerce Inc., AutoTrader.com, and now Cox Automotive. Starting out as a Front-End Developer/Designer, Nathan later switched to middle-tier Java development and then, for the past decade, engineering and project leadership. Nathan is now the product owner and tech lead for Cox Automotive's data lake ingestion tools.


   