Data Preparation: What Are Your Options?
Tomer Shiran
Founder & CEO


Thursday, April 6, 2017
08:30 AM - 09:15 AM

Level: Intermediate

Business analysts and data scientists often spend 80-90% of their time getting and preparing data rather than analyzing it or creating machine learning models. Fortunately, there have been significant investments in technologies that can streamline data access and preparation. In this talk we explore a variety of open source and commercial tools that appeal to a broad range of users, from business users to data scientists and software developers. First, we discuss a number of open source projects that enable data scientists and developers to prepare data, including Apache Spark, pandas (Python) and dplyr (R). We explain how to connect these tools to different data sources, how to transform the data, and how to efficiently construct and maintain complex data preparation pipelines with them. Second, we discuss a number of commercial tools that enable non-technical users, including business users and analysts, to prepare data prior to analysis. Throughout the talk, we use real-world datasets such as NYC Taxi (TLC) trip data and NOAA weather data to demonstrate how data can be prepared using programmatic and visual methods to support BI and machine learning use cases. By the end of the talk, attendees will be familiar with numerous methods and tools for data preparation and will understand the pros and cons of each. In addition, the complete examples demonstrated in the talk (step-by-step walkthroughs and code) will be available for the audience to try on their own.

Tomer Shiran is the CEO and co-founder of Dremio. Prior to Dremio, he was VP of Product at MapR, where he was responsible for product strategy, roadmap and new feature development. As a member of the executive team, Tomer helped grow the company from five employees to over 300 employees and 700 enterprise customers. Prior to MapR, Tomer held numerous product management and engineering positions at Microsoft and IBM Research. He holds an MS in Electrical and Computer Engineering from Carnegie Mellon University and a BS in Computer Science from Technion - Israel Institute of Technology, as well as five U.S. patents.

