Sunday, April 2, 2017
02:30 PM - 05:45 PM
In this tutorial, you will learn how to tame Big Data with Apache Spark. Spark is one of the fastest-growing Big Data systems and provides a solid foundation for processing large volumes of data. We introduce the key concepts of Spark, its architecture, and its development model. We show how to use the Spark APIs to ingest data, process it in parallel on a cluster, and control various aspects of Spark program execution.
Note: If you wish to continue learning about Spark, the material in this tutorial continues on Monday into “Fast Data: Real-Time Big Data with Apache Spark and Beyond.”
- Spark Architecture
- The Lingua Franca of Big Data
- Working with Resilient Distributed Datasets (RDDs)
- Developing Spark Applications
- Spark SQL and DataFrames
- Machine Learning with Spark
- Graph Analytics with Spark
Dr. Vladimir Bacvanski has over two decades of engineering experience with mission-critical and distributed enterprise systems and data technologies. Vladimir has helped companies including the US Treasury, the Federal Reserve Bank, the US Navy, IBM, Dell, Hewlett Packard, JP Morgan Chase, General Electric, BAE Systems, and AMD, among others, to select, transition to, and apply new software and data technologies.
Vladimir is published worldwide and is a keynote speaker, session chair, and workshop organizer at leading industry events. As a founder of SciSpike, he focuses on Big Data technologies and highly scalable reactive software architectures with Node.js and Scala. He is the author of the O'Reilly course on Big Data and NoSQL.