Instant and Repeatable Data Platforms

Configuring a data platform and data science environment across development, continuous integration, QA, staging, and production can be a tedious, error-prone process, and each environment often has to be built from scratch. By combining cloud platforms such as AWS or Azure with Terraform and Ansible, we can create repeatable data science infrastructure.

In this talk, we'll discuss our "push-button" infrastructure tool and how attendees can use it in their own projects to create a cloud-agnostic environment that spins up quickly and is easy to configure as required.

We will cover:

Use cases, such as repeatedly bringing up the same cluster or recovering from a disaster
How to parameterize your cloud environment
Creating a data lab that gives data scientists all the tools they need for exploration
The development and release process, including integration testing
How to model costs in real time to weigh price against desired performance

Heather Nelson is a Senior Solution Architect at Silicon Valley Data Science. A problem solver by nature, Heather is passionate about helping organizations leverage data to drive competitive advantage. She draws from a diverse background in business and technology consulting to find the best solutions for her clients’ toughest data problems.

Mark Mims has extensive experience architecting and implementing data science solutions across a variety of industries. His passion is data plumbing, where data science meets the real world of DevOps and infrastructure engineering.