Analyzing Big Data with Microsoft R Server

Learn how to use Microsoft R Server (MRS) to analyze large datasets using R, one of the most powerful programming languages.

About the Course

The open-source programming language R has long been popular (particularly in academia) for data processing and statistical analysis. Among R's strengths as a programming language are its succinctness and its extensive repository of third-party libraries for performing all kinds of analyses. Together, these two features make it possible for a data scientist to go very quickly from raw data to summaries, charts, and even full-blown reports. However, one deficiency of R is that it is memory-bound: R needs to load the data in its entirety into memory (like any other object). This is one of the reasons R has been received more reluctantly in industry, where data sizes are usually considerably larger than in academia.

The main component of Microsoft R Server (MRS) is the RevoScaleR package. RevoScaleR is an R library that offers a set of functionalities for processing large datasets without having to load them into memory all at once. In addition, RevoScaleR offers a rich set of distributed statistical and machine learning algorithms, a set that grows over time. Finally, RevoScaleR also offers a mechanism by which we can take code that we developed locally (such as on a laptop) and deploy it remotely (such as on SQL Server or a Spark cluster, where the underlying infrastructure is very different) with minimal effort. In this course, we will show you how to use MRS to run an analysis on a large dataset and provide some examples of how to deploy it on a Spark cluster or in-database inside SQL Server. Upon completion, you will know how to use R to solve big-data problems.
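As a minimal sketch of this idea (assuming RevoScaleR is installed; the file and column names are hypothetical stand-ins), a large CSV can be converted to RevoScaleR's XDF format and summarized chunk by chunk without ever holding the full dataset in memory:

```r
library(RevoScaleR)

# Hypothetical file names; any large delimited file works the same way.
csv_path <- "nyc_taxi_sample.csv"
xdf_path <- "nyc_taxi_sample.xdf"

# Convert the CSV to XDF, reading a bounded number of rows per chunk
# instead of loading the whole file into memory at once.
rxImport(inData = csv_path, outFile = xdf_path,
         rowsPerRead = 500000, overwrite = TRUE)

# Summary statistics are computed chunk-wise over the XDF file, so
# memory use stays bounded regardless of the size of the data.
rxSummary(~ fare_amount + trip_distance, data = xdf_path)
```

The same rx* calls run unchanged after switching the compute context with rxSetComputeContext(), which is what makes the local-to-remote deployment story possible.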

Additionally, throughout this course students will learn to think like a data scientist by learning about the steps involved in the data science cycle (https://docs.microsoft.com/en-us/azure/machine-learning/data-science-process-overview): getting raw data, examining it and preparing it for analysis and modeling, running various analyses and examining the results, and finally deploying a solution.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • A solid understanding of R data structures (vectors, matrices, lists, data frames, environments) is required. For example, students should be able to confidently tell the difference between a list and a data.frame, explain what each object is suited for, and know how to subset it.
  • A basic understanding of programming concepts such as control flows, loops, functions and scope is required.
  • A good understanding of data manipulation and data processing in R (e.g. functions such as merge, transform, subset, cbind, rbind, or lapply).
  • Familiarity with third-party packages such as dplyr and ggplot2 is very helpful, as we use them in the course but don't cover them in great depth.
  • Familiarity with how to write and debug R functions is very helpful.
  • Although not required, a basic understanding of modeling and statistics can make some of the course easier to follow.
  • Courses:
  • o DAT204x: Introduction to R for Data Science: https://www.edx.org/course/introduction-r-data-science-microsoft-dat204x-2
  • o DAT209x: Programming in R for Data Science: https://www.edx.org/course/programming-r-data-science-microsoft-dat209x-1

Agenda

Getting started: We give an overview of RevoScaleR and show you how to access it by downloading and installing the Microsoft R Client. We then get the NYC Taxi data used during the course. Finally, we install the required R packages we will be using throughout the course.
Reading the data: We talk about two different ways that RevoScaleR can handle the data and the trade-offs involved.
Preparing the data: We examine the data and ask how we can clean it and then make it richer and more useful to the analysis. In the process, we learn how to use RevoScaleR to perform data transformations and how third-party packages can be leveraged.
Examining the data: We now examine the data visually and through various summaries to see what does and does not mesh with our understanding of it. We look at sampling as a way to examine outliers.
Visualizing the data: We examine ways of visualizing our results and getting a feel for the data. In the process, we learn how RevoScaleR interacts with other visualization tools.
Clustering example: We run k-means clustering, our first RevoScaleR analytics function, and look at how we can improve its performance when the data is large.
Modeling example: We build a few predictive models and show how we can examine the predictions and compare the models. We see how our choice of the model can have performance implications.
Deploying and scaling: We talk about RevoScaleR's write-once-deploy-anywhere philosophy and what we mean by a compute context. We then put this into practice by deploying our code to SQL Server and Spark, and talk about the architectural differences.
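As an illustration of the kind of transformation the "Preparing the data" module covers (a hedged sketch; the column names are hypothetical stand-ins for the NYC Taxi schema), rxDataStep can clean and enrich the data chunk-wise:

```r
library(RevoScaleR)

# Keep only plausible trips and derive a tip-percentage column.
# Both the row filter and the transform are applied one chunk at a
# time as the data streams through, never all at once in memory.
rxDataStep(inData = "nyc_taxi.xdf", outFile = "nyc_taxi_clean.xdf",
           rowSelection = (trip_distance > 0 & fare_amount > 0),
           transforms = list(tip_percent = tip_amount / fare_amount),
           overwrite = TRUE)
```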

Author

Seth Mottaghinejad

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

R Language, Microsoft R Server

Intended Audience

Business Analysts, Data Scientists
Data scientists or business analysts with intermediate knowledge of the R programming language, especially around data analysis and modeling.

Course Level

Advanced

Analyzing Big Data with Microsoft R Server

In this online, self-paced edX course, which takes approximately 4 hours a day for 4 weeks to complete, learn how to use Microsoft R Server (MRS) to analyze large datasets using R, one of the most powerful programming languages.

About the Course

The open-source programming language R has long been popular (particularly in academia) for data processing and statistical analysis. Among R's strengths as a programming language are its succinctness and its extensive repository of third-party libraries for performing all kinds of analyses. Together, these two features make it possible for a data scientist to go very quickly from raw data to summaries, charts, and even full-blown reports. However, one deficiency of R is that it is memory-bound: R needs to load the data in its entirety into memory (like any other object). This is one of the reasons R has been received more reluctantly in industry, where data sizes are usually considerably larger than in academia.

The main component of Microsoft R Server (MRS) is the RevoScaleR package. RevoScaleR is an R library that offers a set of functionalities for processing large datasets without having to load them into memory all at once. In addition, RevoScaleR offers a rich set of distributed statistical and machine learning algorithms, a set that grows over time. Finally, RevoScaleR also offers a mechanism by which we can take code that we developed locally (such as on a laptop) and deploy it remotely (such as on SQL Server or a Spark cluster, where the underlying infrastructure is very different) with minimal effort. In this course, we will show you how to use MRS to run an analysis on a large dataset and provide some examples of how to deploy it on a Spark cluster or in-database inside SQL Server. Upon completion, you will know how to use R to solve big-data problems.

Additionally, throughout this course students will learn to think like a data scientist by learning about the steps involved in the data science cycle (https://docs.microsoft.com/en-us/azure/machine-learning/data-science-process-overview): getting raw data, examining it and preparing it for analysis and modeling, running various analyses and examining the results, and finally deploying a solution.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • A solid understanding of R data structures (vectors, matrices, lists, data frames, environments) is required. For example, students should be able to confidently tell the difference between a list and a data.frame, explain what each object is suited for, and know how to subset it.
  • A basic understanding of programming concepts such as control flows, loops, functions and scope is required.
  • A good understanding of data manipulation and data processing in R (e.g. functions such as merge, transform, subset, cbind, rbind, or lapply).
  • Familiarity with third-party packages such as dplyr and ggplot2 is very helpful, as we use them in the course but don't cover them in great depth.
  • Familiarity with how to write and debug R functions is very helpful.
  • Although not required, a basic understanding of modeling and statistics can make some of the course easier to follow.
  • Courses:
  • o DAT204x: Introduction to R for Data Science: https://www.edx.org/course/introduction-r-data-science-microsoft-dat204x-2
  • o DAT209x: Programming in R for Data Science: https://www.edx.org/course/programming-r-data-science-microsoft-dat209x-1

Agenda

Getting started: We give an overview of RevoScaleR and show you how to access it by downloading and installing the Microsoft R Client. We then get the NYC Taxi data used during the course. Finally, we install the required R packages we will be using throughout the course.
Reading the data: We talk about two different ways that RevoScaleR can handle the data and the trade-offs involved.
Preparing the data: We examine the data and ask how we can clean it and then make it richer and more useful to the analysis. In the process, we learn how to use RevoScaleR to perform data transformations and how third-party packages can be leveraged.
Examining the data: We now examine the data visually and through various summaries to see what does and does not mesh with our understanding of it. We look at sampling as a way to examine outliers.
Visualizing the data: We examine ways of visualizing our results and getting a feel for the data. In the process, we learn how RevoScaleR interacts with other visualization tools.
Clustering example: We run k-means clustering, our first RevoScaleR analytics function, and look at how we can improve its performance when the data is large.
Modeling example: We build a few predictive models and show how we can examine the predictions and compare the models. We see how our choice of the model can have performance implications.
Deploying and scaling: We talk about RevoScaleR's write-once-deploy-anywhere philosophy and what we mean by a compute context. We then put this into practice by deploying our code to SQL Server and Spark, and talk about the architectural differences.

Author

Seth Mottaghinejad

Availability

Self Paced / MOOC

Duration

other

Course Topics

R Language, Microsoft R Server

Intended Audience

Business Analysts, Data Scientists
Data scientists or business analysts with intermediate knowledge of the R programming language, especially around data analysis and modeling.

Course Level

Advanced

Analyzing Data with SQL R Services

In this course we learn the two ways that SQL R Services can be invoked: from the R IDE via the RevoScaleR package, and as a stored procedure directly from SQL Server Management Studio. We learn how each scenario works and what its intended use-case is. We also learn R programming best practices to follow when working with data stored in SQL Server databases.

About the Course

Fetching data from a relational database such as SQL Server is not new to R. Through an ODBC connection, R users can connect to a database and load data into an R session. However, ODBC connections are notoriously slow, especially when the data has to travel over the network (network IO). Moreover, transferring data like this can often expose it to security vulnerabilities.

With SQL R Services, R users now have the ability to do their analytics in-database. That is to say, we take the analytics (the R code) to the data instead of the other way around. Moreover, with Microsoft R Server's RevoScaleR package, the data scientist can develop and run all of their code from the comfort of their R IDE. Once the code is ready for deployment, it can be turned into a stored procedure which other applications can call at will.

In this course we delve into the details of the SQL Server R Services architecture, run example codes and learn R programming best practices to follow. We see how the RevoScaleR package and its data-processing and analytics functions can give us a best-of-both-worlds advantage, but also how to send any R code to run in-database. We learn how to use SQL Server to store R artifacts (such as model objects) and retrieve them later by R stored procedures (scoring new data with an R model) or SSRS (for rendering an R plot in a report).
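To make the "take the analytics to the data" idea concrete, here is a hedged sketch of running a RevoScaleR model in-database from the R IDE; the connection string, table, and column names are placeholders, not the course's actual values:

```r
library(RevoScaleR)

# Placeholder connection string; replace with your server and database.
conn_str <- "Driver=SQL Server;Server=MYSERVER;Database=TaxiDB;Trusted_Connection=True"

# Point to a table in SQL Server rather than a local file.
taxi_sql <- RxSqlServerData(table = "nyc_taxi", connectionString = conn_str)

# Switch the compute context: subsequent rx* calls now execute inside
# SQL Server, next to the data, instead of pulling rows over ODBC.
rxSetComputeContext(RxInSqlServer(connectionString = conn_str))

# The model is estimated in-database; only the fitted object comes back.
model <- rxLinMod(tip_amount ~ trip_distance, data = taxi_sql)

# Return to local execution when done.
rxSetComputeContext("local")
```

The same script, developed and tested locally, is what later gets wrapped in a stored procedure for other applications to call.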

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • Familiarity with R is helpful but not required; we run through the R code without much explanation, but the code should still look fairly straightforward and easy to follow for a novice.
  • Familiarity with the RevoScaleR package is a plus, and in particular its distributed data processing and analytics functions.
  • Basic knowledge of SQL Server administration and familiarity with the SQL language is required.

Agenda

We begin with a discussion around architecture:
Overview of the two SQL Server R Services architectures
Overview of Microsoft R Server's RevoScaleR package
Installing the Microsoft R Client
We then run examples of in-database analytics in SQL Server from the R IDE:
Pointing to data in SQL Server and setting the compute context to remote
Dealing with meta-data (especially for factor columns)
Summarizing and visualizing data using RevoScaleR
Leveraging SQL for efficiency
A modeling example using RevoScaleR
Running any R code in-database with the help of RevoScaleR
Saving R artifacts such as plots or model objects in-database
We then turn our attention to invoking an R script using a SQL Stored Procedure:
Use-cases for running R via stored procedures
Retrieving an R model object from a SQL table for use by the stored procedure
Retrieving an R plot for use by SQL Server Reporting Services (SSRS)

Author

Seth Mottaghinejad

Availability

Instructor Led / Instructor Led Classroom

Duration

1 day

Course Topics

R Language, Microsoft R Server, SQL Data Warehouse

Intended Audience

Database Admins, Architects
R users familiar with Microsoft R Server (MRS) and its RevoScaleR package who want to learn how to do in-database analytics in SQL Server, learn about architectural considerations and best practices. SQL Professionals who are familiar with R and MRS and want to learn about what to consider when using it in a SQL Server compute context.

Course Level

Intermediate

Azure Data Lake Analytics and Azure Data Lake Store - Deep Dive

This two-day course focuses on using Azure Data Lake Analytics to develop and manage a Big Data analytics solution. You'll learn how to ingest data into Azure Data Lake Store, process data in the store using U-SQL, move data to SQLDW, and consume data lake datasets in Power BI. In addition, you will gain an understanding of the various Data Lake SDKs available.

About the Course

This two-day course focuses on using Azure Data Lake Analytics to develop and manage a Big Data analytics solution. You'll learn how to ingest data into Azure Data Lake Store, process data in the store using U-SQL, move data to SQLDW, and consume data lake datasets in Power BI. In addition, you will gain an understanding of the various Data Lake SDKs available.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • OPERATING SYSTEM
  • Windows 7/8/10 x64 (Windows 10 Recommended) with 8GB of RAM. (16GB Recommended)
  • o X64 is REQUIRED
  • SOFTWARE TO INSTALL
  • Install Visual Studio 2015. These editions are supported:
  • o Ultimate
  • o Premium
  • o Professional
  • o Community
  • ADL Tools for VS
  • Azure PowerShell
  • Azure SDK
  • SUBSCRIPTION TO MICROSOFT AZURE
  • This may be provided through your company or as part of your invitation – you *must* have this enabled prior to class.

Agenda

Introduction to Azure Data Lake
Data Factory Introduction & Data Ingestion
Introduction to U-SQL Programming
Data Processing with U-SQL
Data Movement to SQLDW & Creating Pipelines
Consuming ADLS datasets in Power BI
Operating Data Lake SDKs & CLIs
Data Virtualization
Publishing Datasets in Azure Data Catalog

Author

Mithun Prasad

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

Data Lake Analytics, Data Factory, Data Catalog, Power BI

Intended Audience

Data Scientists, General, Developers, Architects
Technical professionals (Data Scientists, Database professionals, Analysts, BI Professionals, Data Engineers) who are interested in Big Data Solutions.

Course Level

Intermediate

Azure Machine Learning - Deep Dive

This two-day course is an introduction to machine learning and algorithms. You’ll develop a thorough understanding of the principles of machine learning and derive practical solutions using Azure Machine Learning studio. The course will also introduce you to the Team Data Science Process and include several practical examples.

About the Course

This two-day course is an introduction to machine learning and algorithms. You’ll develop a thorough understanding of the principles of machine learning and derive practical solutions using Azure Machine Learning Studio. The course will also introduce the Team Data Science Process and include several practical examples.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • Azure ML Studio Account at https://studio.azureml.net
  • GitHub Desktop (optional, but recommended)

Agenda

1. ML 101
2. Types of ML
3. Preprocessing
4. Feature Selection
5. ML Algorithms
6. Building ML Experiments using Azure ML Studio
7. Customizing ML Experiments within the Studio
8. Deploying ML as a service

Author

Mithun Prasad

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

Machine Learning

Intended Audience

Data Scientists, Architects, Developers
Technical professionals (Data Scientists, Aspiring Data Scientists, Database professionals, Analysts, BI Professionals, Data Engineers) who are interested in Machine Learning.

Course Level

Intermediate

Building a modern data warehouse with Azure SQL DW

Welcome to the Cortana Intelligence Suite workshop – Azure SQL Data Warehouse Focus, delivered by your Microsoft Data Science team. In this workshop, you’ll cover a series of modules that guide you from understanding an analytics workload, through the Cortana Intelligence Suite Process and Massively Parallel Processing, to loading a data warehouse using various tools.

About the Course

In this workshop you’ll cover a series of modules that guide you from understanding an analytics workload, through the Cortana Intelligence Suite Process and Massively Parallel Processing, to loading a data warehouse using various tools. You’ll also learn how to work through a real-world scenario using the Cortana Intelligence Suite tools, including the Microsoft Azure Portal, PowerShell, and Visual Studio, among others.

This course is designed to take approximately two days, depending on what is covered and how many of the labs are done in-class. All materials are provided regardless of the length of the course.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • There are a few things you need prior to coming to class:
  • A subscription to Microsoft Azure (this may be provided through your company or as part of your invitation – you must have this enabled prior to class – you will be using Azure throughout the course, for all labs, work and exercises)
  • You can use your MSDN subscription – https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
  • Your employer may provide Azure resources to you, but make sure you check to see if you can deploy assets and that they know you’ll be using their subscription in the class.
  • Optionally, you may receive instructions in your class invitation.
  • Your workstation should have the following Software Installed:
  • Visual Studio installed – the Community Edition (free) is acceptable – Version 2015 preferable (https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx)
  • SQL Server Data Tools for Visual Studio 2015
  • Azure SDK and Command-line Tools installed (https://azure.microsoft.com/en-us/downloads/ )
  • Azure Storage Explorer (http://go.microsoft.com/fwlink/?linkid=698844&clcid=0x409)
  • Power BI Desktop Installed (https://powerbi.microsoft.com/en-us/desktop/ )
  • A background in data technologies, such as working with Relational and Non-Relational data processing systems
  • Install the Microsoft R Client: http://aka.ms/rclient/download with the R tools for Visual Studio
  • SQL Server 2016 Management Studio
  • SQL Server 2015 Visual Studio
  • Azure PowerShell SDK
  • Azure PowerShell ISE

Agenda

CIS overview and how Azure SQL Data Warehouse fits into CIS
Introduction to Azure SQL Data Warehouse
Working with Tables, Indexes and Statistics
Loading data into Azure SQL Data Warehouse
Managing Security and Administration.

Author

Chris Testa-O'Neill

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

SQL Data Warehouse

Intended Audience

Database Admins, Data Scientists, Developers, Architects
Technical professionals (Data Scientists, Database professionals, Analysts, BI Professionals, Developers) who are familiar with building solutions but not familiar with the entire CIS Platform of products.

Course Level

Intermediate

Cortana Intelligence Suite - Foundations

In this workshop you’ll cover a series of modules that guide you from understanding an analytics workload, the Cortana Intelligence Suite Process, the foundations of data transfer and storage, data source documentation, storage and processing using various tools.

About the Course

In this workshop you’ll cover a series of modules that guide you from understanding an analytics workload, the Cortana Intelligence Suite Process, the foundations of data transfer and storage, data source documentation, storage and processing using various tools. You’ll also learn how to work through a real-world scenario using the Cortana Intelligence Suite tools, including the Microsoft Azure Portal, PowerShell, and Visual Studio, among others.

This course is designed to take approximately one to two days, depending on what is covered and how many of the labs are done in-class. The longer course is marked (Extended Class). All materials are provided regardless of the length of the course.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • There are a few things you need prior to coming to class:
  • A subscription to Microsoft Azure (this may be provided through your company or as part of your invitation – you must have this enabled prior to class – you will be using Azure throughout the course, for all labs, work and exercises)
  • You can use your MSDN subscription – https://azure.microsoft.com/en-us/pricing/memberoffers/msdn-benefits/
  • Your employer may provide Azure resources to you, but make sure you check to see if you can deploy assets and that they know you’ll be using their subscription in the class.
  • Optionally, you may receive instructions in your class invitation.
  • We’ll be using the Data Science Virtual Machine in Azure for the course. It has all of the tools you will need to work with the materials. Make sure you’re able to use the Remote Desktop Protocol (RDP) from your system to be able to work through the labs.
  • If you would also like to work with some of the tools locally (you still need an Azure subscription for this class), you can optionally obtain:
  • A laptop that you can install software on
  • Visual Studio installed – the Community Edition (free) is acceptable – Version 2015 preferable (https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx)
  • Azure SDK and Command-line Tools installed (https://azure.microsoft.com/en-us/downloads/ )
  • Azure Storage Explorer (http://go.microsoft.com/fwlink/?linkid=698844&clcid=0x409)
  • Power BI Desktop Installed (https://powerbi.microsoft.com/en-us/desktop/ )
  • A background in data technologies, such as working with Relational and Non-Relational data processing systems
  • Install the Microsoft R Client: http://aka.ms/rclient/download with the R tools for Visual Studio
  • It’s also a good idea to have a general level of predictive and classification Statistics, and a basic understanding of Machine Learning.

Agenda

What will you learn
Process and Platform, Environment Configuration
Data Discovery and Ingestion
Data Preparation
Modeling for Machine Learning and Data Mining (Extended Class)
Business Validation and Model Evaluation (Extended Class)
Deploying and Accessing the Solution
Workshop recap (Extended Class)

Author

Buck Woody

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

Cortana Intelligence

Intended Audience

Business Analysts, Data Scientists, Database Admins, Administrators, Designers
Technical professionals (Data Scientists, Database professionals, Analysts, BI Professionals) who are familiar with building solutions but not familiar with the entire CIS Platform of products.

Course Level

Intermediate

Cortana Intelligence Suite Workshop – Foundations and Azure SQL Data Warehouse

In this workshop you’ll cover a series of modules that guide you through understanding an analytics workload using the Cortana Intelligence Suite Process, the foundations of data transfer and storage, data source documentation, and storage and processing using various tools. This workshop also covers a series of modules that guide you through understanding Massively Parallel Processing and loading a data warehouse using various tools.

About the Course

In this workshop you’ll cover a series of modules that guide you through understanding an analytics workload, the Cortana Intelligence Suite Process, the foundations of data transfer and storage, data source documentation, and storage and processing using various tools. This workshop also covers a series of modules that guide you through understanding Massively Parallel Processing and loading a data warehouse using various tools. You’ll also learn how to work through a real-world scenario using the Cortana Intelligence Suite tools, including the Microsoft Azure Portal, PowerShell, and Visual Studio, among others.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • A subscription to Microsoft Azure (this may be provided through your company or as part of your invitation – you must have this enabled prior to class – you will be using Azure throughout the course, for all labs, work and exercises)
  • You can use your MSDN subscription – https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
  • Your employer may provide Azure resources to you, but make sure you check to see if you can deploy assets and that they know you’ll be using their subscription in the class.
  • Optionally, you may receive instructions in your class invitation.
  • Your workstation should have the following Software Installed:
  • Visual Studio installed – the Community Edition (free) is acceptable – Version 2015 preferable (https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx)
  • SQL Server Data Tools for Visual Studio 2015
  • Azure SDK and Command-line Tools installed (https://azure.microsoft.com/en-us/downloads/ )
  • Azure Storage Explorer (http://go.microsoft.com/fwlink/?linkid=698844&clcid=0x409)
  • Power BI Desktop Installed (https://powerbi.microsoft.com/en-us/desktop/ )
  • A background in data technologies, such as working with Relational and Non-Relational data processing systems
  • Install the Microsoft R Client: http://aka.ms/rclient/download with the R tools for Visual Studio
  • SQL Server 2016 Management Studio
  • SQL Server 2015 Visual Studio
  • Azure PowerShell SDK
  • Azure PowerShell ISE
  • It’s also a good idea to have a general level of predictive and classification Statistics, and a basic understanding of Machine Learning

Agenda

What will you learn
Process and Platform, Environment Configuration
Data Discovery and Ingestion
Data Preparation
Modeling for Machine Learning and Data Mining (Extended Class)
Business Validation and Model Evaluation (Extended Class)
Deploying and Accessing the Solution
CIS overview and how Azure SQL Data Warehouse fits into CIS
Introduction to Azure SQL Data Warehouse
Working with Tables, Indexes, and Statistics
Loading data into Azure SQL Data Warehouse
Managing Security and Administration.
Workshop recap (Extended Class)
Skills taught
Understand the CIS Process (General level), Understand CIS Components (General Level), Set up and configure the development environment
Understand how to source and vet proper data, Understand feature selection, Understand Azure Storage Options, Use various methods to ingest data into Azure Storage, Examine data stored in Azure Storage, Use various tools to explore data
Understand ADF and its constructs, Implement an ADF Pipeline referencing Data Sources and with various Activities including on-demand HDInsight Clusters, Understand the HIVE language and how it is used
Understand SQL Data Warehouse and its constructs, Implement loading activities into Azure SQL Data Warehouse.
Manage the security and administration of an Azure SQL Data Warehouse
Understand how to use Azure ML and how experiments are created, Understand how MRS can be used to perform Machine Learning experiments, Use ADF to schedule Azure ML Activities
Understand how to evaluate the efficacy and performance of an Azure ML experiment, Understand how to evaluate the efficacy and performance of an MRS ML experiment, Access and show data from Azure Storage, Access and query Azure SQL DB
Understand how to publish an Azure ML API, Understand the access methods of Azure Storage and Intelligent Processing, Understand the options to send a HIVE query to an HDI system, Use Power BI to query the results of a solution and create reports in Power BI Desktop, Power BI Service, and Power BI in Microsoft Excel
Understand when to use each component within CIS
Understand how to create an Azure SQL Data Warehouse using various tools.

Author

Chris Testa-O'Neill

Availability

Instructor Led / Instructor Led Classroom

Duration

3 days

Course Topics

Cortana Intelligence, SQL Data Warehouse

Intended Audience

Data Scientists, Business Analysts, Database Admins, Developers
Technical professionals (Data Scientists, Database professionals, Analysts, BI Professionals, Developers) who are familiar with building solutions but not familiar with the entire CIS Platform of products.

Course Level

Intermediate

Cortana Intelligence Suite Workshop – Foundations And Microsoft R for the Architect

In this workshop, you’ll cover a series of modules that guide you from understanding an analytics workload, the Cortana Intelligence Suite Process, the foundations of data transfer and storage, data source documentation, storage and processing using various tools. This combined course will follow on with a session on R for the systems architect.

About the Course

Welcome to the Cortana Intelligence Suite workshop delivered by your Microsoft Data Science team. In this workshop, you’ll cover a series of modules that guide you from understanding an analytics workload, the Cortana Intelligence Suite Process, the foundations of data transfer and storage, data source documentation, storage and processing using various tools. You’ll also learn how to work through a real-world scenario using the Cortana Intelligence Suite tools, including the Microsoft Azure Portal, PowerShell, and Visual Studio, among others.

You’ll also cover a series of modules that guide you from an introduction to the R programming environment to the Microsoft R platforms, including: Microsoft R Open, the Microsoft R Client, Microsoft R Server, SQL Server with R Services, R in Azure ML, and HDInsight with R. The final lab is an SQL Server R Services solution and extrapolates to any Microsoft R platform.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • A subscription to Microsoft Azure (this may be provided through your company or as part of your invitation – you must have this enabled prior to class – you will be using Azure throughout the course, for all labs, work, and exercises)
  • You can use your MSDN subscription – https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
  • Your employer may provide Azure resources to you, but make sure you check to see if you can deploy assets and that they know you’ll be using their subscription in the class.
  • Optionally, you may receive instructions in your class invitation.
  • We’ll be using the Data Science Virtual Machine in Azure for the course. It has all of the tools you will need to work with the materials. Make sure you’re able to use the Remote Desktop Protocol (RDP) from your system to be able to work through the labs.
  • If you would also like to work with some of the tools locally (you still need an Azure subscription for this class), you can optionally obtain:
  • A laptop that you can install software on
  • Visual Studio installed – the Community Edition (free) is acceptable – Version 2015 preferable (https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx)
  • Azure SDK and Command-line Tools installed (https://azure.microsoft.com/en-us/downloads/ )
  • Azure Storage Explorer (http://go.microsoft.com/fwlink/?linkid=698844&clcid=0x409)
  • Power BI Desktop installed (https://powerbi.microsoft.com/en-us/desktop/)
  • A background in data technologies, such as working with relational and non-relational data processing systems
  • Install the Microsoft R Client: http://aka.ms/rclient/download with the R tools for Visual Studio
  • It’s also a good idea to have a general understanding of predictive and classification statistics, and a basic understanding of machine learning

Agenda

What will you learn
The Data Science Process, CIS Platform components, Tools installation and overview
Data sourcing, Feature selection techniques, Data cataloging, Data Ingestion, Data Exploration
Data selection, including Features, Dimension reduction, Data processing, Data transformation and augmentation
Algorithm selection and application, Parameter selection and adjustment
Business validation of report and results, Model testing and cross-validation
Deploying the solution using Data Destinations, Deploying the solution using APIs, Deploying the solution using Queries and Reports
Mapping requirements to CIS solution elements, what to use when in CIS
The R Interactive Environment, Data Structures, Functions, Libraries (Packages) and Code Flow
The Microsoft R ecosystem
Working with Client Options
Planning, deploying, managing, and monitoring a Microsoft R platform
Walking through a complete 6-step solution - SQL Server R Services focused
Skills taught
Understand the CIS Process (General level), Understand CIS Components (General Level), Set up and configure the development environment
Understand how to source and vet proper data, Understand feature selection, Understand Azure Storage Options, Use various methods to ingest data into Azure Storage, Examine data stored in Azure Storage, Use various tools to explore data
Understand ADF and its constructs, Implement an ADF Pipeline referencing Data Sources and with various Activities including on-demand HDInsight Clusters, Understand the HIVE language and how it is used
Understand how to use Azure ML and how experiments are created, Understand how MRS can be used to perform Machine Learning experiments, Use ADF to schedule Azure ML Activities
Understand how to evaluate the efficacy and performance of an Azure ML experiment, Understand how to evaluate the efficacy and performance of an MRS ML experiment, Access and show data from Azure Storage, Access and query Azure SQL DB
Understand how to publish an Azure ML API, Understand the access methods of Azure Storage and Intelligent Processing, Understand the options to send a HIVE query to an HDI system, Use Power BI to query the results of a solution and create reports in Power BI Desktop, Power BI Service, and Power BI in Microsoft Excel
Understand when to use each component within CIS
Basic R coding
Choose, install, configure and use the proper R environment for a given solution
Connect to a Microsoft R platform from various client tools, run code locally and operationalize on server
Understand how to plan, deploy, manage, tune and monitor a Microsoft R solution
Deploy code to a Microsoft R Server, including SQL Server

Author

Buck Woody

Availability

Instructor Led / Instructor Led Classroom

Duration

3 days

Course Topics

Cortana Intelligence, R Language

Intended Audience

Data Scientists, Database Admins, Business Analysts, Architects, DevOps
Technical professionals (Data Scientists, Database professionals, Analysts, BI Professionals) who are familiar with building solutions but not familiar with the entire CIS Platform of products, with some degree of familiarity with relational database management systems (RDBMS), who need to learn more about using R in the Microsoft R ecosystem and need to know how the components work and how to plan, deploy, manage, and tune a Microsoft R platform.

Course Level

Intermediate

Data Analysis with R, Microsoft R, and SQL Server R Services

This three-day course is a holistic deep dive into data science: we learn about programming and technology, statistics and modeling, and big data and deployment, all in the process of performing an actual analysis of the New York City taxi dataset.

About the Course

In this course we see how open-source R, Microsoft R, and SQL Server R Services can work together to build data science solutions. We run hands-on exercises and learn best practices for R programming. We see how the RevoScaleR package and its data-processing and analytics functions not only allow our analytics to scale to large datasets, but also let us deploy our work inside a production environment like SQL Server, all from the comfort of our R IDE. We also learn about the specifics of working with R inside SQL Server, such as how to store R artifacts (such as model objects) and retrieve them later from R stored procedures (scoring new data with an R model) or SQL Server Reporting Services (for rendering an R plot in a report).

This course plays on the interaction between a data analysis problem and the tools of the trade to keep participants engaged and busy learning. Like any data science project, we start with a problem (and data). In the process of cleaning and exploring the data we learn about strengths and shortcomings of our tools and how to leverage the strengths and get around the shortcomings. We then slowly make our way from exploratory data analysis (EDA) to model building, discussing machine learning examples and pitfalls while exploring ways we can improve and iterate. We end by talking about the challenge of deploying a model in a SQL Server production environment and calling it from an application. The course is intended to generate a lot of discussion about data science as a process and teach you how to think like a data scientist.

After completing it, students will have a deeper understanding of the data science process, know how R, Microsoft R, and SQL Server R Services can be used to develop a data science pipeline, and know which best practices to follow.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • Participants are expected to be familiar with R basics, especially the basic data types and structures in R, how to subset each, and similarities and differences between them. Familiarity with data analysis using R and a basic knowledge of stats and the data science process is helpful, as well as some experience with SQL Server.

Agenda

We begin with an overview of RevoScaleR and show you how to access it by downloading and installing the Microsoft R Client on a Windows workstation. We then get the NYC taxi data used during the course. Finally, we install the R packages we will be using throughout the course.
We talk about two different ways that RevoScaleR can handle the data and the trade-offs involved.
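A minimal sketch of the two approaches, assuming the taxi data sits in a CSV file named `nyc_taxi.csv` with a `fare_amount` column (both names are illustrative, not from the course materials):

```r
library(RevoScaleR)  # ships with Microsoft R Server / Microsoft R Client

# Approach 1: read the data into memory as a regular R data frame
taxi_df <- rxImport(inData = "nyc_taxi.csv")

# Approach 2: convert the CSV to an XDF file, which RevoScaleR
# processes chunk by chunk without loading it all into memory
taxi_xdf <- rxImport(inData = "nyc_taxi.csv",
                     outFile = "nyc_taxi.xdf", overwrite = TRUE)

# Most RevoScaleR functions accept either representation
rxSummary(~ fare_amount, data = taxi_xdf)
```

The in-memory approach is convenient for small datasets; the XDF approach trades some convenience for the ability to scale past available memory.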
We examine the data and ask how we can clean it and then make it richer and more useful to the analysis problem. In the process, we learn how to use RevoScaleR to perform data transformations and how third-party packages can be leveraged.
We examine the data visually and through various summaries to see what does and does not mesh with our understanding of the problem. We look at sampling as a way to examine outliers.
We examine ways of visualizing our results and getting a feel for the data. In the process, we learn how RevoScaleR interacts with various visualization tools.
We look at k-means clustering as our first RevoScaleR analytics function and look at how we can improve its performance when the dataset is large.
We build a few predictive models and show how we can examine the predictions and compare the models. We see how our choice of the model can have performance implications.
We talk about RevoScaleR's write-once-deploy-anywhere philosophy and what we mean by a compute context. We then put this into practice by deploying our code into SQL Server and discuss the architectural differences.
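As a sketch of what switching compute contexts can look like (the server, database, table, and column names below are placeholders, not the course's actual setup):

```r
library(RevoScaleR)

# Develop and test locally first (the default compute context)
rxSetComputeContext("local")

# Point RevoScaleR at a SQL Server instance instead; the same
# analysis code then executes inside the database
connStr <- "Driver=SQL Server;Server=myserver;Database=taxidb;Trusted_Connection=true"
rxSetComputeContext(RxInSqlServer(connectionString = connStr))

# A data source referring to a table in that database
taxiSql <- RxSqlServerData(table = "nyc_taxi", connectionString = connStr)

# The model is now fit in-database, not on the laptop
model <- rxLinMod(tip_amount ~ trip_distance, data = taxiSql)
```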
We go into SSMS and see how we can call R via stored procedures, retrieve a model object and score new data with it. We also examine how we can retrieve plots and serve them in SSRS.

Author

Seth Mottaghinejad

Availability

Instructor Led / Instructor Led Classroom

Duration

3 days

Course Topics

R Language, Microsoft R Server

Intended Audience

Business Analysts, Data Scientists, Architects
This course has two main audiences: it is primarily geared toward budding data scientists who are familiar with R and want to take their skills to the next level by performing a full-blown analysis, running into various challenges, and learning how to overcome them using R and Microsoft R. But it can also be a great course for solution architects or SQL professionals who want to understand what R is good at, how Microsoft R builds on top of open-source R, and how to implement a solution in a SQL Server production environment.

Course Level

Intermediate

Deep Learning for Beginners: Convolutional Neural Networks

What better way for machines to learn than to emulate the human brain? Learn about Convolutional Neural Networks (CNNs), which are inspired by biological network structures like those found in the visual cortex of the human brain. This two-day course focuses on CNNs, providing a thorough and intuitive understanding of them and their applications in vision and natural language processing. Get introduced to the various architecture variations and learn how to build your own CNN.

About the Course

Deep learning is one of the fastest growing areas of machine learning and a hot topic in both academia and industry. The Deep Learning workshop is heavily focused on Convolutional Neural Networks (CNNs) applied to the fields of vision and NLP, including word embeddings. The workshop is intended to provide a thorough and intuitive understanding of CNNs, covering the theoretical aspects in detail. Additionally, well-known benchmark architectures and state-of-the-art work are also discussed in the workshop.

At the end of the workshop, attendees will understand CNNs, word embeddings, and benchmark architectures, along with their applications in various fields. Students will also be introduced to CNTK.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • Bring your own laptop
  • Access to Azure Notebooks
  • Basic knowledge of Python

Agenda

Introduction to Deep Learning and its applications
Basic Concepts: Training a Neural Network from scratch
Convolutions & CNN models
Applications to Vision and NLP
Word Embeddings
Benchmark Architectures
Transfer Learning
Introduction to CNTK

Author

Mithun Prasad

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

Machine Learning

Intended Audience

Data Scientists, Developers, Business Analysts

Course Level

Intermediate

Developing and Deploying Intelligent Chat Bots

[ IMPORTANT NOTE: The scheduling of new deliveries of this course has been temporarily suspended. If you are interested in the content for learning and re-delivery, please find the location of the training materials given in the link here https://github.com/Azure/bot-education. ] This 2-day course, designed for developers and data scientists, will ramp up the attendee very quickly on Microsoft's powerful machine learning algorithm APIs as a part of Cognitive Services and chat bot development tools as part of the Bot Framework.

About the Course

After completing this 2-day course, an attendee will have a comprehensive overview of Microsoft's powerful machine learning APIs and comprehensive knowledge around chat bot development capabilities.

Specifically, an attendee will gain

- a comprehensive overview of the types of ML algorithms available under the Cognitive Services APIs

- a practical understanding into the development tools essential to building a chat bot

- in depth knowledge into the programmatic structure of a chat bot

- in depth knowledge into the way the Microsoft Bot Framework handles messages, state and registration

- insight into creating an enjoyable chat bot experience including best practices

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • Please bring a laptop with internet connectivity.
  • Node.js with npm installed locally - get the latest at:
  • https://nodejs.org/en/download/
  • Visual Studio Code [recommended] or equivalent code editing and debugging environment with IntelliSense.
  • https://code.visualstudio.com/download
  • Bot Framework Emulator (Windows and Unix-compatible) installed locally - information and links at
  • https://docs.botframework.com/en-us/tools/bot-framework-emulator
  • GitHub Account - a code repository and collaboration tool we'll use
  • https://github.com/join
  • Git Bash - included in git download
  • https://git-scm.com/downloads
  • Azure account [recommended] - use the one you have, sign up for a free trial at https://azure.microsoft.com/en-us/free/, or, if you have an MSDN account for development link up your existing Azure benefit
  • We will assume you already have the following background:
  • Basic knowledge around using and navigating in a unix-style command line or terminal (for using Git Bash) (good basic guide at http://linuxcommand.org/lc3_learning_the_shell.php)
  • Familiarity with Git and GitHub as tools for software development, versioning, and collaboration. (There's a great book on Git at https://git-scm.com/book/en/v2)
  • Familiarity with debugging bots with VS Code, as covered in the https://docs.botframework.com/en-us/node/builder/guides/debug-locally-with-vscode/ docs.
  • If you are new to Node, here's a good video tutorial series at https://www.youtube.com/playlist?list=PL6gx4Cwl9DGBMdkKFn3HasZnnAqVjzHn_

Agenda

Each day is broken up into 1-4 hour Modules, where you will learn and perform labs on your own. Some material that is out of scope for hands-on labs will instead be demonstrated by instructor-led labs. The modules, broken up into a general agenda are as follows. The specific modules may bleed across sessions depending on the engagement of the audience.
Day 1
Morning - Cognitive Services Overview with Demos
Afternoon - Cognitive Services Topic Deep Dive
Day 2
Early Morning - Bot Framework Overview and User Experience Best Practices
Late Morning - Developer's Introduction and Deploying an Intelligent Bot
Afternoon - Deep Dive into the Microsoft Bot Framework

Author

Micheleen Harris

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

Bot Framework, Cognitive Services

Intended Audience

Developers, Data Scientists
This course is for intermediate and advanced developers with interest or experience in machine learning. Experience with Node.js is recommended for this course.

Course Level

Intermediate

Microsoft R For Architects

In this overview we'll cover a review of the R programming environment, and the various Microsoft R platforms, with a focus on SQL Server R Services. You'll learn how and when to implement R in SQL Server for Advanced Analytic solutions.

About the Course

In this overview we'll cover a review of the R programming environment, and the various Microsoft R platforms, with a focus on SQL Server R Services. You'll learn how and when to implement R in SQL Server for Advanced Analytic solutions. We'll cover the Team Data Science Process, the open-source R ecosystem, Microsoft R implementations on multiple platforms, and focus on operationalizing an R application. This course focuses on the architecture, planning, management and operation of an R platform with a practical example in SQL Server.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • None

Agenda

What will you learn
In this overview, we'll cover a review of the R programming environment, and the various Microsoft R platforms, with a focus on SQL Server R Services. You'll learn how and when to implement R in SQL Server for Advanced Analytic solutions.
Skills taught
Basic R coding
Choose, install, configure and use the proper R environment for a given solution
Connect to a Microsoft R platform from various client tools, run code locally and operationalize on server
Understand how to plan, deploy, manage, tune and monitor a Microsoft R solution
Deploy code to a Microsoft R Server, including SQL Server

Author

Buck Woody

Availability

Instructor Led / Instructor Led Classroom

Duration

1 day

Course Topics

Microsoft R Server

Intended Audience

Architects, Database Admins, Developers
Technical Professionals familiar with the SQL Server RDBMS platform who want to learn more about implementing, monitoring and managing R services in their environment.

Course Level

Intermediate

Microsoft R for the SQL Server Professional

This workshop covers working with Microsoft R using SQL Server Data. It covers the basics of the R language, using R with SQL Server Databases, and using SQL Server R Services in a solution.

About the Course

In this workshop, you’ll cover a series of modules that guide you through understanding an analytics workload and using the Microsoft R feature of the Cortana Intelligence Suite and its process. You’ll also cover a series of modules that guide you from an introduction to the R programming environment, the Cortana Intelligence Suite process, and the Cortana Intelligence Suite platform to the Microsoft R platforms, including: Microsoft R Open, the Microsoft R Client, Microsoft R Server, SQL Server with R Services, R in Azure ML, and HDInsight with R. The final lab is an SQL Server R Services solution in Transact-SQL.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • If you want to use a Virtual Machine in Azure, prior to coming to class you will need:
  • A subscription to Microsoft Azure (this may be provided through your company or as part of your invitation – you must have this enabled prior to class – you will be using Azure throughout the course, for all labs, work and exercises)
  • You can use your MSDN subscription – https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
  • Your employer may provide Azure resources to you, but make sure you check to see if you can deploy assets and that they know you’ll be using their subscription in the class.
  • Optionally, you may receive instructions in your class invitation.
  • We’ll be using the Data Science Virtual Machine in Azure for the course. It has all of the tools you will need to work with the materials. Make sure you’re able to use the Remote Desktop Protocol (RDP) from your system to be able to work through the labs.
  • If you would also like to work with the tools locally, you should install (prior to class):
  • Visual Studio – the Community Edition (free) is acceptable – Version 2015 preferable (https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx)
  • SQL Server 2016 (or higher) Developer Edition with ALL features and options selected
  • Power BI Desktop Installed (https://powerbi.microsoft.com/en-us/desktop/ )
  • Install the Microsoft R Client: http://aka.ms/rclient/download with the R tools for Visual Studio
  • It’s also a good idea to have a general understanding of predictive and classification statistics, and a basic understanding of machine learning

Agenda

What will you learn
Process and Platform, Environment Configuration
Data Discovery and Ingestion
Data Preparation
Modeling for Machine Learning and Data Mining
Key Concepts in R
The Microsoft R Platform
R Client Options
Operationalize Microsoft R Solutions
Creating a Microsoft R Solution
Skills taught
Understand the CIS Process (General level), Understand CIS Components (General Level), Set up and configure the development environment
Understand when to use each component within CIS
Basic R coding
Choose, install, configure and use the proper R environment for a given solution
Connect to a Microsoft R platform from various client tools, run code locally and operationalize on server
Deploy code to a Microsoft R Server, including SQL Server

Author

Buck Woody

Availability

Instructor Led / Instructor Led Classroom

Duration

1 day

Course Topics

Cortana Intelligence, R Language, Microsoft R Server

Intended Audience

Architects, Business Analysts, Database Admins, Developers
Technical professionals who are familiar with Transact-SQL, who need to learn more about using R in the Microsoft R ecosystem, and who need to know how the components work and how to develop and use a solution using R code in and with SQL Server databases.

Course Level

Intermediate

Microsoft R Open for Business Analysts

Microsoft R Open (MRO) for Business Analysts is designed so that current SAS, SPSS or SQL users with basic to intermediate knowledge of R can take their R skills to the next level by tackling data analysis problems using R and its wide array of third-party packages.

About the Course

Once we know R at a basic level, the best way to sharpen our R skills is by working on a data analysis problem head-on. In this course, we take a use-case-based approach by tackling the New York City taxi data using R. There are ample lab exercises to reinforce concepts and learn new ones.

We do not shy away from using third-party packages when doing so simplifies our work: in particular GIS packages, ggplot2 for plotting, and dplyr for data processing. However, only dplyr is central to the course and explored in depth. Data visualization and GIS packages are out of scope and not covered in depth, although a basic explanation is provided, and all the code is provided for users who want to delve deeper on their own time.
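As an illustration of the kind of dplyr-plus-ggplot2 pipeline the course builds, here is a minimal sketch; the data frame and its columns are hypothetical stand-ins for the taxi data:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the taxi data (column names are illustrative)
taxi_df <- data.frame(
  payment_type = c("card", "card", "cash", "cash"),
  fare_amount  = c(10, 20, 10, 15),
  tip_amount   = c(2, 3, 0, 1)
)

# Summarize average tip percentage by payment type with dplyr
avg_tip <- taxi_df %>%
  filter(fare_amount > 0) %>%
  group_by(payment_type) %>%
  summarize(mean_tip_pct = mean(tip_amount / fare_amount))

# Visualize the summary with ggplot2
ggplot(avg_tip, aes(x = payment_type, y = mean_tip_pct)) +
  geom_col()
```

The pipeline reads top to bottom, which is what makes dplyr-based data pipelines easy to modify and extend.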

While we do not cover Microsoft R Server (MRS) during this course, a secondary goal of the course is to prepare users for MRS and its set of tools and capabilities for scalable big-data processing and analytics. So this course can also be viewed as a prerequisite for learning to use MRS.

After completing this course, participants will be able to use R to perform a thorough data analysis task that starts with ingesting a raw flat file and proceeds through exploratory data analysis, with lots of summaries and visualizations to boot. Participants will gain an appreciation for packages such as dplyr, which helps set up robust and easy-to-modify data pipelines, and ggplot2, with its straightforward notation, and will learn to think more like an R programmer and write more efficient and straightforward R code.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • Know your R data types: vector, array, list, and data.frame. Know what sets each apart, what they have in common, and what advantages they each offer, and how to subset each object.
  • Know how to write basic R functions.
  • Be comfortable using an R IDE (such as Visual Studio with RTVS or RStudio).
  • Have a basic understanding of common data analysis tasks.
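As a quick self-check on the first two prerequisites, using only base R (the `tip_pct` function is a made-up example):

```r
# The core structures and how subsetting differs between them
v  <- c(a = 1, b = 2, c = 3)                     # vector: homogeneous
l  <- list(x = 1:3, y = "text")                  # list: heterogeneous elements
df <- data.frame(id = 1:3, val = c(10, 20, 30))  # data.frame: equal-length named columns

v["b"]                 # subsetting a vector by name returns a (named) vector
l[["x"]]               # [[ ]] extracts one list element; [ ] returns a sub-list
df[df$val > 15, "id"]  # rows where val > 15, keeping only the id column

# A basic function, as required by the second prerequisite
tip_pct <- function(tip, fare) 100 * tip / fare
tip_pct(2, 10)  # returns 20
```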

Agenda

Setting up the environment
Loading data into R
Inspecting the data: We run sanity checks on the data and get a feel for the data
Cleaning the data: We deal with column types, especially with the factor columns
Being more efficient: We learn how pre-processing can lead to more efficiency
Creating new features: Starting with the raw data, we ask how we can make it more useful to the analysis by adding relevant features
Data summary and visualization: We explore various ways we can summarize the data using both base R and dplyr. We use ggplot2 to visualize results

Author

Seth Mottaghinejad

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

R Language

Intended Audience

General, Business Analysts, Data Scientists
Participants should have experience with common data analysis tasks: cleaning data, combining or reshaping data, summarizing and visualizing data, etc. Ideally, the participants' background will be in a data analysis role (using a platform such as SAS, SPSS, SQL, or Python). Participants should have prior exposure to the R programming language at a basic level.

Course Level

Intermediate

Open Source R and Microsoft R Workflows in Data Science

This workshop will teach you how to do data science with Microsoft R Server. It will teach you the fundamentals of R programming for data ingestion, exploratory data analysis, model building, evaluation, and operationalization.

About the Course

This workshop will teach you the fundamentals of data science with R and Microsoft R Server. It will teach you the fundamentals of R programming for data ingestion, exploratory data analysis, visualization, model building, evaluation, and operationalization. You will learn how to write effective R code that is robust to your data structures and your computing environment, and that can be operationalized in production. Through hands-on programming labs, you'll learn how to use Microsoft R Server to do out-of-core data processing, and create scalable data science solutions for a variety of workloads.

At the end of the course, you will have learned how to use R effectively for data science, and use it through the Microsoft R Server distribution to deploy and operationalize scalable data science solutions.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • We will assume you have some experience with R or a similar data processing framework. For those looking for a quick refresher, I recommend the edx course:
  • https://www.edx.org/course/introduction-r-data-science-microsoft-dat204x-3

Agenda

Day One
Fundamentals of R Programming
Functional Programming with Open Source R
Data Manipulation and Deriving Tidy Datasets
Visualization with R
Training and Evaluating Models with R
Day Two
Scalable Data Processing with Microsoft R Server
Performance Optimizations and Tuning
Predictive Modeling with Microsoft R Server
Operationalization with AzureML and mrsdeploy

Author

Ali Zaidi

Availability

Instructor Led / Instructor Led Classroom

Duration

2 days

Course Topics

R Language, Microsoft R Server

Intended Audience

General
Data analysts who want to know more about Microsoft R.

Course Level

Intermediate

Operationalizing Solutions with Azure Data Factory

In this workshop, you’ll cover how Azure Data Factory can be used to incorporate machine learning into your data workflows.

About the Course

Welcome to the Operationalizing Solutions with Azure Data Factory (ADF), delivered by your Microsoft Data Science team. In this workshop, you’ll cover how ADF can be used to incorporate machine learning into your data workflows. This allows your data to be used for predictive purposes and not just historical reporting.

This course is designed to take approximately 2-3 hours. All materials are provided for follow-on self-study.

After completing the course, a student should be able to deploy a predictive experiment from Azure ML as an API and schedule Azure Data Factory to call that API in order to make predictions on data.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • Experience and an understanding of ADF (similar to what would be covered in the Cortana Intelligence Suite Workshop)
  • Experience using Azure ML (similar to what would be covered in the Cortana Intelligence Suite Workshop)
  • A subscription to Microsoft Azure (this may be provided through your company or as part of your invitation – you must have this enabled prior to class – you will be using Azure throughout the course, for all labs, work and exercises)
  • You can sign up for a free account here (but don’t use it until the class starts, and don’t sign up more than a week in advance of the class) – https://azure.microsoft.com/en-us/pricing/free-trial/
  • Or you can use your MSDN subscription – https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
  • Your employer may provide Azure resources to you, but make sure you check to see if you can deploy assets and that they know you’ll be using their subscription in the class.
  • It’s also a good idea to have a general understanding of predictive and classification statistics, and a basic understanding of machine learning. A brief overview of these technologies is covered for the concepts presented.

Agenda

What will you learn
This course is self-paced, so the videos and labs can be done at your own pace. In total, the materials should take 2-3 hours.
Skills taught
Sourcing Data from Azure Storage and other locations
Converting an Azure ML experiment into a production-level API
Calling machine learning models as part of the ADF pipeline
Storing results to Azure Storage and other locations

Author

Ryan Swanstrom

Availability

Self Paced / Video

Duration

half day

Course Topics

Data Factory, Machine Learning

Intended Audience

Data Scientists, Developers, Architects, DevOps
This course is intended for people with experience using both Azure Data Factory and Azure Machine Learning

Course Level

Intermediate

Scalable Data Science with Microsoft R Server and Spark with HDInsight

In this course, you’ll gain hands-on experience with Microsoft R and HDInsight Spark for scalable data science and machine learning. You will learn about the HDInsight platform and how to leverage Microsoft R Server as an application on top of HDInsight Spark to perform data analysis and machine learning at scale. This is a three-day course and will teach you about Microsoft R and Spark from the ground up.

About the Course

In this course, you’ll gain hands-on experience with Microsoft R and HDInsight Spark for scalable data science and machine learning. You will learn about the fundamentals of functional programming, parallel external memory algorithms, Spark on HDInsight, and distributed systems. This course emphasizes robust programming principles, so that you can write programs that are portable, platform invariant, and scalable. Through labs and instructor led deep dives, you will learn how to use R Server on Spark with the HDInsight platform to perform data analysis and machine learning at scale.
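As an indicative sketch of pointing RevoScaleR at Spark on an HDInsight cluster (the HDFS path, formula, and column names are placeholders, and the exact setup varies by MRS version):

```r
library(RevoScaleR)

# Create a Spark compute context from the cluster's edge node
sparkCC <- RxSpark(persistentRun = TRUE)
rxSetComputeContext(sparkCC)

# Reference an XDF dataset stored in HDFS
hdfs <- RxHdfsFileSystem()
taxiHdfs <- RxXdfData("/user/RevoShare/nyc_taxi", fileSystem = hdfs)

# The same ScaleR call now runs distributed across the cluster
model <- rxLogit(tipped ~ trip_distance + passenger_count, data = taxiHdfs)
```

The point of the compute-context design is that the modeling line is unchanged from what you would run locally; only the context and data source differ.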

By the end of the course, you will have developed applications that are scalable and portable, and know how to configure your Spark clusters to maximize your application's performance.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • A subscription to Microsoft Azure, with at least $50 to spend for the course. You *must* have this enabled prior to class, as you will be using Azure throughout the course for all labs, work, and exercises. The subscription may be provided through your company or as part of your class invitation; you can also use your MSDN subscription (https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/) or Azure resources provided by your employer.
  • Understanding of R: the ability to write functions, train models, etc.
  • PuTTY, Cygwin, or another bash emulator (some Linux experience to go with it would be useful)
  • It’s also a good idea to have a general familiarity with predictive and classification modeling, and a basic understanding of statistics and machine learning, e.g., cross-validation, ensemble models, model metrics, etc.

Agenda

What you will learn
Functional, Object-Based Computing with R
Overview of the R Project and CRAN
Exploring the Microsoft R Data Stack
Functional Programming for Data Manipulation with the dplyr package
Understanding dplyr's semantics and the magrittr pipe
Data Visualization and Exploratory Data Analysis
Using the broom package for Modeling and Summarization
Breaking the Memory Barrier with RevoScaleR
Overview of the Microsoft R Data Ecosystem
Modeling and Scoring with High-Performance ScaleR Algorithms
Data Manipulation with the dplyrXdf Package
Summarizing Data with RevoScaleR
Performance Considerations with RevoScaleR
Parallel Computing and Distributed Computing with Microsoft R Server
Deploying R and ScaleR algorithms to Azure with the AzureML package
Overview of the Apache Spark Project
Ingesting Data into Azure Blob Storage
Creating Spark DataFrames and Spark Contexts
Manipulating HDFS data with the sparklyr package
Creating Distributed eXternal DataFrames in HDFS
Preparing Data for Modeling with Microsoft R Server
Training Statistical Models with Microsoft R Server and the Spark Compute Context
Scoring and Deploying Models
Performance Considerations on Hadoop
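Several of the agenda items above center on dplyr's grammar of data manipulation and the magrittr pipe. As a minimal illustration (using R's built-in mtcars data rather than any course dataset), a typical pipeline looks like:

```r
library(dplyr)  # also re-exports the magrittr %>% pipe

# Group the built-in mtcars data by cylinder count, summarize
# mean miles-per-gallon and group size, then sort descending.
mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg), n = n()) %>%
  arrange(desc(avg_mpg))
```

The pipe passes each intermediate result as the first argument to the next verb, which is what makes these pipelines read as a left-to-right sequence of transformations.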
Skills taught
Understand what Spark is and why it's a more effective solution for iterative machine learning jobs than Hadoop MapReduce.
Understand functional programming and lazy evaluation.
Provision and deploy HDInsight Spark Clusters and install R Server as an application.
Understand the basics of administration and management of packages and applications on premium HDInsight Spark clusters.
Develop functions that are robust to different data structures and execution environments.
Use Spark and its R APIs for exploratory data analysis.
Train and tune statistical machine learning models with Microsoft R Server's RxSpark compute context.
Deploy trained R models as an Azure ML web service.
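The RxSpark training workflow named in the skills list can be sketched as follows. This is an illustrative outline only: it assumes an HDInsight Spark cluster with Microsoft R Server installed, and the HDFS path, dataset, and column names (delayed, distance, day_of_week) are hypothetical placeholders rather than course materials.

```r
library(RevoScaleR)

# Point RevoScaleR at the Spark cluster: subsequent rx* calls
# run distributed rather than locally.
cc <- RxSpark(consoleOutput = TRUE)
rxSetComputeContext(cc)

# Reference a dataset stored in HDFS as an XDF data source
# (the path is a placeholder).
hdfs <- RxHdfsFileSystem()
flights <- RxXdfData("/share/flights", fileSystem = hdfs)

# Train a logistic regression on the cluster with a ScaleR algorithm.
model <- rxLogit(delayed ~ distance + day_of_week, data = flights)
summary(model)

# Switch back to local computation when finished.
rxSetComputeContext("local")
```

The key design point, which the course develops in depth, is that the same rx* modeling code runs unchanged locally or on the cluster; only the compute context changes.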

Author

Ali Zaidi

Availability

Instructor Led / Instructor Led Classroom

Duration

3 days

Course Topics

Microsoft R Server

Intended Audience

General, Data Scientists, Architects
This course is meant for data scientists, data analysts, and experienced data architects who have programming experience with R and want to use it with Hadoop and Spark. Participants are expected to know the following about R: its data structures, creating and using functions, and a little bit about functional programming. Participants are expected to know the basics of Azure Data Storage, and will need to have an Azure subscription with at least $50 to spend to complete this course. Some understanding of Hadoop and Spark is recommended, but not required.

Course Level

Intermediate

Spark with HDInsight - Enterprise Ready Machine Learning and Interactive Data Analysis at Scale

Spark has become the most popular, and perhaps most important, distributed data processing framework for Hadoop. It is particularly amenable to machine learning and interactive data workloads, and can provide an order of magnitude greater performance than traditional Hadoop data processing tools. In this course, we will provide a deep dive into Spark as a framework: you will come to understand its design, how to utilize it optimally, and how to develop effective machine learning applications with Spark on HDInsight.

About the Course

Spark has become the most popular, and perhaps most important, distributed data processing framework for Hadoop. It is particularly amenable to machine learning and interactive data workloads, and can provide an order of magnitude greater performance than traditional Hadoop data processing tools. In this course, we will provide a deep dive into Spark as a framework: you will come to understand its design, how to utilize it optimally, and how to develop effective machine learning applications with Spark on HDInsight.

The course covers the fundamentals of Spark, its core APIs and design, relational data processing with Spark SQL, and the fundamentals of Spark job execution, performance tuning, tracking, and debugging. Users will get hands-on experience with processing streaming data with Spark Streaming, training machine learning algorithms with Spark ML and R Server on Spark, as well as HDInsight configuration and platform-specific considerations such as remote development and access with Livy and IntelliJ, secure Spark, multi-user notebooks with Zeppelin, and virtual networking with other HDInsight clusters.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • Hadoop administration, configuration, and security

Agenda

Day One - Spark on HDInsight Overview
Spark Clusters on HDInsight
Developer Tools and Remote Debugging with IntelliJ IDEA
Submitting Spark Jobs Remotely Using Livy
Spark Fundamentals - Functional Programming, Scala and the Collections API
Cluster Architecture
RDDs - Parallel, Distributed Memory Data Structures
Spark SQL/DataFrames - Relational Data Processing with Spark
Sharing Metastore and Storage Accounts with Hadoop/Hive Clusters and Spark Clusters
DataFrames API - Collection of Rows with a Consistent Schema
Integrated APIs for Mixing Relational, Graph, and ML Jobs
Exploring Relational Data with Spark SQL
Catalyst Query Optimization
Optimizing Joins in Spark SQL
Broadcast Joins versus Merge Joins
Creating Custom UDFs for Spark SQL
Caching Spark DataFrames, Saving to Parquet
Day Two - Spark Job Execution, Performance Tuning, Tracking and Debugging
Jobs, Stages, and Tasks
Spark Contexts, Applications, the Driver Program and Spark Executors
Partitions and Shuffles
Understanding Data Locality
Monitoring Spark Jobs with the Spark WebUI
Managing Spark Thrift Servers and Changing YARN Resource Allocations
Managing Interactive Livy Sessions and their Resources
Viewing Spark Job Graphs, and Understanding Spark Stages
Spark Streaming
Creating Spark Streaming Applications Using Spark DStreams APIs
DStreams: Stateful and Stateless Streams
Comparison of DStreams and RDDs
Transformers for DStreams
Persisting Long Term Data in HBase, Hive or SQL
Creating Spark Structured Streams
Using DataFrames and DataSets API to Create Streaming DataFrames and DataSets
Window Transformations for Stateful and Stateless Operations
Day Three - Spark Machine Learning and Graph Analytics
MLLib and Spark ML - Understanding API Patterns
Featurizing DataFrames using Transformers
Developing Machine Learning Pipelines with Spark ML
Cross-Validation and Hyperparameter Tuning
Training ML Models on Text Data: Tokenization, TF/IDF, and Topic Modeling with LDA
Using Evaluators to Evaluate Machine Learning Models
Unsupervised Learning and Clustering
Managing Models with ModelDB
Understanding Graph Analytics and Graph Operators
Vertex and Edge Classes
Mapping Operations
Measuring Connectedness
Training Graph Algorithms with GraphX
Performance and Monitoring
Reducing Memory Allocation with Serialization
Checkpointing
Visualizing Networks with SparkR, d3, and Jupyter
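Although the course's deep dives use Scala, the Spark ML training pattern covered on Day Three can also be sketched from R with the sparklyr package. This is a hedged, local-mode sketch for illustration only (the course itself targets HDInsight clusters), using R's built-in mtcars data:

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance; on an HDInsight cluster the
# master would point at YARN instead of "local".
sc <- spark_connect(master = "local")

# Copy a small built-in dataset into Spark for illustration.
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

# Train a Spark ML logistic regression through sparklyr's
# formula interface.
fit <- ml_logistic_regression(mtcars_tbl, am ~ wt + hp)

spark_disconnect(sc)
```

In practice, models like this are composed into Spark ML pipelines with featurizing transformers and evaluators, which is the pattern the Day Three sessions work through.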

Author

Ali Zaidi

Availability

Instructor Led / Instructor Led Classroom

Duration

3 days

Course Topics

HDInsight (Hadoop & Spark), Machine Learning, Microsoft R Server

Intended Audience

Data Scientists, Administrators, Developers
Data Scientists interested in Spark

Course Level

Advanced

The Hadoop Hero: Administering, Configuring and Securing HDInsight Clusters

Hadoop has proved itself to be a scalable solution for the enterprise, providing a large ecosystem of advanced analytics and big data tools in a unified framework. However, managing this diverse ecosystem and ensuring that its users are able to obtain maximum performance out of its clusters is a difficult task. This course will be of primary interest to HDInsight cluster administrators, but also to developers, architects, and even data scientists eager to ensure their applications glean the maximum performance and security from their clusters.

About the Course

Hadoop has proved itself to be a scalable solution for the enterprise, providing a large ecosystem of advanced analytics and big data tools in a unified framework. However, managing this diverse ecosystem and ensuring that its users are able to obtain maximum performance out of its clusters is a difficult task. This course will be of primary interest to cluster administrators, but also to developers, architects, and even data scientists eager to ensure their applications glean the maximum performance and security from their clusters.

We will teach administrators the fundamentals of HDInsight's design and architecture and how to ensure their clusters are secure and meet the requirements of their users. We will discuss configuration, administration, command-line tools for debugging, and tips on achieving maximum performance in a variety of common Hadoop big data and advanced analytics workflows, particularly Spark and Hive. By the end of the course, participants will have a solid understanding of the behind-the-scenes administration mechanisms in Hadoop using Hadoop configuration files, and will know how to secure their clusters, enable and manage unique application workloads, and set groups and permissions for users and applications.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

Agenda

Day One
HDInsight and Hadoop Fundamentals
HDInsight Cluster Options
Programmatic Provisioning of HDInsight Hadoop Clusters
Moving Data in and Out of Azure Storage
Management and Configuration for Hadoop Services and HTTP Web Services
Day Two
Overview Apache Ambari
Manage Hadoop Applications with YARN CLI
Managing Ambari Users and Groups
Securing Hadoop with Apache Ranger
Manage Alert Groups with Ambari
Manage and Configure Storage
Resource Allocation and Configuration
Manage and Configure Queues, Capacity Scheduler, and Node Access
Day Three
Create and Modify YARN Node Labels Using Ambari
Define YARN Containers
Monitoring Hadoop Jobs and Applications
Optimizing Hadoop Jobs and Troubleshooting
Tuning Hive Jobs in Hadoop
Spark Job Execution, Performance Tuning, Tracking and Debugging (time permitting)

Author

Ali Zaidi

Availability

Instructor Led / Instructor Led Classroom

Duration

3 days

Course Topics

HDInsight (Hadoop & Spark)

Intended Audience

Database Admins, Administrators, Architects
This course will be of primary interest to HDInsight cluster administrators, but also to developers, architects, and even data scientists eager to ensure their applications glean the maximum performance and security from their clusters.

Course Level

Advanced

Visualizing Data with Power BI

In this workshop, you’ll cover a series of advanced topics involving Power BI. You’ll also learn how to work through a real-world scenario using Power BI with either the supplied dataset or your very own data.

About the Course

Welcome to the Visualizing Data with Power BI workshop delivered by your Microsoft Data Science team. In this workshop, you’ll cover a series of advanced topics involving Power BI. You’ll also learn how to work through a real-world scenario using Power BI with either the supplied dataset or your very own data.

This course is designed to take approximately 3 hours.

After completing the course, a student should be able to create effective visuals with Power BI. The student should understand how to use DAX, embed visuals, and integrate with Stream Analytics.

Prerequisites

There are a few things you will need in order to properly follow the course materials:

  • An introductory level of knowledge with Power BI Desktop. This could be obtained by completing the Cortana Intelligence Suite Workshop or with experience creating reports with Power BI Desktop.
  • It’s also a good idea to have a general familiarity with predictive and classification modeling, and a basic understanding of statistics and machine learning
  • (Optional) A subscription to powerbi.com

Agenda

What you will learn
Students will gain an understanding of, and experience with, Microsoft Power BI.
Concepts delivered
Advanced techniques and tips for creating effective reports with Power BI
How to present data with reports
DAX Queries
Data Sources/Refreshing
Power BI and the private cloud
Performance tips
Debugging
Integrating with Stream Analytics
Skills taught
Understand advanced techniques for creating reports with Power BI
Know how to visually structure your reports to tell the desired story

Author

Ryan Swanstrom

Availability

Instructor Led / Instructor Led Classroom

Duration

half day

Course Topics

Power BI

Intended Audience

Designers, Data Scientists, Business Analysts, Database Admins
Technical professionals (Data Scientists, Database professionals, Analysts, BI Professionals, Managers) who have used Power BI Desktop. This course assumes students are comfortable exploring features of Power BI without explicit step-by-step instruction.

Course Level

Intermediate