AWS Certified Machine Learning Specialty: MLS-C01 Certification Guide (2021)
AWS Certified Machine Learning Specialty: MLS-C01 Certification Guide
The definitive guide to passing the MLS-C01 exam on the very first attempt
Somanath Nanda
Weslley Moura
BIRMINGHAM—MUMBAI
AWS Certified Machine Learning Specialty:
MLS-C01 Certification Guide
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, without the prior written permission of the publisher, except in the case of brief
quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented.
However, the information contained in this book is sold without warranty, either express or implied. Neither
the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused
or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products
mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the
accuracy of this information.
Group Product Manager: Kunal Parikh
Publishing Product Manager: Aditi Gour
Senior Editor: David Sugarman
Content Development Editor: Joseph Sunil
Technical Editor: Arjun Varma
Copy Editor: Safis Editing
Project Coordinator: Aparna Nair
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Vijay Kamble
First published: March 2021
Production reference: 1180321
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80056-900-3
www.packt.com
Contributors
About the authors
Somanath Nanda has 10 years of working experience in the IT industry, which includes
prod development, DevOps, and designing and architecting products from end to end.
He has also worked at AWS as a big data engineer for about 2 years.
Weslley Moura has 17 years of working experience in information technology (the last 9
years working on data teams and the last 5 years working as a lead data scientist).
He has worked in a variety of industries, such as financial, telecommunications,
healthcare, and logistics. In 2019, he was a nominee for data scientist of the year at the
European DatSci & AI Awards.
About the reviewer
Arunabh Sahai is a results-oriented leader who has been delivering technology
solutions for more than 16 years across multiple industries around the globe. He is also
a forward-thinking technology enthusiast and a technology coach, helping learners to
polish their technology skills. Arunabh holds a master's degree in computer science and
has in-depth knowledge of cloud (AWS/Azure/GCP) technologies. He holds multiple
certifications attesting to his cloud technology knowledge and experience. He is also
passionate about intelligent automation using predictive analytics. You can connect with
him on his LinkedIn, and he will be happy to help you with your technology questions.
Table of Contents
Preface
Section 1: Introduction to Machine Learning
1
Machine Learning Fundamentals
Comparing AI, ML, and DL
Examining ML
Examining DL
Classifying supervised, unsupervised, and reinforcement learning
Introducing supervised learning
The CRISP-DM modeling life cycle
Data splitting
Overfitting and underfitting
Applying cross-validation and measuring overfitting
Bootstrapping methods
The variance versus bias trade-off
Shuffling your training set
Modeling expectations
Introducing ML frameworks
ML in the cloud
Summary
Questions
2
AWS Application Services for AI/ML
Technical requirements
Analyzing images and videos with Amazon Rekognition
Exploring the benefits of Amazon Rekognition
Getting hands-on with Amazon Rekognition
Text to speech with Amazon Polly
Exploring the benefits of Amazon Polly
Getting hands-on with Amazon Polly
Speech to text with Amazon Transcribe
Exploring the benefits of Amazon Transcribe
Getting hands-on with Amazon Transcribe
Implementing natural language processing with Amazon Comprehend
Exploring the benefits of Amazon Comprehend
Getting hands-on with Amazon Comprehend
Translating documents with Amazon Translate
Exploring the benefits of Amazon Translate
Getting hands-on with Amazon Translate
Extracting text from documents with Amazon Textract
Exploring the benefits of Amazon Textract
Getting hands-on with Amazon Textract
Creating chatbots on Amazon Lex
Exploring the benefits of Amazon Lex
Getting hands-on with Amazon Lex
Summary
Questions
Answers
Section 2: Data Engineering and Exploratory Data Analysis
3
Data Preparation and Transformation
Identifying types of features
Dealing with categorical features
Transforming nominal features
Applying binary encoding
Transforming ordinal features
Avoiding confusion in our train and test datasets
Dealing with unbalanced datasets
Dealing with text data
Bag of words
TF-IDF
Word embedding
Summary
Questions
4
Understanding and Visualizing Data
Visualizing relationships in your data
Visualizing comparisons in your data
Visualizing distributions in your data
Visualizing compositions in your data
Building key performance indicators
Introducing Quick Sight
Summary
Questions
5
AWS Services for Data Storing
Technical requirements
Storing data on Amazon S3
Creating buckets to hold data
Distinguishing between object tags and object metadata
Controlling access to buckets and objects on Amazon S3
S3 bucket policy
Protecting data on Amazon S3
Applying bucket versioning
Applying encryption to buckets
Securing S3 objects at rest and in transit
Using other types of data stores
Relational Database Services (RDSes)
Managing failover in Amazon RDS
6
AWS Services for Data Processing
Technical requirements
Creating ETL jobs on AWS Glue
Features of AWS Glue
Getting hands-on with AWS Glue data catalog components
Getting hands-on with AWS Glue ETL components
Querying S3 data using Athena
Processing real-time data using Kinesis data streams
Storing and transforming real-time data using Kinesis Data Firehose
Different ways of ingesting data from on-premises into AWS
AWS Storage Gateway
Snowball, Snowball Edge, and Snowmobile
AWS DataSync
Processing stored data on AWS
AWS EMR
AWS Batch
Summary
Questions
Answers
Section 3: Data Modeling
7
Semantic segmentation algorithm
Object detection algorithm
Summary
Questions
8
Evaluating and Optimizing Models
Introducing model evaluation
Evaluating classification models
Extracting metrics from a confusion matrix
Summarizing precision and recall
Evaluating regression models
Exploring other regression metrics
Model optimization
Grid search
Summary
Questions
9
Amazon SageMaker Modeling
Technical requirements
Creating notebooks in Amazon SageMaker
What is Amazon SageMaker?
Getting hands-on with Amazon SageMaker notebook instances
Getting hands-on with Amazon SageMaker's training and inference instances
Preface
The AWS Machine Learning Specialty certification exam tests your competency to
perform machine learning (ML) on AWS infrastructure. This book covers the entire
exam syllabus in depth using practical examples to help you with your real-world machine
learning projects on AWS.
Starting with an introduction to machine learning on AWS, you'll learn the fundamentals
of machine learning and explore important AWS services for artificial intelligence (AI).
You'll then see how to prepare data for machine learning and discover different techniques
for data manipulation and transformation for different types of variables. The book also
covers the handling of missing data and outliers and takes you through various machine
learning tasks such as classification, regression, clustering, forecasting, anomaly detection,
text mining, and image processing, along with their specific ML algorithms, that you
should know to pass the exam. Finally, you'll explore model evaluation, optimization, and
deployment and get to grips with deploying models in a production environment and
monitoring them.
By the end of the book, you'll have gained knowledge of all the key fields of machine
learning and the solutions that AWS has released for each of them, along with the tools,
methods, and techniques commonly used in each domain of AWS machine learning.
Who this book is for
This book is for professionals and students who want to take and pass the AWS Machine
Learning Specialty exam or gain a deeper knowledge of machine learning with a special
focus on AWS. Familiarity with the basics of machine learning and AWS services is
necessary.
What this book covers
Chapter 1, Machine Learning Fundamentals, covers some machine learning definitions,
different types of modeling approaches, and all the steps necessary to build a machine
learning product, known as the modeling pipeline.
Chapter 2, AWS Application Services for AI/ML, covers details of the various AI/ML
applications offered by AWS, which you should know to pass the exam.
Chapter 3, Data Preparation and Transformation, deals with categorical and numerical
features, applying different techniques to transform your data, such as one-hot encoding,
binary encoding, ordinal encoding, binning, and text transformations. You will also learn
how to handle missing values and outliers in your data, two important topics for building
good machine learning models.
Chapter 4, Understanding and Visualizing Data, teaches you how to select the most
appropriate data visualization technique according to different variable types and business
needs. You will also learn about available AWS services for visualizing data.
Chapter 5, AWS Services for Data Storing, teaches you about AWS services used to store
data for machine learning. You will learn about the many different S3 storage classes and
when to use each of them. You will also learn how to handle data encryption and how to
secure your data at rest and in transit. Finally, we will present other types of data store
services, still worth knowing for the exam.
Chapter 6, AWS Services for Data Processing, teaches you about AWS services used to process
data for machine learning. You will learn how to deal with batch and real-time processing,
how to directly query data on Amazon S3, and how to create big data applications on EMR.
Chapter 7, Applying Machine Learning Algorithms, covers different types of machine
learning tasks, such as classification, regression, clustering, forecasting, anomaly detection,
text mining, and image processing. Each of these tasks has specific algorithms that you
should know about to pass the exam. You will also learn how ensemble models work and
how to deal with the curse of dimensionality.
Chapter 8, Evaluating and Optimizing Models, teaches you how to select model metrics
to evaluate model results. You will also learn how to optimize your model by tuning
its hyperparameters.
Chapter 9, Amazon SageMaker Modeling, teaches you how to spin up notebooks to work
with exploratory data analysis and how to train your models on Amazon SageMaker. You
will learn where and how your training data should be stored in order to be accessible
To get the most out of this book
You will need a system with a good internet connection and an AWS account.
If you are using the digital version of this book, we advise you to type the code yourself
or access the code via the GitHub repository (link available in the next section). Doing
so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at
https://github.com/PacktPublishing/AWS-Certified-Machine-Learning-Specialty-MLS-C01-Certification-Guide.
In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at
https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used
in this book. You can download it here:
https://static.packt-cdn.com/downloads/9781800569003_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names,
filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles.
Here is an example: "To check each of the versions and the latest one of them, we use aws
s3api list-object-versions --bucket version-demo-mlpractice, for which S3 provides the
list-object-versions API, as shown here."
A block of code is set as follows:
"Versions": [
{
"ETag":
"\"b6690f56ca22c410a2782512d24cdc97\"",
"Size": 10,
"StorageClass": "STANDARD",