MR-1CP-ETAAMUSD: Advanced Methods in Data Science and Big Data Analytics

Overview

This course builds on skills developed in the Data Science and Big Data Analytics course. The main focus areas are Hadoop (including Pig, Hive, and HBase), Natural Language Processing, Social Network Analysis, Simulation, Random Forests, Multinomial Logistic Regression, and Data Visualization. Taking an “Open” or technology-neutral approach, this course uses several open-source tools to address big data challenges.

Skills Covered

Upon successful completion of this course, participants should be able to:

Develop and execute MapReduce functionality
Gain familiarity with NoSQL databases and Hadoop Ecosystem tools for analyzing large-scale, unstructured data sets
Develop a working knowledge of Natural Language Processing, Social Network Analysis, and Data Visualization concepts
Use advanced quantitative methods and apply one of them in a Hadoop environment
Apply advanced techniques to real-world datasets in a final lab

Prerequisites

Completion of the Data Science and Big Data Analytics course
Proficiency in at least one programming language such as Java or Python

Target Audience

This course is intended for aspiring Data Scientists, data analysts who have completed the associate-level Data Science and Big Data Analytics course, and computer scientists who want to learn MapReduce and methods for analyzing unstructured data such as text.

Course Curriculum

Module 1: MapReduce and Hadoop

Lesson 1: The MapReduce Framework
Lesson 2: Apache Hadoop
Lesson 3: Hadoop Distributed File System
Lesson 4: YARN

Module 2: Hadoop Ecosystem and NoSQL

Lesson 1: Hadoop Ecosystem
Lesson 2: Pig
Lesson 3: Hive
Lesson 4: NoSQL – Not Only SQL
Lesson 5: HBase
Lesson 6: Spark

Module 3: Natural Language Processing

Lesson 1: Introduction to NLP
Lesson 2: Text Preprocessing
Lesson 3: TF-IDF
Lesson 4: Beyond Bag of Words
Lesson 5: Language Modeling
Lesson 6: POS Tagging and HMM
Lesson 7: Sentiment Analysis and Topic Modeling

Module 4: Social Network Analysis

Lesson 1: Introduction to SNA and Graph Theory
Lesson 2: Most Important Nodes
Lesson 3: Communities and Small World
Lesson 4: Network Problems and SNA Tools

Module 5: Data Science Theory and Methods

Lesson 1: Simulation
Lesson 2: Random Forests
Lesson 3: Multinomial Logistic Regression

Module 6: Data Visualization

Lesson 1: Perception and Visualization
Lesson 2: Visualization of Multivariate Data

Exam & Certification

Associated exam: Dell Technologies Proven Professional advanced analytics specialist-level certification exam (E20-065).

All courses are HRD Claimable.
Get in touch with our team via the form or WhatsApp us on +6011-5119 6631

Trainocate: A Global Leader in Technology, Business, and People Development
