Dell - Advanced Methods in Data Science & Big Data Analytics


5 Days


Delivery Methods
Virtual Instructor Led
Private Group

Course Overview

This course builds on skills developed in the Data Science and Big Data Analytics course. The main focus areas cover Hadoop (including Pig, Hive, and HBase), Natural Language Processing, Social Network Analysis, Simulation, Random Forests, Multinomial Logistic Regression, and Data Visualization. Taking an "Open" or technology-neutral approach, this course utilizes several open-source tools to address big data challenges.

Course Objectives

  • Develop and execute MapReduce functionality
  • Gain familiarity with NoSQL databases and Hadoop Ecosystem tools for analyzing large-scale, unstructured data sets
  • Develop a working knowledge of Natural Language Processing, Social Network Analysis, and Data Visualization concepts
  • Use advanced quantitative methods and apply one of them in a Hadoop environment
  • Apply advanced techniques to real-world datasets in a final lab

Who Should Attend?

    This course is intended for aspiring Data Scientists, data analysts who have completed the associate-level Data Science and Big Data Analytics course, and computer scientists who want to learn MapReduce and methods for analyzing unstructured data such as text.


    Course Prerequisites

  • Completion of the Data Science and Big Data Analytics course
  • Proficiency in at least one programming language such as Java or Python

Agenda

    1 - MapReduce and Hadoop

    • The MapReduce Framework
    • Apache Hadoop
    • Hadoop Distributed File System
    • YARN

    2 - Hadoop Ecosystem and NoSQL

    • Hadoop Ecosystem
    • Pig
    • Hive
    • NoSQL: Not Only SQL
    • HBase
    • Spark

    3 - Natural Language Processing

    • Introduction to NLP
    • Text Preprocessing
    • TF-IDF
    • Beyond Bag of Words
    • Language Modeling
    • POS Tagging and HMM
    • Sentiment Analysis and Topic Modeling

    4 - Social Network Analysis

    • Introduction to SNA and Graph Theory
    • Most Important Nodes
    • Communities and Small World
    • Network Problems and SNA Tools

    5 - Data Science Theory and Methods

    • Simulation
    • Random Forests
    • Multinomial Logistic Regression

    6 - Data Visualization

    • Perception and Visualization
    • Visualization of Multivariate Data Module


