Big Data and Hadoop

Course Details

Chapter 1: Understanding Big Data and Hadoop
Big Data
Limitations of Existing Data Analytics Architectures and Their Solutions
Hadoop Features
Hadoop Ecosystem
Hadoop 2.x Core Components
Hadoop Storage: HDFS
Hadoop Processing: MapReduce Framework
Different Hadoop Distributions
Chapter 2: Hadoop Architecture and HDFS
Hadoop 2.x Cluster Architecture - Federation and High Availability
A Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Single-Node and Multi-Node Cluster Setup
Hadoop Administration
Chapter 3: Hadoop MapReduce Framework
MapReduce Use Cases
Traditional Way vs. MapReduce Way
Why MapReduce
Hadoop 2.x MapReduce Architecture
Hadoop 2.x MapReduce Components
YARN MR Application Execution Flow
YARN Workflow
Anatomy of a MapReduce Program
Demo on MapReduce
Input Splits
Relation between Input Splits and HDFS Blocks
MapReduce: Combiner & Partitioner
Demo on De-identifying a Healthcare Data Set
Demo on a Weather Data Set
Chapter 4: Advanced MapReduce
Distributed Cache
Reduce Join
Custom Input Format
Sequence File Input Format
XML File Parsing Using MapReduce
Chapter 5: Pig
About Pig
MapReduce vs. Pig
Pig Use Cases
Programming Structure in Pig
Pig Running Modes
Pig Components
Pig Execution
Pig Latin Program
Data Models in Pig
Pig Data Types
Shell and Utility Commands
Pig Latin: Relational Operators
File Loaders, Group Operator
COGROUP Operator
Joins and COGROUP
Diagnostic Operators
Specialized Joins in Pig
Built-in Functions (Eval, Load and Store, Math, String, and Date Functions)
Pig UDF
Piggybank
Parameter Substitution (Pig Macros and Pig Parameter Substitution)
Pig Streaming
Testing Pig Scripts with PigUnit
Aviation Use Case in Pig
Pig Demo on a Healthcare Data Set
Chapter 6: Hive
Hive Background
Hive Use Case
About Hive
Hive vs. Pig
Hive Architecture and Components
Metastore in Hive
Limitations of Hive
Comparison with Traditional Databases
Hive Data Types and Data Models
Partitions and Buckets
Hive Tables (Managed Tables and External Tables)
Importing Data
Querying Data
Managing Output
Hive Script
Hive UDF
Retail Use Case in Hive
Hive Demo on a Healthcare Data Set
Chapter 7: Advanced Hive and HBase
HiveQL: Joining Tables
Dynamic Partitioning
Custom Map/Reduce Scripts
Hive Indexes and Views
Hive Query Optimizers
Hive: Thrift Server, User-Defined Functions
HBase: Introduction to NoSQL Databases and HBase
HBase vs. RDBMS
HBase Components
HBase Architecture
Run Modes & Configuration
HBase Cluster Deployment
Chapter 8: Advanced HBase
HBase Data Model
HBase Shell
HBase Client API
Data Loading Techniques
ZooKeeper Data Model
ZooKeeper Service
Demos on Bulk Loading
Getting and Inserting Data
Filters in HBase
Chapter 9: Processing Distributed Data with Apache Spark
What is Apache Spark?
Spark Ecosystem
Spark Components
History of Spark and Spark Versions/Releases
Spark as a Polyglot
What is Scala?
Why Scala?
Chapter 10: Oozie and Hadoop Project
Flume and Sqoop Demo
Oozie Components
Oozie Workflow
Scheduling with Oozie
Demo on Oozie Workflow
Oozie Coordinator
Oozie Commands
Oozie Web Console
Oozie for MapReduce, Hive, and Sqoop
Combined Flow of MapReduce and Hive in Oozie
Hadoop Project Demo
Hadoop Integration with Talend