An Initiative of IIM Alumni

Hadoop Administration

Course Fee: 5000
Course Duration: 2 months

Hadoop Administration Course Structure

1.       Introduction

2.       Big Data


a.       What is Big Data


b.      Domains of Big Data


c.       V’s of Big Data


d.      Challenges of Big Data


e.      Technologies supporting Big Data

3.       Apache Hadoop


a.       Why Hadoop


b.      Features of Hadoop


c.       History of Hadoop


d.      Who is using Hadoop


e.      Fundamental Concepts


f.        Core Hadoop Components and Daemons

4.       Hard Disk Architecture


a.       Architecture insights


b.      Read and Write process


c.       Distributed Computing

5.       Storage: Hadoop Distributed File System (HDFS)


a.       What is HDFS


b.      HDFS Features


c.       HDFS Flow Architecture


d.      HDFS concepts – Blocks, NameNode, DataNode


e.      Anatomy of Writing and Reading Files


f.        Web UIs for HDFS


g.       Insights into HDFS 1.x and 2.x


h.      HDFS High Availability


i.         HDFS Internals


j.        Block Cache


k.       CLI Commands


l.         Administrator Commands on HDFS

6.       Apache ZooKeeper


a.       Overview of the CAP Theorem


b.      Why ZooKeeper in Hadoop


c.       Insights into ZooKeeper Architecture


d.      ZooKeeper integration with Hadoop

7.       Processing: MapReduce


a.       Overview of MapReduce


b.      Insights into the Map Phase and Reduce Phase


c.       MapReduce Program (Job) execution


d.      Configuration of MapReduce


e.      Failures and Exceptions


f.        Performance tuning of MapReduce jobs


g.       Counter techniques

8.       Processing: YARN


a.       What is YARN


b.      YARN Architecture


c.       Application execution flow in YARN


d.      Configuration of YARN


e.      High Availability of Resource Manager


f.        Distributed Cache


g.       Performance tuning

9.       Hadoop Distribution


a.       What are Hadoop Distributions


b.      Difference between Hortonworks and Cloudera


c.       Cluster setup using Hortonworks and Cloudera distributions

10.   Apache Sqoop


a.       Introduction to Sqoop concepts


b.      Sqoop internal design and architecture


c.       Sqoop Import statements concepts


d.      Sqoop Export Statements concepts


e.      Incremental updating concepts


f.        Sqoop metastore

11.   Apache Pig


a.       Introduction to Pig concepts


b.      Architecture of Apache Pig


c.       Installation of Apache Pig


d.      Apache Pig modes of execution and storage concepts


e.      Apache Pig program logic explained


f.        Apache Pig commands


g.       Apache Pig script execution in the Grunt shell

12.   Apache Hive


a.       Overview of Hive concepts


b.      Insights into Hive architecture


c.       Installation of Apache Hive


d.      Types of Hive tables


e.      Metastore in Hive


f.        Partitions and Buckets in Hive


g.       Hive commands for administration

13.   Apache HBase


a.       Introduction to HBase concepts


b.      HBase design and architectural insights


c.       HBase installation


d.      HBase table commands


e.      Hive and HBase integration


f.        HBase commands for administration

14.   Apache Flume


a.       Introduction to Flume & features


b.      Flume topology & core concepts


c.       Property file parameters logic


d.      Flume installation


e.      Flume commands for administration

15.   Oozie


a.       Introduction to Oozie


b.      Oozie installation


c.       Write Oozie schedulers


d.      Deploy and Run Oozie Schedulers

16.   Security in Hadoop Cluster


a.       Kerberos


    i.      Overview of Kerberos


  ii.      Kerberos Architecture


 iii.      Installation of Kerberos in Hadoop Cluster


iv.      Commands for Administration


b.      Knox


    i.      Overview of Knox


  ii.      Installation of Knox gateway


 iii.      Knox commands for Administration


c.       Ranger


    i.      Introduction to Ranger


  ii.      Architecture of Ranger


 iii.      Installation of Ranger


d.      ACLs


    i.      Overview of ACLs


  ii.      Implementation of ACLs in cluster

17.   Apache Spark


a.       Introduction to Spark


b.      Architecture of Spark


c.       Installation of Spark


d.      Troubleshooting Failures and Exceptions


e.      Performance tuning in Spark


f.        Spark Administration commands

18.   Apache Kafka


a.       Introduction to Kafka & features


b.      Kafka installation


c.       Kafka operations


d.      Administration commands

19.   Managing and Scheduling Jobs


a.       Managing Running Jobs


b.      FIFO Scheduler


c.       Fair Scheduler


d.      Capacity Scheduler


e.      Queue configuration

20.   Planning your Hadoop Cluster


a.       Pre-requisites for planning a cluster


b.      Calculations of DFS, RAM and Cores


c.       Designing the cluster blueprint

21.   Cluster Maintenance


a.       Checks on Hadoop cluster


b.      Copying Data Between Clusters


c.       Trash


d.      Data Backup and recovery


e.      Commissioning and Decommissioning Cluster Nodes


f.        Rebalancing the Cluster


g.       NameNode Metadata Backup


h.      Cluster Upgrading

22.   Cluster Monitoring and Troubleshooting


a.       General System Monitoring


b.      Managing Hadoop Log Files


c.       Using the NameNode and JobTracker Web UIs


d.      Cluster Monitoring with Ganglia


e.      Common Troubleshooting Issues


f.        Cluster Benchmarking

23.   Soft Skills


a.       Tell me about yourself


b.      Interview Q and A


c.       Resume preparation


d.      Project explanation


Batch Start:

Class Time:
05:00 PM





