Installing and managing a big data stack (Hadoop ecosystem projects, Apache Spark, Kafka, etc.) is time-consuming and tedious work. It requires many configuration file changes and careful node access management, and you have to execute a series of commands on each node in the cluster; sometimes you must start services in a specific order, and run certain services on only a few nodes.

When you are working on a small cluster, it is easy to install and manage individual big data frameworks and tools manually or with custom-made scripts. But in a real production environment, where you are dealing with hundreds or thousands of machines, time and best practices matter: you have to choose a standard tool or framework to automate installation and manage the whole big data stack.

Apache Ambari provides an easy management and monitoring solution for your HDP (Hortonworks Data Platform) cluster.

Using the Ambari Web UI and REST APIs, you can deploy, operate, manage configuration changes, and monitor services for all nodes in your cluster from a central point.
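
For example, once the Ambari server is up (installed below), you can query its REST API from the command line. A minimal sketch, assuming the default admin credentials; replace server-ip with your Ambari host:

curl -u admin:admin http://server-ip:8080/api/v1/clusters
curl -u admin:admin http://server-ip:8080/api/v1/hosts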

Ambari Installation Steps

Note: You have to choose a supported operating system version for the Ambari installation; otherwise, you may run into issues.

Operating Systems Requirements

The following 64-bit operating systems are supported:

Red Hat Enterprise Linux (RHEL) v7.x
Red Hat Enterprise Linux (RHEL) v6.x
CentOS v7.x
CentOS v6.x
Debian v7.x
Oracle Linux v7.x
Oracle Linux v6.x
SUSE Linux Enterprise Server (SLES) v11 SP4 (HDP 2.2 and later)
SUSE Linux Enterprise Server (SLES) v11 SP3
SUSE Linux Enterprise Server (SLES) v11 SP1 (HDP 2.2 and HDP 2.1)
Ubuntu Precise v12.04
Ubuntu Trusty v14.04

JDK Requirements

The following Java runtime environments are supported:

Oracle JDK 1.8 64-bit (minimum JDK 1.8_60) (default)
Oracle JDK 1.7 64-bit (minimum JDK 1.7_67)
OpenJDK 8 64-bit (not supported on SLES)
OpenJDK 7 64-bit (not supported on SLES)

To install the Oracle JDK 1.8 64-bit on Ubuntu:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
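
After the installer finishes, verify the installation:

java -version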

Memory Requirements

The Ambari host should have at least 1 GB RAM, with 500 MB free.

To check available memory on any host, run:

free -m

In general, the host where you plan to run the Ambari Metrics Collector should have the following memory and disk space available, based on cluster size, as documented in the official Ambari documentation:

Number of Hosts | Memory Available | Disk Space
1               | 1024 MB          | 10 GB
10              | 1024 MB          | 20 GB
50              | 2048 MB          | 50 GB
100             | 4096 MB          | 100 GB
300             | 4096 MB          | 100 GB
500             | 8096 MB          | 200 GB
1000            | 12288 MB         | 200 GB
2000            | 16384 MB         | 500 GB

Package Size and Inode Count Requirements

*Size and Inode values are approximate

Component                  | Size   | Inodes
Ambari Server              | 100 MB | 5,000
Ambari Agent               | 8 MB   | 1,000
Ambari Metrics Collector   | 225 MB | 4,000
Ambari Metrics Monitor     | 1 MB   | 100
Ambari Metrics Hadoop Sink | 8 MB   | 100
After Ambari Server Setup  | N/A    | 4,000
After Ambari Server Start  | N/A    | 500
After Ambari Agent Start   | N/A    | 200

Check the Maximum Open File Descriptors

The recommended maximum number of open file descriptors is 10000, or more. To check the current value set for the maximum number of open file descriptors, execute the following shell commands on each host:

ulimit -Sn
ulimit -Hn

If the output is not greater than 10000, run the following command to set it to a suitable default:

ulimit -n 10000
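
Note that ulimit -n only changes the limit for the current shell session. To make it persistent across reboots, you can add entries to /etc/security/limits.conf (a sketch; adjust the user pattern and value for your environment):

* soft nofile 10000
* hard nofile 10000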

For testing Ambari, you can use the Google Cloud Platform; it offers $300 of free credit when you first register.

After registration, go to the Google Cloud Console and select Compute Engine to launch instances.

Select Ubuntu 14.04 as the base operating system.

Allow HTTP and HTTPS connections on the VM instances; Ambari uses REST endpoints to manage cluster nodes.

To access the machines over SSH, you need to add your generated SSH public key to the Google Cloud metadata management dashboard.

Copy the contents of your ~/.ssh/id_rsa.pub file and paste them into the SSH keys section.
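
If you do not have an SSH key pair yet, generate one first:

ssh-keygen -t rsa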

After that, you can access the machines with your private key.

On a server host that has Internet access, use a command line editor to perform the following steps:

Log in to your host as root.

sudo su

Download the Ambari repository file to a directory on your installation host.

wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.5.0.3/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update

Confirm that the Ambari packages are available by checking the package list:

apt-cache showpkg ambari-server
apt-cache showpkg ambari-agent
apt-cache showpkg ambari-metrics-assembly

You should see the Ambari packages in the list.

Install the Ambari bits. This also installs the default PostgreSQL Ambari database.

apt-get install ambari-server
ambari-server setup

Here you can start with the default configuration for the first time. Ambari asks you to customize the service user account during setup, but you can skip this; later you can manage it using the Ambari management dashboard.

The same goes for the database: leave the default embedded PostgreSQL.
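
If you want to script this step, ambari-server setup also has a silent mode that accepts all defaults (assuming your Ambari version supports the -s flag):

ambari-server setup -s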

Run the following command on the Ambari Server host:

ambari-server start

To check the Ambari Server processes:

ambari-server status

To stop the Ambari Server:

ambari-server stop

Log In to Apache Ambari

Point your browser to http://server-ip:8080

Log into the Ambari Server using the default username/password: admin/admin. You can change these credentials later.
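
For example, you can change the admin password from the command line through the REST API as well (a sketch based on Ambari's users endpoint; replace new-password with your own):

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"Users/password":"new-password","Users/old_password":"admin"}' http://server-ip:8080/api/v1/users/admin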

Give a name to the cluster.

Select the version of HDP (Hortonworks Data Platform).

Open the SSH private key whose public half you added to the nodes:

vim ~/.ssh/id_rsa

Copy its contents and paste them into the Host Registration Information section.

Ambari requires the Fully Qualified Domain Name (FQDN) of the target hosts.
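
You can find the FQDN of a host by running the following on it:

hostname -f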

Ambari will assign services to nodes in the order you provided at the beginning. You can change the assignments as per your needs, or leave them as they are.

You can customize how many DataNodes, NodeManagers, and Clients you want. You can also manage these daemon processes later using the Ambari management dashboard.

You need to set a database password for Hive, Oozie, and Knox. Click on each service and set a password.

Review your final cluster details. You can take a printout for future reference.

After a successful installation, Ambari redirects you to the dashboard metrics page. You can select individual services and check their status.

Ambari exposes a web UI link for each service running on the cluster, but you need to set up hostname mappings on your local system to access these links.

sudo vim /etc/hosts

Add your hostname mappings, for example:

23.236.48.175 testnode1.c.spiritual-vent-164721.internal
104.198.150.146 testnode2.c.spiritual-vent-164721.internal
130.211.166.42 testnode3.c.spiritual-vent-164721.internal
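
You can verify that a mapping resolves from your local machine:

ping -c 1 testnode1.c.spiritual-vent-164721.internal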

Check Running Services from the Console

Log in to a node where the clients are running:

ssh -i ~/.ssh/id_rsa <user>@<node-host>

Switch to the hdfs user:

sudo su hdfs

As a test, you can run hive or spark-shell on the console to check whether the services are running properly.
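
For example, a quick smoke test, assuming the HDFS and Hive clients are installed on this node:

hdfs dfs -ls /
hive -e 'show databases;'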

You can list all running JVM processes on a node using the jps command.

Note: If the Java runtime is not found, the jps command will ask you to install a JRE. Just set JAVA_HOME in your .bashrc file and source it.
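
For example, with the Oracle JDK 8 package installed earlier, the path is typically /usr/lib/jvm/java-8-oracle (verify the actual path on your system):

echo 'export JAVA_HOME=/usr/lib/jvm/java-8-oracle' >> ~/.bashrc
source ~/.bashrc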

