Installing and managing a big data stack (Hadoop ecosystem projects, Apache Spark, Kafka, etc.) is time-consuming and tedious work. It requires many configuration-file changes, managing node access across the cluster, and executing a series of commands on each node; sometimes you have to start services in a specific order, or run a particular service on only a few nodes.
On a small cluster it is easy enough to install and manage individual big data frameworks and tools manually or with custom-made scripts. But in a real production environment, where you are dealing with hundreds or thousands of machines, you need a standard tool or framework to automate installation and manage the whole big data stack.
Ambari provides an easy management and monitoring solution for your HDP (Hortonworks Data Platform) cluster.
Using the Ambari Web UI and REST APIs, you can deploy, operate, manage configuration changes, and monitor services for all nodes in your cluster from a central point.
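As a quick illustration of the REST API, the sketch below queries an Ambari server directly with curl. The host `server-ip`, the cluster name `mycluster`, and the default `admin/admin` credentials are assumptions for this example:

```shell
# List the clusters managed by this Ambari server
# (server-ip and the admin/admin credentials are placeholders)
curl -u admin:admin http://server-ip:8080/api/v1/clusters

# Inspect a single service, e.g. HDFS, in a hypothetical cluster named "mycluster"
curl -u admin:admin http://server-ip:8080/api/v1/clusters/mycluster/services/HDFS
```

The same endpoints back the Ambari Web UI, so anything you can do from the dashboard can be scripted this way.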
Ambari Installation steps
Note: You have to choose the correct version of operating system for Ambari installation otherwise you may run into issues.
Operating Systems Requirements
The following 64-bit operating systems are supported:
Red Hat Enterprise Linux (RHEL) v7.x
Red Hat Enterprise Linux (RHEL) v6.x
Oracle Linux v7.x
Oracle Linux v6.x
SUSE Linux Enterprise Server (SLES) v11 SP4 (HDP 2.2 and later)
SUSE Linux Enterprise Server (SLES) v11 SP3
SUSE Linux Enterprise Server (SLES) v11 SP1 (HDP 2.2 and HDP 2.1)
Ubuntu Precise v12.04
Ubuntu Trusty v14.04
The following Java runtime environments are supported:
Oracle JDK 1.8 64-bit (minimum JDK 1.8_60) (default)
Oracle JDK 1.7 64-bit (minimum JDK 1.7_67)
OpenJDK 8 64-bit (not supported on SLES)
OpenJDK 7 64-bit (not supported on SLES)
Oracle JDK 1.8 64-bit installation on Ubuntu
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
The Ambari host should have at least 1 GB RAM, with 500 MB free.
To check available memory on any host, run:
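A minimal check with the standard `free` utility:

```shell
# Show total, used, and free memory in megabytes
free -m
```

The "free" and "available" columns tell you whether the host meets the 500 MB free-memory requirement.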
In general, the host you plan to run the Ambari Metrics Collector on should have the following memory and disk space available, based on cluster size, as listed in the official Ambari documentation:
| Number of hosts | Memory Available | Disk Space |
|---|---|---|
| 1 | 1024 MB | 10 GB |
| 10 | 1024 MB | 20 GB |
| 50 | 2048 MB | 50 GB |
| 100 | 4096 MB | 100 GB |
| 300 | 4096 MB | 100 GB |
| 500 | 8096 MB | 200 GB |
| 1000 | 12288 MB | 200 GB |
| 2000 | 16384 MB | 500 GB |
Package Size and Inode Count Requirements
*Size and Inode values are approximate
| Package | Size | Inodes |
|---|---|---|
| Ambari Metrics Collector | 225 MB | 4,000 |
| Ambari Metrics Monitor | 1 MB | 100 |
| Ambari Metrics Hadoop Sink | 8 MB | 100 |
| After Ambari Server Setup | N/A | 4,000 |
| After Ambari Server Start | N/A | 500 |
| After Ambari Agent Start | N/A | 200 |
Check the Maximum Open File Descriptors
The recommended maximum number of open file descriptors is 10000, or more. To check the current value set for the maximum number of open file descriptors, execute the following shell commands on each host:
ulimit -Sn
ulimit -Hn
If the output is not greater than 10000, run the following command to set it to a suitable default:
ulimit -n 10000
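Note that `ulimit -n` only affects the current shell session. One common way to make the limit survive reboots, assuming a standard PAM setup, is to add nofile entries to /etc/security/limits.conf:

```shell
# Append soft and hard open-file limits for all users (requires root)
cat >> /etc/security/limits.conf <<'EOF'
*    soft    nofile    10000
*    hard    nofile    10000
EOF
```

The new limits apply to sessions started after the change.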
After registration, you need to go to the Google Cloud Console and select Compute Engine to launch instances.
Select Ubuntu 14.04 as the base operating system.
Allow HTTP and HTTPS connections to the VM instance; Ambari uses REST endpoints to manage cluster nodes.
To access the machine over SSH, you need to add your generated SSH public key to the Google Cloud metadata management dashboard.
Copy the contents of your ~/.ssh/id_rsa.pub file and paste it into the SSH key section.
After that, you can access the machine with your private key.
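If you don't have a key pair yet, a quick sketch of generating one:

```shell
# Create ~/.ssh and generate an RSA key pair if one doesn't exist yet
# (-N "" sets an empty passphrase; use a real passphrase in production)
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N "" -q

# Print the public key to paste into the GCP metadata SSH Keys section
cat ~/.ssh/id_rsa.pub
```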
On a server host that has Internet access, use a command line editor to perform the following steps:
Log in to your host as root.
Download the Ambari repository file to a directory on your installation host.
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/126.96.36.199/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
Confirm that Ambari packages downloaded successfully by checking the package name list.
apt-cache showpkg ambari-server
apt-cache showpkg ambari-agent
apt-cache showpkg ambari-metrics-assembly
You should see the Ambari packages in the list.
Install the Ambari bits. This also installs the default PostgreSQL Ambari database.
apt-get install ambari-server
You can use the default configuration the first time. Ambari asks you to create a user account during setup, but you can skip that step and manage users later from the Ambari management dashboard.
The same goes for the database: leave the default PostgreSQL.
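The setup step itself looks like this, assuming the packages installed above:

```shell
# Interactive setup: accept the defaults when prompted
# (embedded PostgreSQL database, downloaded JDK, default ambari account)
ambari-server setup

# Or run non-interactively with the -s (silent) flag to accept all defaults:
# ambari-server setup -s
```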
To start the Ambari Server, run the following command on the Ambari Server host:
To check the Ambari Server processes:
To stop the Ambari Server:
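The corresponding ambari-server commands for the three steps above are:

```shell
# Start the Ambari Server
ambari-server start

# Check the Ambari Server process status
ambari-server status

# Stop the Ambari Server
ambari-server stop
```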
Log In to Apache Ambari
Point your browser to http://server-ip:8080
Log into the Ambari Server using the default username/password: admin/admin. You can change these credentials later.
Give a name to the cluster.
Select the version of HDP (Hortonworks Data Platform).
Copy your SSH private key and paste it into the Host Registration Information section.
Ambari requires the Fully Qualified Domain Name (FQDN) of each target host.
Ambari will assign services to nodes in the order you provided at the beginning. You can change the assignments as needed, or leave them as they are.
You can customize how many DataNodes, NodeManagers, and Clients you want. You can also manage these daemon processes later using the Ambari management dashboard.
You need to set a database root password for Hive, Oozie, and Knox. Click on each service and set a password.
Review your final cluster details. You can take a printout for future reference.
After successful installation, Ambari redirects you to the dashboard metrics page, where you can select individual services and check their status.
Ambari exposes a web UI link for each service running on the cluster, but you need to set up hostname mappings on your local system to access these links.
sudo vim /etc/hosts
Add your hostname mappings:
188.8.131.52 testnode1.c.spiritual-vent-164721.internal
184.108.40.206 testnode2.c.spiritual-vent-164721.internal
220.127.116.11 testnode3.c.spiritual-vent-164721.internal
Check running services from console
Log in to a node where the clients are running:
ssh -i ~/.ssh/id_rsa <user>@<node-hostname>
Switch user to hdfs:
sudo su hdfs
To test, you can type hive or spark-shell on the console to check whether the services are running properly.
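A few quick smoke tests along those lines, assuming the HDFS, Hive, and Spark clients are installed on this node:

```shell
# List the HDFS root directory to confirm HDFS is reachable
hdfs dfs -ls /

# Run a trivial Hive query
hive -e 'show databases;'

# Confirm the Spark shell can start and report its version
spark-shell --version
```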
You can list all running Java processes on the node using the jps command.
Note: If a Java runtime is not found, the jps command will ask you to install a JRE. Just set JAVA_HOME in your .bashrc file and source the .bashrc file.
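For example, assuming the Oracle JDK 8 installed earlier lives at /usr/lib/jvm/java-8-oracle (the default path for the oracle-java8-installer package):

```shell
# Append JAVA_HOME and PATH settings to .bashrc, then reload it
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-oracle' >> ~/.bashrc
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bashrc
source ~/.bashrc
```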