In the previous blog post, we went through how to use Apache Ambari to manage big data stacks.

Now we will see how to use Ambari as a resource management dashboard and how to share resources among the application stacks running on the cluster.

As an example, I will take YARN and run multiple applications over it.

YARN’s Capacity Scheduler is designed to run Hadoop applications in a shared, multi-tenant cluster while maximizing the throughput and the utilization of the cluster.

An organization benefits economically when its different units share cluster resources among themselves.

The fundamental unit of scheduling in YARN is a queue.

The capacity of each queue specifies the percentage of cluster resources that are available for applications submitted to the queue.

Queues can be set up in a hierarchy that reflects the organizational structure, resource requirements, and access restrictions required by the various organizations, groups, and users that utilize cluster resources.

  1. Go to the Ambari dashboard and click the YARN link. It displays all the resources available on the cluster.
    1_YARN_UI
  2. Open Quick Links and click ResourceManager UI.
    2_ResourceManagerLink
  3. On the ResourceManager dashboard, click the Scheduler link.
    3_ResourceManagerUI
  4. The scheduler view shows the default queue.
    4_DefaultQueue
  5. Click the default queue to see its capacity details.
    5_DefaultQueueDetails
  6. From the Ambari views menu, open the YARN Queue Manager.
    6_YARNQueueManagerLink
  7. Select the root queue.
    7_rootQueue
  8. The default queue appears as a child of the root queue.
    8_ChildDefaultQueue
  9. Configure the capacity of each queue.
    9_ConfigureQueueCapacity
  10. Save and refresh the queues to apply the change.
    10_ActionafterQueueChange
  11. The updated list of queues is displayed.
    11_ListOfNewQueue

CONFIGURING THE CAPACITY SCHEDULER
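
Under the hood, the queue definitions live in capacity-scheduler.xml, which the YARN Queue Manager view edits for you. As a minimal sketch, a hypothetical child queue named analytics alongside the default queue could be declared like this (the capacities of sibling queues must add up to 100):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,analytics</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>30</value>
</property>

If you edit the file by hand instead of using the view, apply the change with yarn rmadmin -refreshQueues.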

Specifying Which Version of Spark to Use

The default version for HDP 2.5.0 is Spark 1.6.2.

If more than one version of Spark is installed on a node, you can select which version of Spark runs your job.
To do this, set the SPARK_MAJOR_VERSION environment variable to the desired major version before you launch the job.
For example, if Spark 1.6.2 and the Spark 2.1 technical preview are both installed on a node, and you want to run your job with Spark 2.1, set SPARK_MAJOR_VERSION to 2:

export SPARK_MAJOR_VERSION=2
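
To confirm which version the wrapper will pick up, you can print the version banner; this assumes the HDP spark-submit wrapper is on your PATH:

spark-submit --version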

To test Spark 2, you can run the SparkPi example:

cd /usr/hdp/current/spark2-client/
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples*.jar 10
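
If you have created a dedicated queue as described above, you can also direct the job to it with spark-submit's --queue option. A sketch, using the hypothetical analytics queue from earlier:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --queue analytics --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples*.jar 10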

You can check the YARN ResourceManager UI from the Ambari dashboard for the status of the running job.


Posted by: Rahul Kumar

Rahul Kumar works as a technical lead in Bangalore, India. He has more than 5 years of experience in distributed system design with Java, Scala, the Akka toolkit, and the Play Framework. He has developed various real-time data analytics applications using Apache Hadoop, Mesos ecosystem projects, and Apache Spark. He loves to design products around big data and high-velocity streaming data. He has given talks on Apache Spark, reactive systems, and the actor model at LinuxCon North America, Cassandra Summit, and Apache Big Data summits.
