Applications like Facebook, Google Search, Twitter, and Amazon.com get millions of hits daily. These applications generate terabytes to petabytes of data a day, which must be stored and processed so that a response can be prepared for the user in under a second for a better user experience. Handling this much data requires a specialized distributed system that can store and process it efficiently.

An organization can build a data center on dedicated physical machines, or it can use a cloud-based virtualized environment to create one.

A data center is the physical or virtual infrastructure of an organization, typically used for storing, processing, and serving large amounts of data.

A data center requires specialized hardware, power supply, cooling, and high network bandwidth. It is a complex set of systems that are interconnected over a network and share resources among themselves. A data center needs to be fault tolerant, reliable, secure, and energy efficient.

Setting up a dedicated data center is hard for a smaller organization, and it is not cost effective either. The rise of cloud computing, however, allows large organizations to share their data center resources over the web.

Cloud services like Amazon Web Services, Google Cloud, and Microsoft Azure allow other organizations to use their data centers over the web.


A data center requires a management software system for managing its resources. Researchers from academia and from organizations like Google, Facebook, and Amazon are exploring different solutions. Running a data center efficiently requires better job scheduling: identifying unutilized resources and offering them to other jobs gives the data center maximum benefit.

A scheduler's workload is directly proportional to the cluster's size, so as a data center scales, the scheduler becomes the bottleneck.

There are various scheduling architectures available:

  • Monolithic
  • Statically partitioned schedulers
  • Two-level
  • Shared state

A monolithic scheduler runs a single instance of the scheduler code, which applies a global policy to all incoming jobs.
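To make the idea concrete, here is a minimal sketch of a monolithic scheduler in Python. The class names, the CPU-only resource model, and the first-fit policy are all assumptions chosen for illustration; a real scheduler would weigh far more than free CPUs.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    free_cpus: int


@dataclass
class Job:
    name: str
    cpus: int


class MonolithicScheduler:
    """One scheduler instance that applies one global policy to every job."""

    def __init__(self, machines):
        self.machines = machines
        self.placements = {}  # job name -> machine name

    def schedule(self, job):
        # Global policy (an assumption for this sketch): first fit by free CPUs.
        for machine in self.machines:
            if machine.free_cpus >= job.cpus:
                machine.free_cpus -= job.cpus
                self.placements[job.name] = machine.name
                return machine.name
        return None  # no machine can host the job


scheduler = MonolithicScheduler([Machine("m1", 4), Machine("m2", 8)])
jobs = [Job("web", 3), Job("batch", 6), Job("cache", 2)]
results = [scheduler.schedule(job) for job in jobs]
print(results)
```

Every job, whatever its type, passes through the same `schedule` method, which is why this single code path tends to become the bottleneck as the cluster grows.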

Statically partitioned schedulers take total control over a set of resources, as they are typically deployed onto dedicated clusters.
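A rough sketch of static partitioning, assuming two hypothetical frameworks ("hadoop" and "web") with fixed CPU budgets. It illustrates one commonly cited drawback: a job can be rejected in its own partition while another partition sits idle.

```python
# Hypothetical fixed partitions: each framework's scheduler owns its machines outright.
partitions = {
    "hadoop": {"free_cpus": 8},
    "web": {"free_cpus": 4},
}


def schedule_in_partition(framework, cpus):
    """Place a job only inside its own framework's partition."""
    part = partitions[framework]
    if part["free_cpus"] >= cpus:
        part["free_cpus"] -= cpus
        return True
    return False  # rejected, even if another partition has spare capacity


accepted_web = schedule_in_partition("web", 4)   # fills the web partition
rejected_web = schedule_in_partition("web", 2)   # no room left in "web"
idle_hadoop = partitions["hadoop"]["free_cpus"]  # still untouched
print(accepted_web, rejected_web, idle_hadoop)
```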

In two-level scheduling, a centralized resource allocator dynamically partitions the cluster, and resources are distributed to frameworks in the form of offers.
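The offer mechanism can be sketched roughly as follows. This is a simplified model, not the API of Mesos or any other system: `ResourceAllocator`, `Framework`, and `on_offer` are hypothetical names, and resources are reduced to a single CPU count.

```python
class Framework:
    """Second level: decides which of its own tasks to run on an offer."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = list(pending)  # CPU demand of each waiting task
        self.running = []

    def on_offer(self, offered_cpus):
        used = 0
        for demand in list(self.pending):  # iterate over a copy while removing
            if used + demand <= offered_cpus:
                used += demand
                self.pending.remove(demand)
                self.running.append(demand)
        return used  # CPUs accepted from the offer


class ResourceAllocator:
    """First level: offers free resources, never picks individual tasks."""

    def __init__(self, free_cpus):
        self.free_cpus = free_cpus

    def offer_to(self, framework):
        offer = self.free_cpus
        accepted = framework.on_offer(offer)
        accepted = max(0, min(accepted, offer))  # never grant more than offered
        self.free_cpus -= accepted
        return accepted


allocator = ResourceAllocator(free_cpus=10)
batch = Framework("batch", pending=[4, 4, 4])
service = Framework("service", pending=[2, 1])

took_batch = allocator.offer_to(batch)
took_service = allocator.offer_to(service)
print(took_batch, took_service, allocator.free_cpus)
```

The allocator only knows how much is free; the placement decision stays inside each framework. The trade-off is that a framework sees only what it is offered, not the whole cluster.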

In a shared-state scheduler, each scheduler is granted full access to the entire cluster, which removes the central resource allocator.
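With several schedulers updating the same cluster state, conflicts have to be resolved somehow. One common approach, used by Google's Omega, is optimistic concurrency control. The sketch below assumes that approach; the `CellState` class, the single version counter for the whole cell, and the `place` helper are hypothetical simplifications.

```python
class CellState:
    """Shared view of the cluster that every scheduler may read and update."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)  # machine name -> free CPUs
        self.version = 0

    def snapshot(self):
        # Each scheduler plans against a private copy of the state.
        return self.version, dict(self.free_cpus)

    def commit(self, seen_version, machine, cpus):
        # Optimistic check: reject if the state changed since the snapshot.
        if seen_version != self.version or self.free_cpus[machine] < cpus:
            return False
        self.free_cpus[machine] -= cpus
        self.version += 1
        return True


def place(cell, cpus, max_retries=3):
    """Try to place a task, retrying with a fresh snapshot after a conflict."""
    for _ in range(max_retries):
        version, view = cell.snapshot()
        machine = next((m for m, free in view.items() if free >= cpus), None)
        if machine is None:
            return None
        if cell.commit(version, machine, cpus):
            return machine
    return None


cell = CellState({"m1": 4})
version_a, _ = cell.snapshot()  # scheduler A reads the state
version_b, _ = cell.snapshot()  # scheduler B reads the same state
first = cell.commit(version_a, "m1", 3)   # A commits first
second = cell.commit(version_b, "m1", 3)  # B's snapshot is now stale
retry = place(cell, 1)                    # B retries with a smaller task
print(first, second, retry)
```

Versioning the whole cell is deliberately coarse: it rejects any concurrent commit, even one that touches a different machine. A finer-grained check per machine would reduce needless retries.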

In upcoming posts, we will see how to design and deploy a data center over public and private clouds using Apache Mesos.

Posted by: Rahul Kumar

Rahul Kumar works as a technical lead in Bangalore, India. He has more than 5 years of experience in distributed system design with Java, Scala, the Akka toolkit, and Play Framework. He has developed various real-time data analytics applications using Apache Hadoop, Mesos ecosystem projects, and Apache Spark. He loves to design products around big data and high-velocity streaming data. He has given talks on Apache Spark, reactive systems, and the Actor Model at LinuxCon North America, Cassandra Summit, and Apache Big Data summits.
