Applications like Facebook, Google Search, Twitter, and Amazon.com receive millions of hits daily. These applications generate terabytes to petabytes of data each day, which must be processed, stored, and turned into a response for the user in sub-second time for a good user experience. Handling data at this scale requires specialized distributed systems that can store and process it efficiently.
An organization can buy dedicated physical machines and build a data center on top of them, or it can use a cloud-based virtualized environment to create one.
A data center is the physical or virtual infrastructure of an organization, typically used for storing, processing, and serving large amounts of data.
A data center requires specialized hardware, power supply, cooling techniques, and high network bandwidth. It is a complex set of systems interconnected by a network, sharing resources among themselves. A data center needs to be fault tolerant, reliable, secure, and energy efficient.
Setting up a dedicated data center is hard for a smaller organization, and it is not cost effective either. The rise of cloud computing, however, lets large organizations share their data center resources over the web.
Cloud services like Amazon Web Services, Google Cloud, and Microsoft Azure allow other organizations to use their data centers over the web.
A data center requires a management software system for its resources. Researchers in academia and at organizations like Google, Facebook, and Amazon are working on different solutions. Running a data center efficiently requires a good job scheduler: identifying underutilized resources and offering them to other jobs gives the data center maximum benefit.
A scheduler's workload grows in direct proportion to the cluster's size, so as the data center scales, the scheduler becomes the bottleneck.
There are various scheduling architectures available:
- Monolithic schedulers
- Statically partitioned schedulers
- Two-level schedulers
- Shared-state schedulers
A monolithic scheduler runs a single instance of the scheduler code, which applies one global policy to all incoming jobs.
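As a rough illustration, a monolithic scheduler can be sketched as one loop that places every job through the same code path. The job and node dictionaries, the "most free CPU" policy, and all names below are illustrative assumptions, not any real scheduler's API:

```python
# Minimal sketch of a monolithic scheduler: one scheduler instance applies a
# single global policy to every incoming job. All names here are illustrative.

def schedule(jobs, nodes):
    """Assign each job to the node with the most free CPU (the global policy)."""
    placements = {}
    for job in sorted(jobs, key=lambda j: j["priority"], reverse=True):
        # One code path decides placement for every job type.
        node = max(nodes, key=lambda n: n["free_cpu"])
        if node["free_cpu"] >= job["cpu"]:
            node["free_cpu"] -= job["cpu"]
            placements[job["name"]] = node["name"]
    return placements

jobs = [{"name": "web", "cpu": 2, "priority": 1},
        {"name": "batch", "cpu": 4, "priority": 0}]
nodes = [{"name": "n1", "free_cpu": 4}, {"name": "n2", "free_cpu": 8}]
print(schedule(jobs, nodes))  # every placement decided by the one global policy
```

The simplicity is the appeal; the single decision loop is also exactly why this design becomes a bottleneck as the cluster grows.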
Statically partitioned schedulers take full control over a fixed set of resources, as they are typically deployed onto dedicated clusters.
In two-level scheduling, a centralized resource allocator dynamically partitions the cluster and distributes resources to frameworks in the form of offers.
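The offer mechanism can be sketched in a few lines: the allocator (first level) decides who gets offered what, and each framework (second level) decides what to accept. This is a simplified sketch in the style of Mesos offers; the class names and the CPU-only accounting are assumptions for illustration:

```python
# Sketch of two-level scheduling: a central allocator offers resources to
# frameworks, which accept or decline. Names and policy are illustrative.

class Allocator:
    def __init__(self, free_cpus):
        self.free_cpus = free_cpus  # total unallocated CPUs in the cluster

    def offer(self, framework, cpus):
        """Offer up to `cpus` to a framework; return how many it accepted."""
        cpus = min(cpus, self.free_cpus)
        accepted = framework.accept(cpus)
        self.free_cpus -= accepted
        return accepted

class Framework:
    def __init__(self, demand):
        self.demand = demand  # CPUs this framework still wants

    def accept(self, offered):
        # The framework, not the allocator, decides what to take (second level).
        taken = min(offered, self.demand)
        self.demand -= taken
        return taken

alloc = Allocator(free_cpus=10)
spark = Framework(demand=6)
batch = Framework(demand=8)
print(alloc.offer(spark, 5))  # prints 5: spark takes the whole offer
print(alloc.offer(batch, 5))  # prints 5: cluster is now fully allocated
```

The key design point is that the allocator never needs to understand framework-specific placement logic; it only hands out resources.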
In a shared-state scheduler, every scheduler is granted full access to the entire cluster, which removes the central resource allocator.
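With no central allocator, shared-state schedulers typically coordinate through optimistic concurrency: each scheduler reads a snapshot of the full cluster state, makes a decision, and tries to commit it; a commit that conflicts with someone else's is rejected and retried. The sketch below assumes a simple versioned-state scheme; the class and method names are illustrative:

```python
# Sketch of a shared-state scheduler's coordination: all schedulers see the
# whole cluster and commit claims optimistically. Names are illustrative.

class ClusterState:
    def __init__(self, nodes):
        self.nodes = nodes   # node name -> free CPUs
        self.version = 0     # bumped on every successful commit

    def snapshot(self):
        """Return a consistent view any scheduler may plan against."""
        return self.version, dict(self.nodes)

    def commit(self, version, node, cpus):
        """Apply a claim only if no one else committed since the snapshot."""
        if version != self.version or self.nodes[node] < cpus:
            return False     # conflict: caller must re-read state and retry
        self.nodes[node] -= cpus
        self.version += 1
        return True

state = ClusterState({"n1": 4})
v, view = state.snapshot()
print(state.commit(v, "n1", 2))  # prints True: first claim succeeds
print(state.commit(v, "n1", 2))  # prints False: snapshot is now stale
```

Rejected commits cost wasted work under contention, but in exchange every scheduler can use full cluster knowledge when placing its jobs.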
In upcoming posts, we will see how to design and deploy a data center on the public cloud and the private cloud using Apache Mesos.