Manage Big Data Resources and Applications with Hadoop YARN

Job scheduling and tracking for big data are an integral part of Hadoop MapReduce and can be used to manage resources and applications. Early versions of Hadoop supported a basic system for tracking jobs and tasks, but as the mix of work supported by Hadoop changed, the scheduler couldn’t keep up.
In particular, the old scheduler could not manage workloads other than MapReduce, and it could not optimize cluster utilization. So a new capability was designed to address these shortcomings and offer more flexibility, efficiency, and performance.
Yet Another Resource Negotiator (YARN) is a core Hadoop service that provides two main services:
Global resource management (ResourceManager)
Per-application management (ApplicationMaster)
The ResourceManager is a master service that controls the NodeManager on each node of the Hadoop cluster. The ResourceManager includes a scheduler whose only job is to allocate system resources to specific running applications (tasks); it does not monitor or track the status of those applications.
All required system information is stored in a resource container. The container holds the CPU, disk, network, and other important resource attributes needed to run applications on the node and in the cluster.
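As a minimal sketch of the idea, a resource container can be thought of as a small bundle of resource attributes that is checked against what a node still has free. The `Resource` class, its field names, and the `fits_in` and `minus` helpers below are hypothetical and are not the actual YARN API; only memory and virtual cores are modeled, as a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical bundle of resource attributes (not the real YARN class)."""
    memory_mb: int
    vcores: int

    def fits_in(self, available: "Resource") -> bool:
        # A request fits only if every attribute fits.
        return (self.memory_mb <= available.memory_mb
                and self.vcores <= available.vcores)

    def minus(self, used: "Resource") -> "Resource":
        # Capacity left over after carving out a container.
        return Resource(self.memory_mb - used.memory_mb,
                        self.vcores - used.vcores)


node_free = Resource(memory_mb=8192, vcores=4)
request = Resource(memory_mb=2048, vcores=1)

if request.fits_in(node_free):
    node_free = node_free.minus(request)

print(node_free)  # Resource(memory_mb=6144, vcores=3)
```

Making the class immutable means each allocation produces a new value, so a node's remaining capacity cannot be changed by accident.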
Each node has a NodeManager that is subordinate to the global ResourceManager in the cluster. The NodeManager monitors each application’s CPU, disk, network, and memory usage and reports it back to the ResourceManager. For each application running on the node there is a corresponding ApplicationMaster.
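The reporting step can be illustrated with a toy aggregation: usage is summed per application on one node and packaged into a single report. The `ContainerUsage` record, the `build_node_report` function, and the report layout are assumptions made for illustration; they do not reflect the real NodeManager heartbeat format.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    """Hypothetical usage sample for one running container."""
    app_id: str
    memory_mb: int
    vcores: int


def build_node_report(node_id, usages):
    """Sum usage per application on one node into a single report dict."""
    per_app = {}
    for usage in usages:
        totals = per_app.setdefault(
            usage.app_id, {"memory_mb": 0, "vcores": 0})
        totals["memory_mb"] += usage.memory_mb
        totals["vcores"] += usage.vcores
    return {"node_id": node_id, "apps": per_app}


report = build_node_report("node-1", [
    ContainerUsage("app_1", 1024, 1),
    ContainerUsage("app_1", 2048, 2),
    ContainerUsage("app_2", 512, 1),
])
print(report["apps"]["app_1"])  # {'memory_mb': 3072, 'vcores': 3}
```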
If more resources are needed to support the running application, the ApplicationMaster negotiates with the ResourceManager (scheduler) for additional capacity and then works with the NodeManager to launch and monitor the work. The NodeManager is also responsible for tracking job status and progress within its node.
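The request for additional capacity can be sketched as a toy scheduler that grants a request only when some node has enough free memory. `ToyScheduler` and its `allocate` method are hypothetical names, and the first-fit policy is an assumption chosen for brevity; the real YARN schedulers use more elaborate policies.

```python
class ToyScheduler:
    """Hypothetical stand-in for the ResourceManager's scheduler."""

    def __init__(self, free_memory_mb):
        # Maps node id -> free memory in MB; copied so the caller's
        # dict is not modified.
        self.free_memory_mb = dict(free_memory_mb)

    def allocate(self, app_id, memory_mb):
        """Return the node that grants the request, or None if none can."""
        for node_id, free in self.free_memory_mb.items():
            if free >= memory_mb:
                # First fit: reserve the memory and stop searching.
                self.free_memory_mb[node_id] = free - memory_mb
                return node_id
        return None


scheduler = ToyScheduler({"node-1": 1024, "node-2": 4096})
print(scheduler.allocate("app_1", 2048))  # node-2
print(scheduler.allocate("app_1", 4096))  # None
```

In line with the scheduler's narrow role, this sketch only hands out capacity; it keeps no record of application status or progress.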
