YARN Resource Scheduling

Last updated: 2021-09-28 15:50:28

    Feature Overview

    YARN Resource Scheduling supports interactive YARN resource queue scheduling management, which is more convenient than file-based configuration management. Currently, it supports two types of scheduling configurations: Fair Scheduler and Capacity Scheduler.

    • Fair Scheduler allocates resources fairly to each job on YARN based on weight.
    • Capacity Scheduler organizes resources in a hierarchical manner, allowing multiple users to share cluster resources based on multi-level resource restrictions.
    Note:

    • Fair Scheduler is used by default. Therefore, you need to configure relevant parameters in the fair-scheduler.xml configuration file for the YARN component. If >you switch to Capacity Scheduler, configure relevant parameters in the capacity-scheduler.xml configuration file. No matter which scheduler you use, parameter >configurations must be consistent with those on the Resource Scheduling page.
    • After setting the policy on the Resource Scheduling page, you need to click Refresh Dynamic Resource Pools to deliver the policy configurations to keep >the configuration file and parameters consistent in Configuration Management. After deleting a resource pool, you need to manually restart or click Apply >to restart Resource Manager.
    • After switching schedulers, you need to click Apply to restart Resource Manager for the changes to take effect.

    Configuring Fair Scheduler

    1. Log in to the EMR console and click a Hadoop cluster ID on the cluster list to go to the cluster details page.
    2. On the cluster details page, click Cluster Service > Operation on the YARN component card > Resource Scheduling to go to the resource scheduling page.
    3. Turn on Resource Scheduler, then you can configure the scheduler.
    4. Create a resource pool for Fair Scheduler.
      Select Fair Scheduler and click Create Resource Pool to create a resource pool. You can also edit, clone, delete an existing resource pool as well as create a subpool for it.

    Fields and parameters

    Field Parameter Description
    Resource Pool Name `name` Name of the resource pool or queue
    Parent Pool The value of `type` is `parent`. Means that although the resource pool has no subpools, it is not a leaf node. A parent pool cannot have subpools in Hadoop v2.8 and later.
    Configuration Set None YARN does not have this parameter, which means a collection of scheduled tasks.
    Weight `weight` Percentage of resources in the parent pool. A greater weight means more resources allocated.
    Min Resources minResources The minimum amount of resources guaranteed. When the minimum amount of resources guaranteed for a queue is not met, it has a higher priority than other queues at the same level to obtain resources.
    Max Resources maxResources The maximum amount of resources that can be used. The amount of resources available for each queue cannot exceed this value.
    Max Concurrent Running Apps maxRunningApps The maximum number of concurrent running applications allowed. This limitation can prevent intermediate outputs generated by excessive concurrent running map tasks from filling up the disks.
    Max Share for App Master maxAMShare The maximum percentage of resources that can be used to run Application Master. This attribute only applies to leaf queues.
    Scheduling Policy schedulingPolicy You can set a scheduling policy for any queue. Valid values include `Fifo`, `Fair`, and `Drf`. If the value is `Fifo` or `Fair`, only memory is taken into account in resource allocation. If `Drf`, both memory and the number of cores are taken into account.
    Preemption Mode allowPreemptionFrom This field applies only to Hadoop v3.x and later. In v2.x, you can only configure `yarn.scheduler.fair.preemption` to set the preemption mode.
    Fair Share Preemption Threshold fairSharePreemptionThreshold The fair share preemption threshold for the queue. If the queue waits `fairSharePreemptionTimeout` without receiving `fairSharePreemptionThreshold*fairShare` resources, it is allowed to preempt resources from other queues. If this field is not set, the queue will inherit the value from its parent queue.
    Fair Share Preemption Timeout fairSharePreemptionTimeout Number of seconds the queue is under its fair share threshold before it will try to preempt resources from other queues. If this field is not set, the queue will inherit the value from its parent queue.
    Min Share Preemption Timeout minSharePreemptionTimeout Number of seconds the queue is under its minimum share before it will try to preempt resources from other queues. If this field is not set, the queue will inherit the value from its parent queue.
    Submission aclSubmitApps List of users that can submit apps to the queue
    Management aclAdministerApps List of users that can manage the queue
    1. Configure execution plans.
      In the Policy Settings section, click Execution Plans > Create Execution Plan to create an execution plan.

    2. Configure placement rules.
      In the Policy Settings section, click Placement Rules > Create Placement Rule to create a placement rule.
    3. Configure user limits.
      In the Policy Settings section, click User Limits > Create User Limit to create a user limit.

    Configuring Capacity Scheduler

    1. Log in to the EMR console and click a Hadoop cluster ID on the cluster list to go to the cluster details page.
    2. On the cluster details page, click Cluster Service > Operation on the YARN component card > Resource Scheduling to go to the resource scheduling page.
    3. Turn on Resource Scheduler, then you can configure the scheduler.
    4. Create a resource pool for Capacity Scheduler.
      Select Capacity Scheduler and click Create Resource Pool to create a resource pool. You can also edit, clone, delete an existing resource pool as well as create a subpool for it.
      )

    Fields and parameters

    Field Parameter Description
    Resource Pool Name yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.queues Name of the resource pool or queue
    Capacity yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.capacity Percentage of the queue's capacity in the cluster. The sum of capacities for all queues must be equal to 100%. However, the queue can consume more resources than the queue's capacity if there are free resources in other queues.
    Max Capacity yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.maximum-capacity Maximum queue capacity in percentage. Because of resource sharing, the amount of resources used by a queue may exceed its capacity, and this field specifies the maximum amount of resources that can be used by the queue.
    Min User Capacity yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.minimum-user-limit-percent Minimum resources in percentage guaranteed for each user. Each queue enforces a limit on the percentage of resources allocated to a user at any given time. When multiple users' applications are running in a queue concurrently, the amount of resources used by each user varies between a minimum and maximum value. The minimum value depends on the number of running applications, and the maximum value is determined by `minimum-user-limit-percent`.
    User Resource Factor yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.user-limit-factor Maximum amount of resources in percentage that can be used by each user. For example, if the value is `30`, the amount of resources for each user cannot exceed 30% of the queue capacity at any given time.
    Max Memory per Container yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.maximum-allocation-mb Maximum memory that can be allocated to each container. The value will overwrite and cannot be greater than that of the system's `yarn.scheduler.maximum-allocation-mb`.
    Resource Pool Status yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.state Status of the queue. The value can be `Running` or `Stopped`. If a queue is in the `Stopped` status, new applications cannot be submitted to it or any of its subqueues.
    Max Apps yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.maximum-applications Maximum number of concurrent active (both running and pending) applications allowed in the system
    Max Resources for AM yarn.scheduler.capacity.&dxlt;queue-path&dxgt;.maximum-am-resource-percent Maximum percentage of resources in the cluster which can be used to run application masters. It controls the number of concurrent active applications.
    Submission yarn.scheduler.capacity.root.&dxlt;queue-path&dxgt;.acl_submit_applications List of users that can submit apps to the queue
    Management yarn.scheduler.capacity.root.&dxlt;queue-path&dxgt;.acl_administer_queue List of users that can manage the queue
    1. Configure resource pool mappings.
      In the Policy Settings section, click Resource Pool Mappings > Create Resource Pool Mapping to create a resource pool mapping.

    2. Configure Overwrite Specified Queues.
      This feature is disabled by default. For example, you have defined a mapped queue in Resource Pool Mappings and specified a queue other than the mapped queue when submitting a task; if the specified queue is default and Overwrite Specified Queues is enabled, the mapped queue will be used; otherwise, the specified queue will be used.