
Configuring Capacity Scheduler

Last updated: 2023-12-27 14:32:16

    Overview

    Capacity Scheduler organizes resources in a hierarchical manner, allowing multiple users to share cluster resources based on multi-level resource restrictions.
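    The hierarchy can be illustrated with a small Python sketch. It assumes that each pool's capacity is a percentage of its parent pool and that the capacities of sibling pools add up to 100. The `Pool` class, the `absolute_capacities` function, and the pool names (`prod`, `dev`, `etl`, `adhoc`) are hypothetical and are not part of EMR or YARN.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pool:
    """A resource pool (queue); capacity is a percentage of the parent pool."""
    name: str
    capacity: float
    children: List["Pool"] = field(default_factory=list)


def absolute_capacities(pool: Pool, parent_share: float = 100.0,
                        path: str = "") -> Dict[str, float]:
    """Return {queue path: share of the whole cluster in percent}."""
    full_path = f"{path}.{pool.name}" if path else pool.name
    share = parent_share * pool.capacity / 100.0
    result = {full_path: share}
    if pool.children:
        total = sum(child.capacity for child in pool.children)
        # Sibling capacities are expected to add up to 100.
        if abs(total - 100.0) > 1e-6:
            raise ValueError(
                f"subpools of {full_path} sum to {total}, expected 100")
        for child in pool.children:
            result.update(absolute_capacities(child, share, full_path))
    return result


# Hypothetical hierarchy: root -> prod (70), dev (30); dev -> etl (40), adhoc (60)
root = Pool("root", 100, [
    Pool("prod", 70),
    Pool("dev", 30, [Pool("etl", 40), Pool("adhoc", 60)]),
])
shares = absolute_capacities(root)
```

    With these values, `shares["root.dev.etl"]` is 12.0: the pool gets 40% of `dev`, which itself gets 30% of the cluster.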

    Directions

    Creating a resource pool

    1. Log in to the EMR console and click Details of the target Hadoop cluster in the cluster list to go to the cluster details page.
    2. On the cluster details page, select Cluster Service, then choose Operation > Resource Scheduling in the top-right corner of the YARN component block to go to the Resource Scheduling page.
    3. Toggle on Resource scheduler and configure the scheduler.
    4. Select Capacity Scheduler and click Create Resource Pool to create a resource pool. You can also edit or clone an existing resource pool, or create a subpool under it. To set the number of delayed scheduling attempts, click Default settings.

    Fields and parameters:

    | Field Name | Parameter | Description |
    | --- | --- | --- |
    | Resource Pool Name | `yarn.scheduler.capacity.<queue-path>.queues` | Name of the resource pool or queue. |
    | Label Settings | N/A | Labels that the queue is allowed to access. |
    | Capacity | `yarn.scheduler.capacity.<queue-path>.capacity` | Available resource amount, as a percentage of the parent pool: available resources = resources of the parent pool * percentage set here. The capacities of all subpools under the same parent pool must add up to 100. A queue can consume more than its capacity if other queues have idle resources. |
    | Max Capacity | `yarn.scheduler.capacity.<queue-path>.maximum-capacity` | Maximum queue capacity in percentage. Because resources are shared, a queue may use more than its capacity; this field sets the upper limit on the resources the queue can use. |
    | Default Label Expression | `yarn.scheduler.capacity.<queue-path>.default-node-label-expression` | If a resource request does not specify a node label, the application is submitted to the partition specified by this parameter. The default value is empty, which means containers are allocated on nodes with no label. |
    | Min User Capacity | `yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent` | Minimum resources in percentage guaranteed for each user. Each queue enforces a limit on the percentage of resources allocated to a user at any given time. When multiple users' applications run in a queue concurrently, each user's limit varies between a minimum and a maximum value. The minimum value is set by `minimum-user-limit-percent`, and the maximum value depends on the number of users who have submitted applications. |
    | User Resource Factor | `yarn.scheduler.capacity.<queue-path>.user-limit-factor` | Limit on the resources a single user can use, expressed relative to the queue capacity. In Apache Hadoop YARN, this property is a multiplier of the queue capacity (a float, `1` by default), so `1` lets a single user use at most the queue's configured capacity and `0.3` limits each user to 30% of it. |
    | Max Memory per Container | `yarn.scheduler.capacity.<queue-path>.maximum-allocation-mb` | Maximum memory that can be allocated to each container. The value overrides the system's `yarn.scheduler.maximum-allocation-mb` for this queue and cannot be greater than it. |
    | Max vCores per Container | `yarn.scheduler.capacity.<queue-path>.maximum-allocation-vcores` | Maximum number of CPU cores that can be allocated to each container. The value overrides the system's `yarn.scheduler.maximum-allocation-vcores` for this queue and cannot be greater than it. |
    | Resource Pool Status | `yarn.scheduler.capacity.<queue-path>.state` | Status of the queue. The value can be `Running` or `Stopped`. If a queue is in the `Stopped` status, new applications cannot be submitted to it or any of its subqueues. |
    | Max Apps | `yarn.scheduler.capacity.<queue-path>.maximum-applications` | Maximum number of concurrent active (both running and pending) applications allowed in the queue. |
    | Max Resources for AM | `yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent` | Maximum percentage of resources in the cluster that can be used to run ApplicationMasters. It limits the number of concurrent active applications. |
    | Resource Pool Priority | `yarn.scheduler.capacity.root.<leaf-queue-path>.default-application-priority` | Priority of the resource queue, which is `0` by default. The larger the value, the higher the priority. |
    | Submission | `yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications` | List of users that can submit applications to the queue. |
    | Management | `yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue` | List of users that can manage the queue. |
    | Delayed Scheduling | `yarn.scheduler.capacity.node-locality-delay` | Number of scheduling opportunities that can be skipped while waiting for a local placement, so that tasks can run locally. If the value is `-1`, delayed scheduling is disabled. |
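    The effect of Min User Capacity is easier to see with numbers. The following Python sketch is a simplified model based on the example in the Apache Hadoop Capacity Scheduler documentation for `minimum-user-limit-percent`: resources are split evenly among the users with demand, but no user's limit drops below the configured minimum. The function name `user_limit_percent` is hypothetical, and the model ignores `user-limit-factor` and other limits that the real scheduler also applies.

```python
def user_limit_percent(active_users: int,
                       minimum_user_limit_percent: float) -> float:
    """Approximate per-user limit, in percent of the queue's resources.

    Simplified model (an assumption, not the scheduler's actual code):
    the queue is shared evenly among active users, and the per-user
    limit never falls below minimum-user-limit-percent.
    """
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    if not 0 < minimum_user_limit_percent <= 100:
        raise ValueError("minimum_user_limit_percent must be in (0, 100]")
    equal_share = 100.0 / active_users
    return max(equal_share, minimum_user_limit_percent)


# Per-user limit for 1 to 5 users with Min User Capacity set to 25.
limits = [user_limit_percent(n, 25) for n in range(1, 6)]
```

    With the value set to 25, two users are each limited to 50% of the queue, three users to about 33%, and four or more users to 25%. A value of 100 means no per-user limit is imposed.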

    Configuring resource pool mappings

    1. In the Policy Settings section, choose Resource Pool Mappings > Create Resource Pool Mapping to create a resource pool mapping.
    2. Configure Overwrite Specified Queues, which is disabled by default. Suppose you have defined a mapped queue in Resource Pool Mappings but specify a different queue when submitting a job. If the specified queue is default and Overwrite Specified Queues is enabled, the mapped queue is used; otherwise, the specified queue is used.

    Sample label-based scheduling

    1. Log in to the EMR console and click Details of the target Hadoop cluster in the cluster list to go to the cluster details page.
    2. On the cluster details page, select Cluster Service and choose Operation > Resource Scheduling in the top-right corner of the YARN component block to go to the Resource Scheduling page.
    3. Toggle on Resource scheduler and select Capacity Scheduler.
    4. Toggle on Label-based scheduling and click Label management to go to the label management page.
    5. Click Create Label, enter the label name, and set the label type and the nodes to bind the label to as needed.
    6. After the label is set, click Apply. You can then view and edit the label's resource queue in the resource pool.
    7. On the Resource Scheduling page, click Create Resource Pool and set Label, Capacity, Max Capacity, and other parameters as needed.
    Note
    The capacity and maximum capacity of a resource pool can be configured by label.
    8. After the resource pool is set, click Apply to submit the task to the backend.
    Caution
    Proceed with caution when restarting the ResourceManager. If you are prompted that the ResourceManager will be restarted after you click Apply, check in Scheduling History whether the operation succeeded and in Role Management whether the ResourceManager is healthy.
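    For reference, per-label capacities correspond to queue properties defined in the Apache Hadoop node label documentation. The Python sketch below only builds those property names for one queue and one label. Whether the EMR console writes exactly these properties is an assumption, and the function name `label_capacity_properties`, the queue path `root.dev`, and the label `gpu` are hypothetical.

```python
from typing import Dict


def label_capacity_properties(queue_path: str, label: str,
                              capacity: int,
                              maximum_capacity: int) -> Dict[str, str]:
    """Build Capacity Scheduler properties for one label on one queue."""
    if not 0 <= capacity <= maximum_capacity <= 100:
        raise ValueError("expected 0 <= capacity <= maximum_capacity <= 100")
    prefix = f"yarn.scheduler.capacity.{queue_path}"
    return {
        # Labels the queue is allowed to access.
        f"{prefix}.accessible-node-labels": label,
        # Share of the label's partition guaranteed to the queue.
        f"{prefix}.accessible-node-labels.{label}.capacity": str(capacity),
        # Upper limit the queue can reach within the label's partition.
        f"{prefix}.accessible-node-labels.{label}.maximum-capacity":
            str(maximum_capacity),
    }


# Hypothetical queue and label names, for illustration only.
props = label_capacity_properties("root.dev", "gpu", 40, 80)
for name, value in props.items():
    print(f"{name} = {value}")
```

    Keeping the capacity at or below the maximum capacity mirrors the relationship between the Capacity and Max Capacity fields described earlier.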