tencent cloud


CHDFS Ranger Permission System Solution

Last updated: 2022-03-30 09:30:26

    Background

    With storage-compute separation, your data is hosted on CHDFS. CHDFS provides a permission system similar to that of native HDFS. Beyond HDFS permissions, Apache Ranger offers finer-grained permission control, such as permissions for user groups and for specific path prefixes. As a one-stop permission solution, Ranger also controls permissions for components other than storage, such as YARN and Hive. To match these usage habits, we provide a CHDFS-Ranger integration that lets you control CHDFS permissions with Ranger.

    Strengths

    • Fine-grained permission control that is consistent with how Hadoop permissions are commonly used.
    • Unified management of permissions for big data components and cloud-hosted storage.

    Solution Architecture

    In the Hadoop permission system, Kerberos provides authentication and Ranger provides authorization. On top of this, we provide the following components to support Ranger-based permission control for CHDFS:

    • CHDFS-Ranger-Plugin: a service definition plugin deployed on the Ranger server that describes the CHDFS service to Ranger. After this plugin is deployed, you can enter CHDFS permission policies in the Ranger console.
    • COSRangerService: integrates the Ranger client, periodically syncs permission policies from the Ranger server, and verifies permissions locally when it receives an authorization request. It also provides the Hadoop DelegationToken generation and renewal APIs. All of these interfaces are defined over Hadoop IPC.
    • CosRangerClient: it is dynamically loaded by the COSN plugin to forward permission verification requests to CosRangerService.

    Environment Deployment

    • Hadoop environment
    • ZooKeeper, Ranger, and Kerberos (if there are authentication requirements)
    Note:

    As the above services are mature open-source components, you can install them on your own.
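    Before deploying the components, you may want to confirm that the command-line tools used later in this document are available on the node. The following is a minimal sketch: the function name check_tools is hypothetical, and the tools you pass in (for example hadoop and curl, plus kinit and klist if Kerberos is used) are an assumption about a typical node rather than an official requirement list.

```shell
#!/usr/bin/env bash
# Sketch: report which command-line tools are missing from PATH.
# check_tools is a hypothetical helper; pass whichever tools you rely on.

# check_tools <command...>
# Prints each missing command on its own line; returns 1 if any is missing.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Usage (not run automatically):
#   check_tools hadoop curl || echo "install the tools listed above first"
#   check_tools kinit klist   # only if Kerberos is used
```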

    Component Deployment

    CHDFS-Ranger-Plugin extends the service types in the Ranger Admin console. You can set the operation permissions related to CHDFS in the Ranger console.

    Code address

    You can get the code from the ranger-plugin directory at GitHub.

    Version

    v1.2 or above

    Deployment steps

    1. Create a chdfs directory in Ranger's service definition directory (make sure the directory has at least the r and x permissions).
    2. For an EMR environment, the service definition directory is ranger/ews/webapp/WEB-INF/classes/ranger-plugins.
    3. For a self-built Hadoop environment, search the Ranger installation directory for the hdfs directory (the definition directory of a component already connected to Ranger), for example with find. The directory that contains it is ranger-plugins.
    4. Place cos-chdfs-ranger-plugin-xxx.jar in the chdfs directory (the JAR file must have at least the r permission).
    5. Restart Ranger.
    6. Register the CHDFS service definition with Ranger, for example with the following commands:
      ## To create the service definition, pass in the password of the Ranger admin account and the address of the Ranger service.
      ## For an EMR cluster, the admin user is root, and the password is the root password set when the EMR cluster was created. Replace the IP of the Ranger service with the IP of the EMR master node.
      adminUser=root
      adminPasswd=xxxxxx
      rangerServerAddr=10.0.0.1:6080
      curl -v -u "${adminUser}:${adminPasswd}" -X POST -H "Accept:application/json" -H "Content-Type:application/json" -d @./chdfs-ranger.json "http://${rangerServerAddr}/service/plugins/definitions"
      ## To delete the service definition just created, pass in the service ID returned by the creation request.
      serviceId=102
      curl -v -u "${adminUser}:${adminPasswd}" -X DELETE -H "Accept:application/json" -H "Content-Type:application/json" "http://${rangerServerAddr}/service/plugins/definitions/${serviceId}"
    7. After the service is successfully created, you can see the CHDFS service in the Ranger console as shown below:
    8. Click + next to the CHDFS service to define a new service instance. The service instance name is customizable; for example, you can enter chdfs or chdfs_test. The service configuration is as shown below:

      Set policy.grantrevoke.auth.users to the username that will later be used to start COSRangerService. We generally recommend hadoop.
    9. Click the generated CHDFS service instance to add a policy as shown below:
    10. On the page that appears, configure the following parameters:
    • MountPoint: mount point name in the format f4mxxxxxx-yyyy. You can view it in the CHDFS console.
    • Path: CHDFS path, which must start with /.
      • include: specifies whether the permission applies to the specified path itself (include) or to all paths except it (exclude).
      • recursive: applies the permission to the specified path and to all subpaths under it. It is usually used when the path is a directory.
    • Select Group/Select User: the user and user group conditions are combined with a logical OR; that is, the operation is authorized if either the username or the user group matches.
    • Permissions:
      • Read: read operation, which corresponds to the GET and HEAD operations in COS, such as downloading objects and querying object metadata.
      • Write: write operation, which corresponds to the PUT operation in COS, such as uploading objects.
      • Delete: delete operation, which corresponds to the object deletion operation in COS. To rename a path in Hadoop, you need to have the deletion permission for the original path and write permission for the new path.
      • List: traversal permission, which corresponds to the List Object operation in COS.
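    The file-system part of deployment steps 1 to 4, plus reading the service ID returned in step 6, can be scripted roughly as below. This is a sketch under assumptions: the helper names install_chdfs_plugin, find_plugins_dir and extract_service_id are hypothetical, you supply the Ranger installation path, the modes 755 and 644 are just one way to meet the minimum r/x and r permissions, and the ID is taken from the first "id" field of the JSON response (jq is more robust if it is installed). Restarting Ranger (step 5) is left out because it depends on how Ranger was installed.

```shell
#!/usr/bin/env bash
# Sketch of deployment steps 1-4 and of reading the service ID (step 6).
# Hypothetical helpers; all paths are supplied by the caller.

# Steps 1 and 4: create <ranger-plugins>/chdfs and copy the plugin JAR into it.
# 755 on the directory and 644 on the JAR satisfy the minimum r/x and r
# permissions required by the steps.
install_chdfs_plugin() {
  local plugins_dir=$1 jar=$2
  local target="$plugins_dir/chdfs"
  mkdir -p "$target" || return 1
  chmod 755 "$target" || return 1
  cp "$jar" "$target/" || return 1
  chmod 644 "$target/$(basename "$jar")"
}

# Step 3 (self-built clusters): print the ranger-plugins directory by locating
# the hdfs service definition directory under the Ranger installation.
find_plugins_dir() {
  local ranger_home=$1
  find "$ranger_home" -type d -name hdfs -path '*ranger-plugins*' \
    -exec dirname {} \; 2>/dev/null | sort -u
}

# Step 6: read the JSON printed by the create request on stdin and print the
# first numeric "id" field (assumed to be the service definition ID).
extract_service_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*[0-9]*' | head -n 1 | grep -o '[0-9]*$'
}

# Usage (not run automatically; <ranger-home> is a placeholder):
#   install_chdfs_plugin <ranger-home>/ews/webapp/WEB-INF/classes/ranger-plugins \
#     ./cos-chdfs-ranger-plugin-xxx.jar
#   serviceId=$(curl -s -u "${adminUser}:${adminPasswd}" -X POST \
#     -H "Accept:application/json" -H "Content-Type:application/json" \
#     -d @./chdfs-ranger.json "http://${rangerServerAddr}/service/plugins/definitions" \
#     | extract_service_id)
```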

    Verification

    1. Use the hadoop command line to perform operations that access CHDFS, for example:

      # Replace the mount point, path, and other parameters with your actual information.
      hadoop fs -ls ofs://f4mxxxxyyyy-zzzz.chdfs.ap-guangzhou.myqcloud.com/doc
      hadoop fs -put ./xxx.txt ofs://f4mxxxxyyyy-zzzz.chdfs.ap-guangzhou.myqcloud.com/doc/
      hadoop fs -get ofs://f4mxxxxyyyy-zzzz.chdfs.ap-guangzhou.myqcloud.com/exampleobject.txt
      hadoop fs -rm ofs://f4mxxxxyyyy-zzzz.chdfs.ap-guangzhou.myqcloud.com/exampleobject.txt
      
    2. Verify with an MR job. Restart related services such as YARN and Hive before verifying.
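    To compare the results of step 1 with your policies, you could wrap each command and print whether it succeeded. A sketch, assuming the hypothetical helpers check_op and verify_chdfs_permissions and reusing the placeholder paths from the commands above. A failure is reported as DENIED, but a non-zero exit status can also mean, for example, that the path does not exist or that the local file is missing or already present, so check the error message before concluding that Ranger denied the request.

```shell
#!/usr/bin/env bash
# Sketch: run the step-1 commands and report which ones succeed.
# Hypothetical helpers; mount point and file names are placeholders.

# check_op <label> <command...>
# Runs the command with its output suppressed and prints ALLOWED or DENIED
# based on the exit status (non-zero is reported as DENIED).
check_op() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ALLOWED: $label"
  else
    echo "DENIED: $label"
  fi
}

# verify_chdfs_permissions <mount-point-domain>
# Exercises the List, Write, Read and Delete permissions in turn.
verify_chdfs_permissions() {
  local base="ofs://$1"
  check_op "List $base/doc" hadoop fs -ls "$base/doc"
  check_op "Write $base/doc/" hadoop fs -put ./xxx.txt "$base/doc/"
  check_op "Read $base/exampleobject.txt" hadoop fs -get "$base/exampleobject.txt"
  check_op "Delete $base/exampleobject.txt" hadoop fs -rm "$base/exampleobject.txt"
}

# Usage (not run automatically):
#   verify_chdfs_permissions f4mxxxxyyyy-zzzz.chdfs.ap-guangzhou.myqcloud.com
```

    For step 2, one option is the wordcount job from the Hadoop MapReduce examples JAR (hadoop jar .../hadoop-mapreduce-examples-*.jar wordcount <input> <output>) with ofs:// input and output paths; the JAR location varies between distributions.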

    FAQs

    Do I have to install Kerberos?

    If the users in the cluster cannot be fully trusted (for example, a user could impersonate another identity), install Kerberos for authentication. If the users are trustworthy, such as in a cluster that is only used internally, and you only need authorization to prevent misoperations, you can choose not to install Kerberos and use Ranger for authorization only.
    Kerberos incurs some performance overhead, so weigh your security requirements against your performance requirements. If you enable Kerberos for authentication, you also need to set the related configuration items of COSRangerService and COSRangerClient.
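    On a Kerberized cluster, a client needs a valid ticket before its hadoop commands can be authenticated. The sketch below uses a hypothetical helper, ensure_ticket, with a placeholder keytab path and principal, and only calls kinit when klist -s reports that no valid ticket is cached. It does not cover the COSRangerService or COSRangerClient configuration items, whose names are not given here.

```shell
#!/usr/bin/env bash
# Sketch: obtain a Kerberos ticket only when none is cached.
# ensure_ticket is a hypothetical helper; keytab and principal are placeholders.

# ensure_ticket <keytab> <principal>
ensure_ticket() {
  local keytab=$1 principal=$2
  # klist -s prints nothing and exits 0 when a valid ticket is cached.
  if klist -s 2>/dev/null; then
    return 0
  fi
  # Otherwise request a new ticket non-interactively from the keytab.
  kinit -kt "$keytab" "$principal"
}

# Usage (not run automatically):
#   ensure_ticket /path/to/hadoop.keytab hadoop/host.example.com@EXAMPLE.COM
#   hadoop fs -ls ofs://f4mxxxxyyyy-zzzz.chdfs.ap-guangzhou.myqcloud.com/
```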

    If Ranger is enabled, but no policy is configured, or no policy is matched, what will happen?

    If no policy is configured, or no configured policy matches the request, the operation is denied by default.
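    To make the default-deny rule concrete, together with the include and recursive Path options from the policy settings, here is a toy model in shell. It is not Ranger's evaluation engine: the helper names (path_covered, in_list, check_access) and the one-line policy format are invented for illustration, and the model ignores access types (Read/Write/Delete/List), deny conditions and wildcards.

```shell
#!/usr/bin/env bash
# Toy model only, NOT Ranger's evaluation engine.

# path_covered <policy-path> <true|false> <request-path>
# Succeeds if the policy path covers the request path:
# recursive=true covers the path and everything under it, false only the path.
path_covered() {
  local ppath=${1%/} recursive=$2 rpath=$3
  [ -n "$ppath" ] || ppath=/
  [ "$rpath" = "$ppath" ] && return 0
  if [ "$recursive" = true ]; then
    [ "$ppath" = / ] && return 0
    case "$rpath" in
      "$ppath"/*) return 0 ;;
    esac
  fi
  return 1
}

# in_list <name> <comma-separated list>; "-" means an empty list.
in_list() {
  [ -n "$1" ] && [ "$2" != - ] || return 1
  case ",$2," in
    *",$1,"*) return 0 ;;
  esac
  return 1
}

# check_access <user> <group> <request-path>
# Reads policies from stdin, one per line:
#   <include|exclude> <policy-path> <true|false> <users> <groups>
# Prints ALLOW if a policy covers the path AND matches the user OR the group;
# otherwise prints DENY (no policy configured, or none matched).
check_access() {
  local user=$1 group=$2 rpath=$3
  local mode ppath recursive users groups covered
  while read -r mode ppath recursive users groups; do
    [ -n "$mode" ] || continue               # skip blank lines
    if path_covered "$ppath" "$recursive" "$rpath"; then
      covered=true
    else
      covered=false
    fi
    if [ "$mode" = exclude ]; then           # exclude inverts the path match
      if [ "$covered" = true ]; then covered=false; else covered=true; fi
    fi
    if [ "$covered" = true ] && { in_list "$user" "$users" || in_list "$group" "$groups"; }; then
      echo ALLOW
      return 0
    fi
  done
  echo DENY                                  # default deny
  return 1
}
```

    For example, with the single policy line "include /doc true hadoop -", user hadoop is allowed on /doc/a/b.txt but denied on /docs, and with no policy lines at all every request is denied.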

    After Ranger is enabled, will CHDFS still perform POSIX authentication?

    Ranger authorization is performed on the client side. Requests that pass Ranger authorization are sent to the CHDFS server, which also performs POSIX permission checks by default. Therefore, if permissions are controlled by Ranger, disable POSIX permissions in the CHDFS console.
