Using Apache Ranger to Manage GooseFS Access Permissions

Last updated: 2021-11-09 16:03:42

    Overview

    Apache Ranger is a standard authorization framework for managing access permissions across the big data ecosystem. As an accelerated storage system for big data and data lake workloads, GooseFS can be integrated with Apache Ranger so that its access permissions are managed on Ranger's centralized authorization platform. This document describes how to use Apache Ranger to manage access permissions for GooseFS.

    Advantages

    • GooseFS is a cloud-native accelerated storage system that supports Apache Ranger access permission management in nearly the same way as HDFS does. HDFS users can therefore migrate to GooseFS easily and reuse their existing HDFS Ranger policies.
    • Compared with HDFS with Ranger, GooseFS with Ranger also offers a “Ranger + native ACL” authorization mode: if a request is not authorized by Ranger, GooseFS falls back to its native ACL check. This compensates for incomplete Ranger policy configurations.

    Framework of GooseFS with Ranger

    To integrate GooseFS with Ranger, we developed the GooseFS Ranger plugin, which is deployed on both the GooseFS master node and Ranger Admin. The plugin does the following:

    • On the GooseFS master node:
      • Implements the Authorizer API to authorize each metadata request on the GooseFS master node.
      • Connects to Ranger Admin to fetch the authorization policies configured by users.
    • On Ranger Admin:
      • Enables Ranger Admin to look up GooseFS resources.
      • Validates the service configuration.

    Deployment

    Preparations

    Before you start, make sure that the Ranger components, including Ranger Admin and Ranger UserSync, are deployed and configured in your environment, and that the Ranger web UI can be opened and used normally.
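    You can also confirm from a shell that Ranger Admin is reachable and accepts your admin credentials. The following is a minimal sketch under stated assumptions: ranger_admin_ready is a hypothetical helper, Ranger Admin is assumed to serve plain HTTP, and a GET request on /service/plugins/definitions (the same REST path used for registration later in this document) is assumed to list the existing service definitions.

```shell
# Hypothetical preflight helper (not part of GooseFS or Ranger).
# Usage: ranger_admin_ready <host:port> <admin-user> <admin-password>
# Succeeds only if Ranger Admin answers an authenticated request with HTTP 200.
ranger_admin_ready() {
  local addr="$1" user="$2" passwd="$3" code
  # GET on /service/plugins/definitions lists the registered service definitions.
  # -w prints only the HTTP status code; the response body is discarded.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -u "${user}:${passwd}" -H "Accept:application/json" \
    "http://${addr}/service/plugins/definitions")" || return 1
  # A wrong password yields 401, which is treated as "not ready".
  [ "${code}" = "200" ]
}
```

    For example, ranger_admin_ready 10.0.0.1:6080 root xxxx && echo "Ranger Admin is reachable" prints the message only when both the address and the credentials are correct.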

    Component Deployment

    Deploying the GooseFS Ranger plugin to Ranger Admin and registering the service

    Note:

    Download the GooseFS Ranger plugin package before you begin.

    Deploy as follows:

    1. Create a directory for GooseFS under the Ranger service definition directory. Make sure that at least read and execute permissions are granted on the new directory.

      1. If you use a Tencent Cloud EMR cluster, the Ranger service definition directory is /usr/local/service/ranger/ews/webapp/WEB-INF/classes/ranger-plugins.
      2. If you use a self-built Hadoop cluster, locate the directory by searching the Ranger installation for the directory of a component that is already integrated with Ranger, such as HDFS.
        Ranger service definition directory
    2. Copy goosefs-ranger-plugin-${version}.jar and ranger-servicedef-goosefs.json into the new GooseFS directory, and make sure that read permission is granted on both files.

    3. Restart Ranger.

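    4. From the directory that contains ranger-servicedef-goosefs.json, run the following commands to register the GooseFS service definition with Ranger Admin through its REST API: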

      # Register the GooseFS service definition. Specify the Ranger admin account, its password, and the Ranger Admin address.
      # On a Tencent Cloud EMR cluster, the admin account is root, its password is the root password set when the cluster was created,
      # and the Ranger service IP is the IP of the EMR master node.
      adminUser=root
      adminPasswd=xxxx
      rangerServerAddr=10.0.0.1:6080
      curl -v -u "${adminUser}:${adminPasswd}" -X POST -H "Accept:application/json" -H "Content-Type:application/json" -d @./ranger-servicedef-goosefs.json "http://${rangerServerAddr}/service/plugins/definitions"
      # On success, the response contains the ID of the new service definition. Note it down.
      # To delete the GooseFS service definition later, run the following command with that ID:
      serviceId=104  # example value; use the ID returned by the command above
      curl -v -u "${adminUser}:${adminPasswd}" -X DELETE -H "Accept:application/json" -H "Content-Type:application/json" "http://${rangerServerAddr}/service/plugins/definitions/${serviceId}"
      
    5. After the GooseFS service is created, you can view it in the Ranger console.
      Ranger Console

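    6. Click + to create a GooseFS service instance. Its name must match the value of ranger.plugin.goosefs.service.name in ranger-goosefs-security.xml (goosefs in the example configuration in this document).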
      Defining GooseFS in the Ranger Console

    7. Click the created GooseFS instance and add a policy.
      Adding New Policy
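    Steps 1 and 2 can be scripted, and the service ID returned in step 4 can be captured instead of copied by hand. The following is a minimal sketch under assumptions: deploy_to_ranger_admin and extract_service_id are hypothetical helpers, the new directory is assumed to be named goosefs, and the ID parsing assumes that the first numeric "id" field in the JSON response is the top-level service definition ID. If jq is installed, jq -r .id is a more robust alternative.

```shell
# Hypothetical helper for steps 1 and 2.
# Usage: deploy_to_ranger_admin <service-def-dir> <src-dir> <version>
#   <service-def-dir>  Ranger service definition directory
#   <src-dir>          directory holding the downloaded plugin files
deploy_to_ranger_admin() {
  local defs_dir="$1" src_dir="$2" version="$3"
  local target="${defs_dir}/goosefs"   # assumed directory name
  local jar="goosefs-ranger-plugin-${version}.jar"
  local json="ranger-servicedef-goosefs.json"

  # Fail early if either plugin file is missing.
  if [ ! -f "${src_dir}/${jar}" ] || [ ! -f "${src_dir}/${json}" ]; then
    echo "plugin files not found in ${src_dir}" >&2
    return 1
  fi
  mkdir -p "${target}" || return 1
  # Step 1: read and execute permissions on the directory.
  chmod 755 "${target}" || return 1
  cp "${src_dir}/${jar}" "${src_dir}/${json}" "${target}/" || return 1
  # Step 2: read permission on the copied files.
  chmod 644 "${target}/${jar}" "${target}/${json}"
}

# Hypothetical helper for step 4: read the JSON response on stdin and print
# the first numeric "id" field (assumed to be the service definition ID).
extract_service_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}
```

    For example, on Tencent Cloud EMR, run deploy_to_ranger_admin /usr/local/service/ranger/ews/webapp/WEB-INF/classes/ranger-plugins . <version> and then restart Ranger. In step 4, piping the output of the POST command into extract_service_id prints only the ID; curl -v writes its verbose log to stderr, so the pipe receives just the response body.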

    Deploying the GooseFS Ranger plugin on the GooseFS master node and enabling Ranger authorization

    1. Copy goosefs-ranger-plugin-${version}.jar into the ${GOOSEFS_HOME}/lib directory and make sure that at least read permission is granted on it.

    2. Copy ranger-goosefs-audit.xml, ranger-goosefs-security.xml, and ranger-policymgr-ssl.xml into the ${GOOSEFS_HOME}/conf directory and configure the required parameters as follows:

      • ranger-goosefs-security.xml:

        <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
          <!-- Must match the name of the GooseFS service instance created in Ranger Admin -->
          <property>
            <name>ranger.plugin.goosefs.service.name</name>
            <value>goosefs</value>
          </property>
          <!-- Client that downloads policies from Ranger Admin over its REST API -->
          <property>
            <name>ranger.plugin.goosefs.policy.source.impl</name>
            <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
          </property>
          <!-- Ranger Admin address -->
          <property>
            <name>ranger.plugin.goosefs.policy.rest.url</name>
            <value>http://10.0.0.1:6080</value>
          </property>
          <!-- Interval, in milliseconds, at which the plugin polls Ranger Admin for policy changes -->
          <property>
            <name>ranger.plugin.goosefs.policy.pollIntervalMs</name>
            <value>30000</value>
          </property>
          <!-- Connection and read timeouts, in milliseconds, for requests to Ranger Admin -->
          <property>
            <name>ranger.plugin.goosefs.policy.rest.client.connection.timeoutMs</name>
            <value>1200</value>
          </property>
          <property>
            <name>ranger.plugin.goosefs.policy.rest.client.read.timeoutMs</name>
            <value>30000</value>
          </property>
        </configuration>
        
      • ranger-goosefs-audit.xml: required only if auditing is enabled; otherwise you can skip it.

      • ranger-policymgr-ssl.xml:

        <configuration>
          <!-- Keystore holding the plugin's client certificate for SSL connections to Ranger Admin -->
          <property>
            <name>xasecure.policymgr.clientssl.keystore</name>
            <value>hadoopdev-clientcert.jks</value>
          </property>
          <!-- Truststore used to verify the certificate presented by Ranger Admin -->
          <property>
            <name>xasecure.policymgr.clientssl.truststore</name>
            <value>cacerts-xasecure.jks</value>
          </property>
          <!-- Credential files that store the keystore and truststore passwords -->
          <property>
            <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
            <value>jceks://file/tmp/keystore-hadoopdev-ssl.jceks</value>
          </property>
          <property>
            <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
            <value>jceks://file/tmp/truststore-hadoopdev-ssl.jceks</value>
          </property>
        </configuration>
        
    3. Add the following properties to goosefs-site.xml:

      ...
      goosefs.security.authorization.permission.type=CUSTOM
      goosefs.security.authorization.custom.provider.class=org.apache.ranger.authorization.goosefs.RangerGooseFSAuthorizer
      ...
      
    4. In ${GOOSEFS_HOME}/libexec/goosefs-config.sh, add goosefs-ranger-plugin-${version}.jar to the GooseFS server classpath:

      ...
      # Replace the file name below with the actual name of the plugin JAR copied to ${GOOSEFS_HOME}/lib in step 1.
      GOOSEFS_RANGER_CLASSPATH="${GOOSEFS_HOME}/lib/ranger-goosefs-plugin-1.0.0-SNAPSHOT.jar"
      GOOSEFS_SERVER_CLASSPATH="${GOOSEFS_SERVER_CLASSPATH}:${GOOSEFS_RANGER_CLASSPATH}"
      ...
      

    The configuration is now complete. Restart the GooseFS master so that the new classpath and authorization settings take effect.
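    Steps 1 to 3 on the GooseFS master node can also be scripted. This is a minimal sketch, not a tool shipped with the plugin: set_property and enable_ranger_on_goosefs are hypothetical helpers, the site configuration file is passed in as an argument and assumed to hold key=value lines in the format shown in step 3, and GNU sed is assumed for in-place editing. Step 4 is left as a manual edit.

```shell
# Hypothetical helper: set_property <file> <key> <value>
# Writes key=value, replacing an existing line for the same key instead of
# appending a duplicate. Dots in the key act as regex wildcards, which is
# harmless for the fixed keys used here. Requires GNU sed (-i).
set_property() {
  local file="$1" key="$2" value="$3"
  touch "${file}" || return 1
  if grep -q "^${key}=" "${file}"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "${file}"
  else
    printf '%s=%s\n' "${key}" "${value}" >> "${file}"
  fi
}

# Hypothetical helper: enable_ranger_on_goosefs <goosefs-home> <src-dir> <version> <site-conf-file>
enable_ranger_on_goosefs() {
  local home="$1" src="$2" version="$3" site_conf="$4" f
  # Step 1: the plugin JAR goes into ${GOOSEFS_HOME}/lib.
  cp "${src}/goosefs-ranger-plugin-${version}.jar" "${home}/lib/" || return 1
  # Step 2: the Ranger configuration files go into ${GOOSEFS_HOME}/conf.
  for f in ranger-goosefs-security.xml ranger-policymgr-ssl.xml ranger-goosefs-audit.xml; do
    if [ -f "${src}/${f}" ]; then
      cp "${src}/${f}" "${home}/conf/" || return 1
    elif [ "${f}" != "ranger-goosefs-audit.xml" ]; then
      # Only the audit file is optional (when auditing is disabled).
      echo "missing ${src}/${f}" >&2
      return 1
    fi
  done
  # Step 3: switch GooseFS authorization to the Ranger authorizer.
  set_property "${site_conf}" goosefs.security.authorization.permission.type CUSTOM || return 1
  set_property "${site_conf}" goosefs.security.authorization.custom.provider.class \
    org.apache.ranger.authorization.goosefs.RangerGooseFSAuthorizer
}
```

    For example, enable_ranger_on_goosefs "${GOOSEFS_HOME}" . <version> <path to goosefs-site.xml>. Running it twice leaves a single line per property, because set_property replaces an existing key rather than appending it again.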

    Verification

    The following example adds a policy that allows Hadoop users to read and execute, but not write to, the GooseFS root directory, and then verifies it:

    1. Add a policy.
      Delivering the Policy in the Console
    2. Check whether the policy takes effect.
      Verification
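    Step 2 can also be scripted. This sketch is an illustration under several assumptions, not part of the plugin: expect_allowed and expect_denied are hypothetical helpers, the test user is assumed to be named hadoop and to be reachable through sudo, the GooseFS CLI is assumed to be on the PATH with the goosefs fs ls and goosefs fs mkdir subcommands, and GooseFS is assumed to use SIMPLE authentication so that the OS user running the command is the identity Ranger checks. Because of the “Ranger + native ACL” fallback described under Advantages, the write check is denied as expected only if the native ACL on / does not itself grant write access to that user.

```shell
# Hypothetical helpers: run a command and report whether its outcome matches
# what the policy should produce. Both return non-zero on FAIL.
expect_allowed() {
  if "$@" >/dev/null 2>&1; then
    echo "PASS (allowed as expected): $*"
  else
    echo "FAIL (unexpectedly denied): $*"
    return 1
  fi
}

expect_denied() {
  if "$@" >/dev/null 2>&1; then
    echo "FAIL (unexpectedly allowed): $*"
    return 1
  else
    echo "PASS (denied as expected): $*"
  fi
}

# Runs only where the GooseFS CLI is installed. The absolute path is used
# because sudo may reset PATH. /ranger_write_test is a hypothetical path.
if goosefs_bin="$(command -v goosefs)"; then
  # Read and execute on / are granted by the policy, so listing should work.
  expect_allowed sudo -u hadoop "${goosefs_bin}" fs ls /
  # Write on / is not granted, so creating a directory under / should fail.
  expect_denied sudo -u hadoop "${goosefs_bin}" fs mkdir /ranger_write_test
fi
```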