Apache Ranger is a centralized authorization component that manages access permissions across the big data ecosystem. GooseFS, an acceleration storage system for big data and data lake scenarios, can be integrated into a unified Apache Ranger authorization platform. This document describes how to use Apache Ranger to manage access permissions for GooseFS.
To integrate GooseFS into Ranger, we developed the GooseFS Ranger plugin, which must be deployed on both the GooseFS master node and Ranger Admin. The plugin implements an Authorizer API that checks each metadata request on the GooseFS master node against the policies defined in Ranger.
Before you start, ensure that the Ranger components (including Ranger Admin and Ranger UserSync) are deployed and configured in your environment and that the Ranger web UI can be opened and used normally.
Note: Click here to download the GooseFS Ranger plugin.
Deploy the plugin on Ranger Admin as follows:
Create a GooseFS directory under the Ranger service definition directory. Make sure the directory grants at least read and execute permissions.
If you use a Tencent Cloud EMR cluster, the Ranger service definition directory is /usr/local/service/ranger/ews/webapp/WEB-INF/classes/ranger-plugins.
If you use a self-built Hadoop cluster, you can locate the directory by checking where components already integrated with Ranger (such as HDFS) place their service definitions.
Put goosefs-ranger-plugin-${version}.jar and ranger-servicedef-goosefs.json in the GooseFS directory. Make sure both files grant read permission.
Restart Ranger.
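A minimal shell sketch of the three steps above, assuming the Tencent Cloud EMR service definition path shown earlier, a directory named goosefs, and that Ranger Admin can be restarted with the standard ranger-admin script (all of these are assumptions; adjust them to your deployment):
# Ranger service definition directory (Tencent Cloud EMR path from above).
RANGER_SERVICEDEF_DIR=/usr/local/service/ranger/ews/webapp/WEB-INF/classes/ranger-plugins
# 1. Create the GooseFS directory with at least read and execute permissions.
mkdir -p ${RANGER_SERVICEDEF_DIR}/goosefs
chmod 755 ${RANGER_SERVICEDEF_DIR}/goosefs
# 2. Copy the plugin JAR and service definition into it with read permission.
cp goosefs-ranger-plugin-*.jar ranger-servicedef-goosefs.json ${RANGER_SERVICEDEF_DIR}/goosefs/
chmod 644 ${RANGER_SERVICEDEF_DIR}/goosefs/*
# 3. Restart Ranger Admin so it picks up the new service definition
#    (on Tencent Cloud EMR, you may need to restart Ranger from the EMR console instead).
ranger-admin restart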
In Ranger, run the following commands to register the GooseFS service:
# Create the service. Specify the Ranger admin account and password as well as the Ranger service address.
# For a Tencent Cloud EMR cluster, the admin user is root, the password is the root password set when the EMR cluster was created, and the Ranger service address is the IP of the EMR master node.
adminUser=root
adminPasswd=xxxx
rangerServerAddr=10.0.0.1:6080
curl -v -u${adminUser}:${adminPasswd} -X POST -H "Accept:application/json" -H "Content-Type:application/json" -d @./ranger-servicedef-goosefs.json http://${rangerServerAddr}/service/plugins/definitions
# When the service is registered successfully, a service ID is returned. Note it down.
# To delete the GooseFS service, pass the returned service ID to the following command:
serviceId=104
curl -v -u${adminUser}:${adminPasswd} -X DELETE -H "Accept:application/json" -H "Content-Type:application/json" http://${rangerServerAddr}/service/plugins/definitions/${serviceId}
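You can optionally confirm the registration from the command line by listing the service definitions through the same endpoint (reusing the variables above) and checking that goosefs appears in the output:
# Optional check: list the registered service definitions and confirm that "goosefs" appears.
curl -s -u${adminUser}:${adminPasswd} -H "Accept:application/json" \
  http://${rangerServerAddr}/service/plugins/definitions | grep -o goosefs | head -n 1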
After the GooseFS service is created, you can view it in the Ranger console.
Click + to define the GooseFS service instance.
Click the created GooseFS instance and add a policy.
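Both of the UI steps above can also be scripted against Ranger's REST API. The sketch below creates the service instance only; it assumes Ranger's public v2 API endpoint, and the keys required under configs depend on ranger-servicedef-goosefs.json, so the request body is illustrative rather than complete.
# Create a GooseFS service instance named "goosefs" (fill in configs according to ranger-servicedef-goosefs.json).
curl -v -u${adminUser}:${adminPasswd} -X POST \
  -H "Accept:application/json" -H "Content-Type:application/json" \
  -d '{"name":"goosefs","type":"goosefs","isEnabled":true,"configs":{}}' \
  http://${rangerServerAddr}/service/public/v2/api/service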
Deploy the plugin on the GooseFS master node as follows:
Put goosefs-ranger-plugin-${version}.jar in the ${GOOSEFS_HOME}/lib directory. Make sure the file grants at least read permission.
Copy ranger-goosefs-audit.xml, ranger-goosefs-security.xml, and ranger-policymgr-ssl.xml to the ${GOOSEFS_HOME}/conf directory and configure the required parameters as follows:
ranger-goosefs-security.xml:
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>ranger.plugin.goosefs.service.name</name>
<value>goosefs</value>
</property>
<property>
<name>ranger.plugin.goosefs.policy.source.impl</name>
<value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
</property>
<property>
<name>ranger.plugin.goosefs.policy.rest.url</name>
<value>http://10.0.0.1:6080</value>
</property>
<property>
<name>ranger.plugin.goosefs.policy.pollIntervalMs</name>
<value>30000</value>
</property>
<property>
<name>ranger.plugin.goosefs.policy.rest.client.connection.timeoutMs</name>
<value>1200</value>
</property>
<property>
<name>ranger.plugin.goosefs.policy.rest.client.read.timeoutMs</name>
<value>30000</value>
</property>
</configuration>
ranger-goosefs-audit.xml (you can skip this file if audit is disabled):
<configuration>
<property>
<name>xasecure.audit.is.enabled</name>
<value>false</value>
</property>
<property>
<name>xasecure.audit.db.is.async</name>
<value>true</value>
</property>
<property>
<name>xasecure.audit.db.async.max.queue.size</name>
<value>10240</value>
</property>
<property>
<name>xasecure.audit.db.async.max.flush.interval.ms</name>
<value>30000</value>
</property>
<property>
<name>xasecure.audit.db.batch.size</name>
<value>100</value>
</property>
<property>
<name>xasecure.audit.jpa.javax.persistence.jdbc.url</name>
<value>jdbc:mysql://localhost:3306/ranger_audit</value>
</property>
<property>
<name>xasecure.audit.jpa.javax.persistence.jdbc.user</name>
<value>rangerLogger</value>
</property>
<property>
<name>xasecure.audit.jpa.javax.persistence.jdbc.password</name>
<value>none</value>
</property>
<property>
<name>xasecure.audit.jpa.javax.persistence.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>xasecure.audit.credential.provider.file</name>
<value>jceks://file/etc/ranger/hadoopdev/auditcred.jceks</value>
</property>
<property>
<name>xasecure.audit.hdfs.is.enabled</name>
<value>true</value>
</property>
<property>
<name>xasecure.audit.hdfs.is.async</name>
<value>true</value>
</property>
<property>
<name>xasecure.audit.hdfs.async.max.queue.size</name>
<value>1048576</value>
</property>
<property>
<name>xasecure.audit.hdfs.async.max.flush.interval.ms</name>
<value>30000</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.encoding</name>
<value></value>
</property>
<!-- hdfs audit provider config-->
<property>
<name>xasecure.audit.hdfs.config.destination.directory</name>
<value>hdfs://NAMENODE_HOST:8020/ranger/audit/</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.destination.file</name>
<value>%hostname%-audit.log</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.destination.flush.interval.seconds</name>
<value>900</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.destination.rollover.interval.seconds</name>
<value>86400</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.destination.open.retry.interval.seconds</name>
<value>60</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.local.buffer.directory</name>
<value>/var/log/hadoop/%app-type%/audit</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.local.buffer.file</name>
<value>%time:yyyyMMdd-HHmm.ss%.log</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.local.buffer.file.buffer.size.bytes</name>
<value>8192</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.local.buffer.flush.interval.seconds</name>
<value>60</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.local.buffer.rollover.interval.seconds</name>
<value>600</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.local.archive.directory</name>
<value>/var/log/hadoop/%app-type%/audit/archive</value>
</property>
<property>
<name>xasecure.audit.hdfs.config.local.archive.max.file.count</name>
<value>10</value>
</property>
<!-- log4j audit provider config -->
<property>
<name>xasecure.audit.log4j.is.enabled</name>
<value>false</value>
</property>
<property>
<name>xasecure.audit.log4j.is.async</name>
<value>false</value>
</property>
<property>
<name>xasecure.audit.log4j.async.max.queue.size</name>
<value>10240</value>
</property>
<property>
<name>xasecure.audit.log4j.async.max.flush.interval.ms</name>
<value>30000</value>
</property>
<!-- kafka audit provider config -->
<property>
<name>xasecure.audit.kafka.is.enabled</name>
<value>false</value>
</property>
<property>
<name>xasecure.audit.kafka.async.max.queue.size</name>
<value>1</value>
</property>
<property>
<name>xasecure.audit.kafka.async.max.flush.interval.ms</name>
<value>1000</value>
</property>
<property>
<name>xasecure.audit.kafka.broker_list</name>
<value>localhost:9092</value>
</property>
<property>
<name>xasecure.audit.kafka.topic_name</name>
<value>ranger_audits</value>
</property>
<!-- ranger audit solr config -->
<property>
<name>xasecure.audit.solr.is.enabled</name>
<value>false</value>
</property>
<property>
<name>xasecure.audit.solr.async.max.queue.size</name>
<value>1</value>
</property>
<property>
<name>xasecure.audit.solr.async.max.flush.interval.ms</name>
<value>1000</value>
</property>
<property>
<name>xasecure.audit.solr.solr_url</name>
<value>http://localhost:6083/solr/ranger_audits</value>
</property>
</configuration>
ranger-policymgr-ssl.xml:
<configuration>
<property>
<name>xasecure.policymgr.clientssl.keystore</name>
<value>hadoopdev-clientcert.jks</value>
</property>
<property>
<name>xasecure.policymgr.clientssl.truststore</name>
<value>cacerts-xasecure.jks</value>
</property>
<property>
<name>xasecure.policymgr.clientssl.keystore.credential.file</name>
<value>jceks://file/tmp/keystore-hadoopdev-ssl.jceks</value>
</property>
<property>
<name>xasecure.policymgr.clientssl.truststore.credential.file</name>
<value>jceks://file/tmp/truststore-hadoopdev-ssl.jceks</value>
</property>
</configuration>
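Before moving on, it is worth confirming from the GooseFS master node that the address configured in ranger.plugin.goosefs.policy.rest.url is reachable. A quick check using the example address above; any 2xx or 3xx status code means Ranger Admin is answering:
# Run on the GooseFS master node; the address mirrors the example value of ranger.plugin.goosefs.policy.rest.url.
curl -s -o /dev/null -w "%{http_code}\n" http://10.0.0.1:6080/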
Add the following configurations to goosefs-site.properties:
...
goosefs.security.authorization.permission.type=CUSTOM
goosefs.security.authorization.custom.provider.class=org.apache.ranger.authorization.goosefs.RangerGooseFSAuthorizer
...
In ${GOOSEFS_HOME}/libexec/goosefs-config.sh, add goosefs-ranger-plugin-${version}.jar to the GooseFS classpath:
...
GOOSEFS_RANGER_CLASSPATH="${GOOSEFS_HOME}/lib/goosefs-ranger-plugin-${version}.jar"
GOOSEFS_SERVER_CLASSPATH=${GOOSEFS_SERVER_CLASSPATH}:${GOOSEFS_RANGER_CLASSPATH}
...
After completing these steps, the configuration is complete.
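The GooseFS master must be restarted so that it loads the plugin JAR and the new configuration. A minimal sketch, assuming the master process is managed with the goosefs-stop.sh and goosefs-start.sh scripts (script names are an assumption; use your cluster's own tooling if it differs):
# Restart the GooseFS master so the Ranger plugin and new configuration are loaded.
${GOOSEFS_HOME}/bin/goosefs-stop.sh master
${GOOSEFS_HOME}/bin/goosefs-start.sh master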
For example, you can add a policy that allows Hadoop users to read and execute, but not write to, the GooseFS root directory.
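Once such a policy is active, you can verify it from the command line. The sketch below assumes the goosefs fs shell and a user named hadoop covered by the policy (both names are illustrative):
# Listing the root directory should succeed (read and execute are allowed).
sudo -u hadoop ${GOOSEFS_HOME}/bin/goosefs fs ls /
# Creating a directory should be rejected by Ranger (write is not allowed).
sudo -u hadoop ${GOOSEFS_HOME}/bin/goosefs fs mkdir /ranger-write-test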