Oozie Development Guide

Last updated: 2022-05-16 12:52:25

    Apache Oozie is an open-source workflow engine. It orchestrates the tasks of Hadoop ecosystem components into workflows and then schedules, executes, and monitors them. This document briefly describes how to use Oozie in EMR. For detailed directions, see the Apache Oozie website. We recommend using Oozie through Hue's GUI as instructed in the Hue development documentation.

    Prerequisites

    You have created an EMR Hadoop cluster and selected the Oozie service. For more information, see Creating EMR Cluster.

    Accessing Oozie WebUI

    • If you have enabled public network access for cluster nodes during cluster purchase, you can click the WebUI link in the EMR console for access.
    • If you are in the Chinese mainland, we recommend you set the WebUI time zone to GMT+08:00.

    Updating ShareLib

    The EMR cluster is preinstalled with ShareLib, so you do not need to install it before using Oozie to submit a workflow job. To edit and update ShareLib, follow the steps below:

    cd /usr/local/service/oozie
    tar -xf oozie-sharelib.tar.gz
    # Add the JAR package to the directory of the action to be supported, inside the `share` directory generated by decompression.
    bin/oozie-setup.sh sharelib create -fs hdfs://active-namenode-ip:4007 -locallib share
    oozie admin --oozie http://oozie-server-ip:12000/oozie -sharelibupdate
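
The upload and refresh part of the ShareLib update can be wrapped in a small script. The following is a minimal sketch, not part of the EMR tooling: `run` and `update_sharelib` are hypothetical helper names, and the `DRY_RUN` variable is an assumption added here so the commands can be previewed before they touch the cluster. The final `-shareliblist` call is an extra check that asks the Oozie server what it now sees. The sketch assumes you are in /usr/local/service/oozie and have already unpacked oozie-sharelib.tar.gz and added your JAR packages.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# update_sharelib NAMENODE_URI OOZIE_URL
# Uploads the local `share` directory to HDFS, asks the Oozie server to
# reload it, then lists the ShareLib so the result can be checked.
update_sharelib() {
  namenode_uri=$1
  oozie_url=$2
  run bin/oozie-setup.sh sharelib create -fs "$namenode_uri" -locallib share &&
    run oozie admin -oozie "$oozie_url" -sharelibupdate &&
    run oozie admin -oozie "$oozie_url" -shareliblist
}
```

With `DRY_RUN=1` set, calling `update_sharelib hdfs://active-namenode-ip:4007 http://oozie-server-ip:12000/oozie` only prints the three commands; unset the variable to execute them.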
    

    Submitting Workflow in Non-Kerberos Environment

    Decompress the oozie-examples.tar.gz file in the Oozie installation directory /usr/local/service/oozie, which provides the sample workflows of the components supported by Oozie:

    tar -xf oozie-examples.tar.gz
    

    Take action hive2 as an example:

    • Switch to the hadoop user: `su hadoop`
    • Go to the sample directory: `cd examples/apps/hive2/`
    • Modify job.properties:
      • Set nameNode to the value of fs.defaultFS in core-site.xml.
      • Set resourceManager to the value of yarn.resourcemanager.ha.rm-ids in yarn-site.xml in HA mode, or to the value of yarn.resourcemanager.address in non-HA mode.
      • Set jdbcURL to jdbc:hive2://hive2-server:7001/default.
    • Return to /usr/local/service/oozie and upload the examples directory to HDFS: `hadoop fs -put examples`
    • Submit the workflow: `oozie job -debug -oozie http://oozie-server-ip:12000/oozie -config examples/apps/hive2/job.properties -run`
    • Check the job: `oozie job -oozie http://oozie-server-ip:12000/oozie -info <job-id>`, where `<job-id>` is the job ID returned in the previous step (or viewed on the WebUI).
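
The job.properties edits in the steps above can be scripted instead of made by hand. This is a minimal sketch: `get_hadoop_prop` and `set_prop` are hypothetical helper names, the XML lookup assumes each `<name>` and `<value>` element sits on a single line (as in typical Hadoop configuration files), and the core-site.xml path in the commented example is an assumption about the node layout. On a cluster node, `hdfs getconf -confKey fs.defaultFS` should return the same value without any XML parsing.

```shell
# get_hadoop_prop FILE NAME
# Print the <value> of property NAME from a Hadoop-style XML config file.
# Assumes every <name> and <value> element fits on one line.
get_hadoop_prop() {
  awk -v name="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      found = (n == name)
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

# set_prop FILE KEY VALUE
# Replace the line starting with KEY= in a properties file,
# or append KEY=VALUE if no such line exists.
set_prop() {
  tmp=$(mktemp) || return 1
  awk -v k="$2" -v v="$3" '
    index($0, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Example, run from examples/apps/hive2/ (the config path is an assumption):
#   set_prop job.properties nameNode \
#     "$(get_hadoop_prop /usr/local/service/hadoop/etc/hadoop/core-site.xml fs.defaultFS)"
#   set_prop job.properties jdbcURL jdbc:hive2://hive2-server:7001/default
```

Writing through a temporary file and copying the result back keeps the original file's permissions and avoids leaving a half-written job.properties if awk fails.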

    Submitting Workflow in Kerberos Environment

    Take action hive2 as an example again. Check the README file in the hive2 directory for other notes.

    • Obtain a Kerberos ticket and switch to the hadoop user: `kinit -kt /var/krb5kdc/emr.keytab <hadoop-principal> && su hadoop`, where `<hadoop-principal>` is the hadoop user's principal.
    • Go to the sample directory: `cd examples/apps/hive2/`
    • Switch to the security versions of the sample files: `mv job.properties.security job.properties && mv workflow.xml.security workflow.xml`
    • Modify job.properties:
      • Set nameNode to the value of fs.defaultFS in core-site.xml.
      • Set resourceManager to the value of yarn.resourcemanager.ha.rm-ids in yarn-site.xml in HA mode, or to the value of yarn.resourcemanager.address in non-HA mode.
      • Set jdbcURL to jdbc:hive2://hive2-server:7001/default.
      • Set jdbcPrincipal to the value of hive.server2.authentication.kerberos.principal in hive-site.xml.
    • Return to /usr/local/service/oozie and upload the examples directory to HDFS: `hadoop fs -put examples`
    • Submit the workflow: `oozie job -debug -oozie http://oozie-server-ip:12000/oozie -config examples/apps/hive2/job.properties -run`
    • Check the job: `oozie job -oozie http://oozie-server-ip:12000/oozie -info <job-id>`, where `<job-id>` is the job ID returned in the previous step (or viewed on the WebUI).
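
The first and last steps above each depend on a value you have to look up: the hadoop principal stored in the keytab, and the job ID printed at submission. The following is a minimal sketch of two filters for that purpose. `first_principal` and `job_id_from_output` are hypothetical helper names; the first relies on `klist -kt` printing the principal as the last column of each entry, and the second on the Oozie CLI printing a `job: <id>` line after a successful submission.

```shell
# first_principal USER
# Read `klist -kt KEYTAB` output on stdin and print the first principal
# whose primary component is USER (for example USER/host@REALM).
first_principal() {
  awk -v u="$1" '
    { p = $NF }
    index(p, u "/") == 1 || index(p, u "@") == 1 { print p; exit }
  '
}

# job_id_from_output
# Read `oozie job ... -run` output on stdin and print the ID taken from
# the first "job: <id>" line.
job_id_from_output() {
  sed -n 's/^job: *//p' | head -n 1
}

# Example, using the paths and URL from the steps above:
#   principal=$(klist -kt /var/krb5kdc/emr.keytab | first_principal hadoop)
#   kinit -kt /var/krb5kdc/emr.keytab "$principal"
#   id=$(oozie job -oozie http://oozie-server-ip:12000/oozie \
#          -config examples/apps/hive2/job.properties -run | job_id_from_output)
#   oozie job -oozie http://oozie-server-ip:12000/oozie -info "$id"
```

Both helpers only filter text, so they can be tried against saved command output before being used on a live cluster.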