Hue Developer Guide

Last updated: 2019-11-13 16:05:07


Hue Overview

Hue is an open-source Apache Hadoop UI system that evolved from Cloudera Desktop. Cloudera later contributed it to the Apache Software Foundation's Hadoop community. Hue is built on Django, a Python web framework.

With Hue, you can interact with Hadoop clusters from a web console in your browser. For example, you can manipulate HDFS data, run MapReduce jobs, execute Hive SQL statements, and browse HBase databases.

Hue Features

  • Hive SQL query.
  • HBase data query, modification, and display.
  • HDFS access and file browsing.
  • Development, monitoring, coordinating, and scheduling of Oozie tasks.

Log in to the Hue Console

To use the Hue component to manage workflows, first log in to the Hue Console as follows:

  1. Log in to the EMR Console and select Component Management in the left sidebar.
  2. Find the Hue component on the list page and click "Native WebUI Access Address" to enter the Hue page.
  3. When logging in to the Hue Console for the first time, use the root account and the password set when you created the cluster.

    Because EMR components start under the hadoop account by default, create a hadoop account after you log in to the Hue Console with the root account for the first time. Submit all subsequent jobs using the hadoop account.

Hive SQL Query

Hue's Beeswax app provides a Hive query interface. In it, you can select a Hive database, write HQL statements, submit queries, and view the results.

  1. At the top of the Hue Console, select Query > Editor > Hive.
  2. Enter the statement in the statement input box and click the "Execute" icon to run it.

HBase Data Query

You can use HBase Browser to query, modify, and display data from tables in an HBase cluster.

HDFS File Browsing

Hue's web UI makes it easy to view files and folders in HDFS and perform operations such as creation, download, upload, copy, modification, and deletion.

  1. In the left sidebar of the Hue Console, select Browsers > Files to browse HDFS files.
  2. In the File Browser, select a file or folder and perform the operations described above.
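Hue's File Browser reaches HDFS over HTTP through WebHDFS or HttpFS. If you want to script the same operations outside the console, the sketch below can help. It only builds WebHDFS REST URLs and sends nothing. The NameNode host `namenode.example`, port 9870, and the `hadoop` user are assumptions; replace them with your cluster's values. Port 9870 is the Hadoop 3 default, and Hadoop 2 uses 50070. WebHDFS has no single copy operation, so copy is not shown.

```python
from urllib.parse import quote, urlencode

# Assumed WebHDFS endpoint (hypothetical host); replace with your NameNode address.
WEBHDFS_BASE = "http://namenode.example:9870/webhdfs/v1"


def webhdfs_url(path: str, op: str, user: str = "hadoop", **params) -> str:
    """Build a WebHDFS REST URL for operation `op` on the absolute HDFS `path`."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = {"op": op, "user.name": user}
    # WebHDFS expects lowercase "true"/"false" for boolean parameters.
    for key, value in params.items():
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{WEBHDFS_BASE}{quote(path)}?{urlencode(query)}"


# One URL per File Browser operation; the HTTP method is noted in each comment.
examples = {
    "list": webhdfs_url("/user/hadoop", "LISTSTATUS"),                # GET
    "mkdir": webhdfs_url("/user/hadoop/demo", "MKDIRS"),              # PUT
    "upload": webhdfs_url("/user/hadoop/demo/a.txt", "CREATE",
                          overwrite=True),                            # PUT (redirected to a DataNode)
    "download": webhdfs_url("/user/hadoop/demo/a.txt", "OPEN"),       # GET
    "rename": webhdfs_url("/user/hadoop/demo/a.txt", "RENAME",
                          destination="/user/hadoop/demo/b.txt"),     # PUT
    "delete": webhdfs_url("/user/hadoop/demo", "DELETE",
                          recursive=True),                            # DELETE
}

for name, url in examples.items():
    print(f"{name:9} {url}")
```

Send each URL with the HTTP method noted in its comment, for example with `curl -X PUT`. For CREATE, the NameNode replies with a redirect to a DataNode, and the file content is sent to that second URL.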

Oozie Task Scheduling

  1. Prepare workflow data
    Hue's task scheduling is based on workflows. First, prepare a Hive script for the workflow with the following content:
    ```sql
    create database if not exists hive_sample;
    show databases;
    use hive_sample;
    show tables;
    create table if not exists hive_sample (a int, b string);
    show tables;
    insert into hive_sample select 1, "a";
    select * from hive_sample;
    ```
    Save the script above as a file named hive_sample.sql. The Hive workflow also requires a hive-site.xml configuration file, which is located on the cluster node where the Hive component is installed.
    The path is /usr/local/service/hive/conf/hive-site.xml. Make a copy of this file and set the following properties to the values shown:
    ```xml
    <property>
      <name>hive.exec.local.scratchdir</name>
      <value>/tmp/hive</value>
    </property>
    <property>
      <name>hive.downloaded.resources.dir</name>
      <value>/tmp/hive/${hive.session.id}_resources</value>
    </property>
    <property>
      <name>hive.querylog.location</name>
      <value>/tmp/hive</value>
    </property>
    <property>
      <name>hive.server2.logging.operation.log.location</name>
      <value>/tmp/hive/tmp/operation_logs</value>
    </property>
    ```
    Upload the Hive script file and hive-site.xml to a directory in HDFS, such as /user/hadoop.
  2. Create a workflow
    1. At the top of the Hue Console, select Query > Scheduler > Workflow.
    2. Drag a Hive script action into the workflow editor.
    3. Select the Hive script and hive-site.xml files you just uploaded.
    4. After clicking Add, also specify the Hive script file under "FILES".
    5. Click "Save" in the top-right corner and then click the "Execute" icon to run the workflow.
  3. Create a scheduled task
    In Hue, a scheduled task is called a "schedule". It is similar to crontab in Linux, and its scheduling granularity can be as fine as one minute.
    1. Select Query > Scheduler > Schedule to create a schedule.
    2. Click Choose a workflow... and select the workflow you created.
    3. Set the schedule's execution time, frequency, time zone, start time, and end time, then click Save.
  4. Execute the schedule
    1. Click the "Submit" icon in the top-right corner to submit the schedule.
    2. You can view the scheduling status on the monitoring page of the schedulers.
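You can also script the hive-site.xml edits from step 1 of the scheduling procedure. The sketch below uses Python's standard `xml.etree.ElementTree` to set the four properties to the values listed there, and it adds any property that is missing. The helper names `patch_hive_site` and `patch_file` are hypothetical. Run the sketch against a copy of /usr/local/service/hive/conf/hive-site.xml, not the original.

```python
import xml.etree.ElementTree as ET

# Values from step 1: Hive's local scratch and log directories under /tmp/hive.
OVERRIDES = {
    "hive.exec.local.scratchdir": "/tmp/hive",
    "hive.downloaded.resources.dir": "/tmp/hive/${hive.session.id}_resources",
    "hive.querylog.location": "/tmp/hive",
    "hive.server2.logging.operation.log.location": "/tmp/hive/tmp/operation_logs",
}


def patch_hive_site(root: ET.Element, overrides: dict) -> ET.Element:
    """Set each <property> named in `overrides` under <configuration>; append it if absent."""
    remaining = dict(overrides)
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name in remaining:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = remaining.pop(name)
    # Any property not already in the file is appended at the end.
    for name, val in remaining.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = val
    return root


def patch_file(src: str, dst: str) -> None:
    """Hypothetical helper: read hive-site.xml from `src`, patch it, write it to `dst`."""
    tree = ET.parse(src)
    patch_hive_site(tree.getroot(), OVERRIDES)
    tree.write(dst, encoding="utf-8", xml_declaration=True)


# Demo on an inline fragment: one property already present, three missing.
sample = ET.fromstring(
    "<configuration><property><name>hive.querylog.location</name>"
    "<value>/old/path</value></property></configuration>"
)
patch_hive_site(sample, OVERRIDES)
patched = {p.findtext("name"): p.findtext("value") for p in sample.findall("property")}
print(patched)
```

On the Hive node, `patch_file("/usr/local/service/hive/conf/hive-site.xml", "hive-site.xml")` writes a patched copy to the current directory and leaves the original untouched. ElementTree discards XML comments when it rewrites a file. This does not affect Hive, but the copy will look different from the original.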
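The small model below illustrates the frequency, start time, and end time settings of a schedule. It lists the nominal run times of a fixed-interval schedule. It is an illustration under assumptions, not Hue's or Oozie's scheduling code. The frequency is a whole number of minutes, the finest granularity Hue schedules support, rather than a full crontab expression. The end time is treated as exclusive. The UTC+8 time zone and the dates are arbitrary examples.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def run_times(start: datetime, end: datetime, every_minutes: int) -> List[datetime]:
    """Nominal run times start, start + f, start + 2f, ... strictly before `end`."""
    if every_minutes < 1:
        raise ValueError("frequency must be at least one minute")
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes so the time zone is explicit")
    step = timedelta(minutes=every_minutes)
    runs = []
    t = start
    while t < end:  # end time treated as exclusive (assumption)
        runs.append(t)
        t += step
    return runs


# Example: every 15 minutes for one hour in an assumed UTC+8 time zone.
tz = timezone(timedelta(hours=8))
start = datetime(2019, 11, 13, 16, 0, tzinfo=tz)
end = start + timedelta(hours=1)
for t in run_times(start, end, 15):
    print(t.isoformat())
```

With these inputs the model yields four runs, at 16:00, 16:15, 16:30, and 16:45. A run at 17:00 would coincide with the end time and is excluded.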