Hue Development Guide

Last updated: 2020-06-15 15:52:17

    Hue Overview

    Hue is an open-source web UI for Apache Hadoop that evolved from Cloudera Desktop. Cloudera later contributed it to the Hadoop project of the Apache Software Foundation. Hue is built on Django, a Python web framework.

    With Hue, you can interact with Hadoop clusters from a web-based console in your browser, for example to manipulate HDFS data, run MapReduce jobs, execute Hive SQL statements, and browse HBase databases.

    Hue Features

    • Hive SQL query.
    • HBase data query, modification, and display.
    • HDFS access and file browsing.
    • Development, monitoring, coordinating, and scheduling of Oozie tasks.

    Log in to the Hue Console

    To use the Hue component to manage workflows, log in to the Hue Console first as shown below:

    1. Log in to the EMR Console. Click the ID/name of the target cluster to go to the cluster details page, and then click Cluster Services.
    2. Find the Hue component on the list page and click WebUI Access Address to enter the Hue page.
    3. When logging in to the Hue Console for the first time, use the root account and the password set when you created the cluster.

      Because EMR components start under the hadoop account by default, create a hadoop account in Hue after you log in to the Hue Console with the root account for the first time. Submit all subsequent jobs with the hadoop account.

    Hive SQL Query

    Hue's Beeswax app provides a Hive query editor in which you can select a Hive database, write HQL statements, submit query tasks, and view the results.

    1. At the top of the Hue Console, select Query > Editor > Hive.
    2. Enter the statement in the input box and click Execute to run it.

    HBase Data Query

    You can use HBase Browser to query, modify, and display data from tables in an HBase cluster.

    HDFS File Browsing

    Hue's web UI makes it easy to view files and folders in HDFS and perform operations such as creation, download, upload, copy, modification, and deletion.

    1. In the left sidebar in the Hue Console, select Browsers > Files to browse HDFS files.
    2. After entering the File Browser, you can create, upload, download, copy, modify, and delete files and folders.
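
    For comparison, the same kinds of file operations are available from the HDFS command line on a cluster node. The following is a minimal Python sketch, not part of Hue, that only builds the corresponding `hdfs dfs` argument lists without running them. The function name `hdfs_command` and the operation keys are hypothetical names chosen for illustration; the paths are placeholders.

```python
import shlex

# Hypothetical mapping from File Browser style operations to `hdfs dfs` flags.
HDFS_FLAGS = {
    "list": ["-ls"],
    "create_dir": ["-mkdir", "-p"],
    "upload": ["-put"],
    "download": ["-get"],
    "copy": ["-cp"],
    "delete": ["-rm", "-r"],
}


def hdfs_command(operation, *paths):
    """Return the argument list for one HDFS shell operation (nothing is executed)."""
    if operation not in HDFS_FLAGS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["hdfs", "dfs", *HDFS_FLAGS[operation], *paths]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them on a cluster node.
    print(shlex.join(hdfs_command("list", "/user/hadoop")))
    print(shlex.join(hdfs_command("upload", "local.txt", "/user/hadoop")))
```

    Building the commands as lists rather than strings avoids quoting problems if they are later passed to `subprocess.run`.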

    Oozie Task Scheduling

    1. Prepare workflow data
      Hue's task scheduling is based on workflows. First, create a workflow containing a Hive script with the following content:
          create database if not exists hive_sample;
          show databases;
          use hive_sample;
          show tables;
          create table if not exists hive_sample (a int, b string);
          show tables;
          insert into hive_sample select 1, "a";
          select * from hive_sample;
      Save the content above as a file named hive_sample.sql. The Hive workflow also requires a hive-site.xml configuration file, which can be found on the cluster node where the Hive component is installed. Upload the Hive script file and hive-site.xml to a directory in HDFS, such as /user/hadoop.
    2. Create a workflow
      • Switch to the hadoop user. At the top of the Hue Console, select Query > Scheduler > Workflow.
      • Drag a Hive script into the workflow editing page.

        Note:

        This document uses a Hive 1 installation as an example, so the configuration parameter should be HiveServer1. Errors are reported if other Hive versions are deployed at the same time or if the configuration parameters of another Hive version are used.

      • Select the Hive script and hive-site.xml files you just uploaded.
      • After clicking Add, you also need to specify the Hive script file in "FILES".
      • Click Save in the top-right corner and then click Execute to run the workflow.
    3. Create a scheduled task
      In Hue, a scheduled task is called a "schedule" and is similar to crontab in Linux. Scheduling granularity goes down to the minute level.
      • Select Query > Scheduler > Schedule to create a schedule.
      • Click Choose a workflow... to select a created workflow.
      • Select the execution time, frequency, time zone, start time, and end time of the schedule, and then click Save.
    4. Execute the schedule
      • Click Submit in the top-right corner to submit the schedule.
      • You can view the scheduling status on the monitoring page of the schedulers.
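
    The file preparation in step 1 can also be scripted. The following is a minimal Python sketch, under stated assumptions: it writes the sample statements to hive_sample.sql and builds, but does not run, an `hdfs dfs -put` command for the script and hive-site.xml. The helper names `write_hive_script` and `build_upload_command` are hypothetical, and the hive-site.xml path is a placeholder because its location depends on the cluster node where Hive is installed.

```python
import shlex
import tempfile
from pathlib import Path

# The sample HiveQL statements from step 1.
HIVE_SCRIPT = """\
create database if not exists hive_sample;
show databases;
use hive_sample;
show tables;
create table if not exists hive_sample (a int, b string);
show tables;
insert into hive_sample select 1, "a";
select * from hive_sample;
"""


def write_hive_script(directory):
    """Write the sample statements to hive_sample.sql and return the file path."""
    path = Path(directory) / "hive_sample.sql"
    path.write_text(HIVE_SCRIPT, encoding="utf-8")
    return path


def build_upload_command(local_files, hdfs_dir="/user/hadoop"):
    """Build the `hdfs dfs -put` argument list; nothing is executed here."""
    return ["hdfs", "dfs", "-put", *[str(f) for f in local_files], hdfs_dir]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        script = write_hive_script(workdir)
        # Placeholder: replace with the real hive-site.xml location on your Hive node.
        hive_site = "hive-site.xml"
        print(shlex.join(build_upload_command([script, hive_site])))
```

    To perform the upload, run the printed command on a cluster node as the hadoop user, or pass the argument list to `subprocess.run`.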
