Zeppelin Overview

Last updated: 2022-05-16 12:52:26

    Apache Zeppelin is a web-based notebook that enables interactive data analysis. It allows you to create interactive collaborative documents with various prebuilt language backends (or interpreters), such as Scala (Apache Spark), Python (Apache Spark), Spark SQL, Hive, and Shell.

    Prerequisites

    • You have created a cluster and selected the Zeppelin service. For more information, see Creating EMR Cluster.
    • In the cluster's EMR security group, open ports 22 and 30001 (opened by default for new clusters) and allow the IP ranges required for communication over the private network. The security group created for a new cluster is named in the format emr-xxxxxxxx_yyyyMMdd and should not be modified manually.
    • Add services as needed, such as Spark, Flink, HBase, and Kylin.

    Logging in to Zeppelin

    1. Create a cluster and select the Zeppelin service. For more information, see Creating EMR Cluster.
    2. On the left sidebar in the EMR console, select Cluster Service.
    3. Click the Zeppelin block and click Web UI Address to access the web UI.
    4. For EMR versions later than 3.1.0, a default login is preconfigured: both the username and the password are admin. To change the password, modify the users and roles options in the /usr/local/service/zeppelin-0.8.2/conf/shiro.ini configuration file (a minimal example is shown after this list). For more configuration instructions, see Apache Shiro Configuration.
    5. In EMR 2.6.0 and 3.3.0, Zeppelin login is integrated with OpenLDAP, so you can log in only with an OpenLDAP account and password. After a cluster is created, the default OpenLDAP accounts are root and hadoop, and the default password is the cluster password. Only the root account has Zeppelin admin permissions and can access the interpreter configuration page.
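
    The [users] and [roles] sections of shiro.ini follow the standard Apache Shiro INI format. The sketch below is a minimal example only: the account analyst, the role role1, and both passwords are placeholders to replace with your own values, and the exact contents of the shipped file may differ by EMR version.

      [users]
      # Format: username = password, role1, role2, ...
      admin = YourNewPassword, admin
      analyst = AnotherPassword, role1

      [roles]
      role1 = *
      admin = *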

    Performing wordcount with Spark

    1. Click Create new note on the left of the page and create a notebook in the pop-up window.
    2. Configure the Spark interpreter to integrate with the EMR cluster (Spark on YARN); modify and save the configuration (a property sketch is shown after the code below).
    3. Open the notebook you created.
    4. Write a wordcount program and run the following code:
      // Read the input file from COS
      val data = sc.textFile("cosn://huanan/zeppelin-spark-randomint-test")
      case class WordCount(word: String, count: Integer)
      // Split lines into words, count each word, and map the pairs to the case class
      val result = data.flatMap(x => x.split(" ")).map(x => (x, 1)).reduceByKey(_ + _).map(x => WordCount(x._1, x._2))
      // Register the result as a temporary table so it can be queried with SQL
      result.toDF().registerTempTable("result")
      // Run the query below in a separate %sql paragraph to view the result as a table
      %sql select * from result
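
    The interpreter configuration mentioned in step 2 is done on the Zeppelin Interpreter page. As a rough sketch only, the Spark interpreter properties most commonly changed for Spark on YARN are shown below; the values are placeholders, and the right settings depend on your cluster:

      master                  yarn-client    # run Spark on the cluster's YARN rather than locally
      spark.app.name          Zeppelin       # application name shown in the YARN UI
      spark.executor.memory   1g             # per-executor memory; adjust for your workload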
      
