Zeppelin Overview

Last updated: 2020-04-28 17:04:38

    Apache Zeppelin is an interactive, web-based notebook system for big data analytics and visualization. It covers tasks such as data ingestion, data discovery, analytics, visualization, and collaboration. On the frontend, it offers a rich set of chart visualizations for query results, such as those returned by SparkSQL. On the backend, it supports big data systems such as HBase and Flink through plugin extensions (interpreters). It also supports machine learning workflows, including data preprocessing, algorithm development and debugging, and algorithm job scheduling.

    Running WordCount with Spark

    1. Click Create new note on the left, and then create a notebook in the pop-up window.
    2. Configure the Spark interpreter to work with your EMR cluster (Spark on YARN), adjust the parameters as needed, and save the configuration.
    3. Open your notebook.
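    4. Write a WordCount program in a paragraph and run it:
      import spark.implicits._
      val data = sc.textFile("cosn://huanan/zeppelin-spark-randomint-test")
      case class WordCount(word: String, count: Int)
      val result = data.flatMap(x => x.split(" ")).map(x => (x, 1)).reduceByKey(_ + _).map(x => WordCount(x._1, x._2))
      // Register the result as a temporary view so that the %sql interpreter can query it
      result.toDF().createOrReplaceTempView("result")
    5. In a new paragraph, query the view with SparkSQL:
      %sql select * from result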
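    To see what the WordCount pipeline computes without a cluster, the following is a minimal local sketch that uses plain Scala collections instead of Spark RDDs. It assumes hypothetical sample lines in place of the COS file, and `LocalWordCount` and `count` are illustrative names, not part of Zeppelin or Spark. Here `groupBy` followed by a sum stands in for `reduceByKey`; in Spark, `reduceByKey` additionally combines values within each partition before shuffling.

```scala
// Local, Spark-free sketch of the WordCount pipeline using Scala collections.
// The sample lines in main are hypothetical data, not the contents of the COS file.
case class WordCount(word: String, count: Int)

object LocalWordCount {
  def count(lines: Seq[String]): Seq[WordCount] =
    lines
      .flatMap(line => line.split(" "))   // split each line into words, like RDD.flatMap
      .map(word => (word, 1))              // pair every word with 1
      .groupBy(_._1)                       // group pairs by word (stands in for reduceByKey's shuffle)
      .map { case (word, pairs) => WordCount(word, pairs.map(_._2).sum) } // sum the 1s per word
      .toSeq
      .sortBy(wc => (-wc.count, wc.word))  // most frequent first, ties broken by word

  def main(args: Array[String]): Unit = {
    val sample = Seq("3 7 3", "7 3 9")
    count(sample).foreach(println)         // one WordCount per distinct word
  }
}
```

    For the sample input, "3" occurs three times, "7" twice, and "9" once, which is the same shape of result that the %sql query displays as a table.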
