Apache Zeppelin is an interactive development system for big data visualization and analytics. It covers tasks such as data ingestion, discovery, analytics, visualization, and collaboration. On the frontend, it provides a rich set of chart visualizations, for example for SparkSQL query results; on the backend, it supports big data systems such as HBase and Flink through plugin extensions. It can also be used for data preprocessing, algorithm development and debugging, and algorithm job scheduling in machine learning.
wordcount with Spark
Write the wordcount program and run the following command:
val data = sc.textFile("cosn://huanan/zeppelin-spark-randomint-test")
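// One row of the result table: a word and its number of occurrences
case class WordCount(word: String, count: Int)
// Split each line on spaces, count each word, and wrap each pair in a WordCount
val result = data.flatMap(x => x.split(" ")).map(x => (x, 1)).reduceByKey(_ + _).map(x => WordCount(x._1, x._2))
// Register the result as a temporary table so that it can be queried with SQL.
// On Spark 2.0 and later, createOrReplaceTempView("result") is the non-deprecated equivalent.
result.toDF().registerTempTable("result")
Then query the table in a separate paragraph:
%sql select * from result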
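To check what the job above computes without a cluster, the same steps can be sketched on a local collection. This is a minimal sketch in plain Scala, not the Spark job itself: the object name LocalWordCount, the function countWords, and the sample lines are hypothetical, and groupBy stands in for Spark's reduceByKey.

```scala
object LocalWordCount {
  // Same shape as the rows registered in the "result" table
  final case class WordCount(word: String, count: Int)

  // Split each line on spaces, pair every word with 1, then sum the 1s per word.
  // groupBy is a local stand-in for reduceByKey; sorting makes the output stable.
  def countWords(lines: Seq[String]): Seq[WordCount] =
    lines
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => WordCount(word, pairs.map(_._2).sum) }
      .toSeq
      .sortBy(_.word)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for the lines of the COS file
    val sample = Seq("a b a", "b c")
    // prints WordCount(a,2), WordCount(b,2), WordCount(c,1), one per line
    countWords(sample).foreach(println)
  }
}
```

As in the Spark version, lines are split on single spaces only, so consecutive spaces in the input produce empty strings that are counted as words.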