Apache Flume is a distributed, highly available, and configurable service for collecting and aggregating large amounts of log and event data from many different sources. It is designed to take data flows (e.g., log data) from various web servers and store them in centralized data stores such as HDFS and HBase.
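A Flume event is the unit of data flow. A Flume agent is a JVM process that hosts the components an event passes through on its way from an external source to its destination. The three core components are the Source, which receives events; the Channel, which buffers them; and the Sink, which delivers them.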
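The relationship between the three components can be pictured with a small Python model. This is only an illustrative sketch, not Flume's actual implementation: the names Event, MemoryChannel, source, and sink are hypothetical, and the capacity and batch size simply mirror the kind of limits a memory channel is configured with.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: a body plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Toy bounded in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event: Event) -> None:
        # Refuse new events once the buffer holds `capacity` events.
        if len(self._queue) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take(self, max_events: int) -> list:
        # Hand out at most `max_events` events, oldest first.
        batch = []
        while self._queue and len(batch) < max_events:
            batch.append(self._queue.popleft())
        return batch


def source(lines, channel):
    """Source role: wrap each incoming line in an event and put it on the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


def sink(channel, batch_size):
    """Sink role: drain one batch from the channel and return the decoded bodies."""
    return [event.body.decode("utf-8") for event in channel.take(batch_size)]


channel = MemoryChannel(capacity=1000)
source(["Hello world!", "second line"], channel)
delivered = sink(channel, batch_size=100)
print(delivered)  # ['Hello world!', 'second line']
```

The point of the model is that the source and the sink never talk to each other directly. The channel decouples them, so a slow sink only causes events to queue up until the channel's capacity is reached.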
Flume is installed in the /usr/local/service/flume path on the core and task nodes of the EMR cluster.
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
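Once the agent is running, open another terminal, connect to port 44444 with telnet, and send a line of text. If the configuration is correct, the Flume agent started previously prints each received event to its terminal.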
telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello world! <ENTER>
OK
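The same test can be scripted instead of typed into telnet. The following is a minimal sketch, assuming the agent is listening on the host and port from the configuration above (localhost and 44444) and that it answers each line with the OK acknowledgement seen in the telnet session. The helper name send_line and its timeout parameter are hypothetical and not part of Flume.

```python
import socket


def send_line(host: str, port: int, text: str, timeout: float = 5.0) -> str:
    """Send one newline-terminated line over TCP and return the reply, stripped."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The netcat source treats each newline-terminated line as one event.
        conn.sendall(text.encode("utf-8") + b"\n")
        # Read the short acknowledgement the server sends back.
        reply = conn.recv(1024)
    return reply.decode("utf-8").strip()


# Example call (assumes the agent from example.conf is already running):
#   print(send_line("localhost", 44444, "Hello world!"))
```

If the agent is not running, the call fails with a connection error rather than hanging, because the connection attempt is bounded by the timeout.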