Monitoring Metrics

Last updated: 2020-05-21 15:38:46

    Server Monitoring Metrics

    Server - CPU

    Metric Name Unit Description
    idle % Percentage of CPU idle time
    irq % Percentage of CPU time spent handling hardware interrupts
    nice % Percentage of CPU utilization under nice priority
    steal % Percentage of time virtual CPUs wait for physical CPUs
    softirq % Percentage of CPU time spent handling soft interrupts
    guest % Percentage of time spent running virtual processors
    system % CPU utilization in kernel state
    user % CPU utilization in user state
    iowait % Percentage of CPU idleness due to process I/O waits
    1m Load/s 1-minute load
    5m Load/s 5-minute load
    15m Load/s 15-minute load
    cpu_count - Number of CPU cores
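
On Linux, per-state CPU percentages like those above are commonly derived from the cumulative tick counters on the aggregate `cpu` line of `/proc/stat`, sampled twice. The following is a minimal sketch under that assumption; the helper names `parse_cpu_line` and `cpu_percentages` and the sample values are hypothetical, and this is not necessarily how the monitoring agent computes them.

```python
# Field order of the aggregate "cpu" line in /proc/stat (Linux).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest"]


def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into {field: ticks}."""
    parts = line.split()
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    # Guest time is already counted inside user time, so leave it out of the total.
    total = sum(v for k, v in delta.items() if k != "guest")
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


# Two hypothetical samples taken a few seconds apart.
sample_1 = parse_cpu_line("cpu 100 0 50 800 40 5 5 0 0 0")
sample_2 = parse_cpu_line("cpu 200 0 100 1600 80 10 10 0 0 0")
usage = cpu_percentages(sample_1, sample_2)
print(usage["idle"], usage["user"], usage["system"])
```

Working on deltas rather than raw counters matters because the counters only ever increase from boot; a single reading says nothing about current utilization.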

    Server - memory

    Metric Name Unit Description
    MemTotal GB Total memory size
    MemFree GB Total free memory size
    MemAvailable GB Total available memory size
    Buffers GB Total memory size used by buffers
    Cached GB Total memory size used by file cache
    SwapCached GB Total swap memory size by anonymous page writes
    SwapFree GB Total available swap size
    AnonPages GB Total size of anonymous (non-file-backed) memory pages
    SwapTotal GB Total swap size
    Dirty GB Total memory size to write to disk
    Writeback GB Total memory size being written back to disk
    HardwareCorrupted GB Total unavailable memory size due to memory hardware failure
    Shmem GB Total shared memory size
    available_percent % Percentage of available memory size out of total memory
    used_percent % Percentage of used memory size out of total memory
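
The two percentage metrics can be related to the raw counters with a short calculation. This sketch assumes the counters come from `/proc/meminfo` and that "used" means everything that is not available; both are assumptions, as are the helper names and sample values.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_percentages(meminfo):
    """Return (available_percent, used_percent) from parsed counters."""
    total = meminfo["MemTotal"]
    available_percent = 100.0 * meminfo["MemAvailable"] / total
    # Assumption: used memory is whatever is not reported as available.
    return available_percent, 100.0 - available_percent


# Hypothetical excerpt of /proc/meminfo.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    4000000 kB
"""
available_percent, used_percent = memory_percentages(parse_meminfo(SAMPLE))
print(available_percent, used_percent)
```

Note that `MemAvailable` is larger than `MemFree` here because reclaimable cache counts as available but not as free.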

    Server - network

    Metric Name Unit Description
    ListenDrops Connections/s Number of incoming connections (SYN packets) dropped for any reason
    ListenOverflows Occurrences/s Number of occurrences where the upper limit of the Accept queue is exceeded after the last step of three-way handshake is completed
    SyncookiesFailed Packets/s Number of packets received with invalid SYN Cookie information
    SyncookiesRecv Packets/s Number of packets received with valid SYN Cookie information
    SyncookiesSent Packets/s Number of SYN/ACK packets sent through SYN Cookie
    TCPAbortOnTimeout Connections/s Number of connections closed because the retransmission attempts of various timers (RTO/PTO/keepalive) exceeded the upper limit
    TCPAbortOnData Sockets/s Number of sockets closed due to unknown data received
    TCPAbortOnClose Sockets/s Number of sockets closed when the user-mode program has data in the buffer
    TCPAbortOnMemory Connections/s Number of connections closed due to memory issues
    TCPAbortOnLinger Connections/s Number of connections suspended in lingering state after being closed
    TCPAbortFailed Attempts/s Number of failed attempts to close connection
    ActiveOpens Connections/s Number of actively established TCP connections
    CurrEstab Connections/s Number of TCP connections currently established
    PassiveOpens Connections/s Number of passively established TCP connections
    AttemptFails Failures/s Number of connection establishment failures
    EstabResets Connections/s Number of reset connections
    InSegs Packets/s Number of received packets, including erroneous ones
    OutSegs Packets/s Number of sent data packets
    RetransSegs Packets/s Number of retransmitted TCP packets
    InErrs Packets/s Number of erroneous packets received
    OutRsts Packets/s Number of sent RST packets
    RetransSegsRate % Retransmission rate at TCP layer
    ResetRate % RESET sending frequency
    InErrRate % Percentage of erroneous packets
    TW Sockets/s Number of sockets ending TIME_WAIT state after normal timeout
    TWKilled Sockets/s Number of sockets ending TIME_WAIT state through tcp_tw_recycle mechanism
    TCPTimeWaitOverflow Sockets/s Number of TIME_WAIT sockets that could not be allocated because the limit was exceeded
    TWRecycled Sockets/s Number of sockets ending TIME_WAIT state through tcp_tw_reuse mechanism
    TCPTimeouts Timeouts/s Number of first RTO timer timeouts
    TCPSpuriousRTOs Timeouts/s Number of spurious timeouts detected through F-RTO mechanism
    TCPLossProbes Packets/s Number of Tail Loss Probe (TLP) packets sent due to Probe Timeout (PTO)
    TCPLossProbeRecovery Packets/s Number of lost packets just repaired by TLP probes
    TCPRenoRecoveryFail Connections/s Number of connections that enter the Recovery phase and then undergo RTO (SACK option not supported by the peer)
    TCPSackRecoveryFail Connections/s Number of connections that enter the Recovery phase and then undergo RTO (SACK option supported by the peer)
    TCPRenoFailures Failures/s Number of failures that enter the TCP_CA_Disorder phase and then undergo RTO (SACK option not supported by the peer)
    TCPSackFailures Failures/s Number of failures that enter the TCP_CA_Disorder phase and then undergo RTO (SACK option supported by the peer)
    TCPLossFailures Connections/s Number of connections that enter the TCP_CA_Loss phase and then undergo RTO timeout
    RtoAlgorithm - Algorithm used to determine the timeout value for retransmitting unacknowledged data
    RtoMax ms Maximum retransmission timeout allowed by TCP
    RtoMin ms Minimum retransmission timeout allowed by TCP
    TCPLostRetransmit Retransmissions/s Number of SKB retransmissions due to loss
    TCPFastRetrans Retransmissions/s Number of fast SKB retransmissions
    TCPForwardRetrans Retransmissions/s Number of regular SKB transmissions
    TCPSlowStartRetrans Retransmissions/s Number of SKB retransmissions with successful slow start
    TCPRetransFail Failures/s Number of failed retransmission attempts
    OutDatagrams Datagrams/s Number of sent UDP datagrams
    InDatagrams Datagrams/s Number of received UDP datagrams
    eth0-receive_bytes MB/s Volume of data received by ENI
    eth0-transmit_bytes MB/s Volume of data sent by ENI
    eth0-receive_drop Packets/s Number of received packets dropped by ENI
    eth0-receive_errs Packets/s Number of packets that failed to be received by ENI
    eth0-transmit_drop Packets/s Number of outgoing packets dropped by ENI
    eth0-transmit_errs Packets/s Number of packets that failed to be sent by ENI
    eth0-transmit_packets Packets/s Number of packets sent by ENI
    TCP_inuse - Number of TCP sockets in use (listening)
    TCP_orphan - Number of TCP connections waiting to be closed
    TCP_tw - Number of TCP sockets to be destroyed
    TCP_alloc - Number of TCP sockets allocated (established, sk_buff obtained)
    ESTABLISHED - Number of TCP connections in Established state
    SYN-SENT - Number of TCP connections in SYN-SENT state
    SYN-RECV - Number of TCP connections in SYN-RECV state
    FIN-WAIT1 - Number of TCP connections in FIN-WAIT1 state
    FIN-WAIT2 - Number of TCP connections in FIN-WAIT2 state
    TIME-WAIT - Number of TCP connections in TIME-WAIT state
    CLOSE - Number of TCP connections in CLOSE state
    CLOSE-WAIT - Number of TCP connections in CLOSE-WAIT state
    LAST-ACK - Number of TCP connections in LAST-ACK state
    LISTEN - Number of TCP connections in LISTEN state
    CLOSING - Number of TCP connections in CLOSING state
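
The three ratio metrics in this table (RetransSegsRate, ResetRate, InErrRate) can be illustrated as quotients of counter deltas. The formulas below are assumptions chosen for illustration (retransmitted over sent, RSTs over sent, errors over received); the document does not state the exact definitions, and the function names and sample counters are hypothetical.

```python
def ratio_percent(numerator, denominator):
    """Return numerator/denominator as a percentage, or 0.0 when there was no traffic."""
    return 100.0 * numerator / denominator if denominator else 0.0


def tcp_rates(before, after):
    """Derive ratio metrics from two snapshots of cumulative TCP counters."""
    d = {name: after[name] - before[name] for name in after}
    return {
        # Assumed definition: retransmitted segments over sent segments.
        "RetransSegsRate": ratio_percent(d["RetransSegs"], d["OutSegs"]),
        # Assumed definition: sent RST segments over sent segments.
        "ResetRate": ratio_percent(d["OutRsts"], d["OutSegs"]),
        # Assumed definition: erroneous segments over received segments.
        "InErrRate": ratio_percent(d["InErrs"], d["InSegs"]),
    }


# Hypothetical counter snapshots, e.g. as read from /proc/net/snmp.
before = {"InSegs": 1000, "OutSegs": 2000, "RetransSegs": 10, "InErrs": 1, "OutRsts": 5}
after = {"InSegs": 3000, "OutSegs": 4000, "RetransSegs": 30, "InErrs": 3, "OutRsts": 25}
rates = tcp_rates(before, after)
print(rates)
```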

    Server - disk

    Metric Name Unit Description
    Read MB/s Data read per second
    Write MB/s Data written per second
    vd- - Number of I/O operations in progress on current device
    Read ms Average wait time per device I/O read operation
    Write ms Average wait time per device I/O write operation
    IO ms Average processing time per I/O request
    Read Queries/s Read QPS
    Write Queries/s Write QPS
    Merge-Read Queries/s Merged read QPS
    Merge-Write Queries/s Merged write QPS
    vd- % Disk busyness
    Free GB Free disk capacity
    Used GB Used disk capacity
    Total GB Total disk capacity
    Used % Disk utilization
    Free - Number of remaining disk inodes
    Total - Total number of disk inodes
    Used % Disk inode utilization

    Server - file handle

    Metric Name Unit Description
    allocated - Number of allocated file handles
    maximum - Maximum number of file handles
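
On Linux, system-wide file handle counts are exposed in `/proc/sys/fs/file-nr` as three numbers: allocated handles, allocated-but-unused handles, and the maximum. The sketch below assumes the two metrics above map to the first and third fields; that mapping, the helper name `parse_file_nr`, and the sample values are assumptions.

```python
def parse_file_nr(text):
    """Parse the three whitespace-separated fields of /proc/sys/fs/file-nr."""
    allocated, _unused, maximum = (int(v) for v in text.split())
    return {"allocated": allocated, "maximum": maximum}


# Hypothetical content of /proc/sys/fs/file-nr.
handles = parse_file_nr("2304\t0\t1048576\n")
usage_percent = 100.0 * handles["allocated"] / handles["maximum"]
print(handles, usage_percent)
```

Comparing `allocated` against `maximum` gives an early warning before processes start failing with "too many open files".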

    Server - process

    Metric Name Unit Description
    intr_total Interrupts/s Number of system interrupts
    context_switches_total Switches/s Number of system context switches
    forks_total Processes/s Number of new system processes
    procs_running Processes/s Number of running system processes
    procs_blocked Processes/s Number of blocked system processes
    procs_total Processes/s Total number of system processes
    AgentVersion version Agent version

    HDFS Monitoring Metrics

    HDFS - overview

    Metric Name Unit Description
    CapacityTotal GB Total cluster storage capacity
    CapacityUsed GB Used cluster storage capacity
    CapacityRemaining GB Remaining cluster storage capacity
    CapacityUsedNonDFS GB Non-HDFS used cluster capacity
    TotalLoad - Number of current connections
    FilesTotal - Total number of files
    BlocksTotal - Total number of blocks
    PendingReplicationBlocks - Number of blocks waiting to be replicated
    UnderReplicatedBlocks - Number of blocks with insufficient replicas
    CorruptBlocks - Number of corrupted blocks
    ScheduledReplicationBlocks - Number of blocks scheduled for replication
    PendingDeletionBlocks - Number of blocks waiting to be deleted
    ExcessBlocks - Number of excess blocks
    PostponedMisreplicatedBlocks - Number of exceptional blocks postponed to be processed
    BlockCapacity - Capacity of blocks
    NumLiveDataNodes - Number of live data nodes
    NumDeadDataNodes - Number of data nodes marked as dead
    NumDecomLiveDataNodes - Number of deactivated live nodes
    NumDecomDeadDataNodes - Number of deactivated dead nodes
    NumDecommissioningDataNodes - Number of deactivating nodes
    NumStaleDataNodes - Number of current DataNodes marked as stale due to heartbeat delay
    Snapshots - Number of snapshots
    VolumeFailuresTotal - Total number of volume failures on all DataNodes
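
Several of the overview metrics share names with attributes of the NameNode's `FSNamesystem` bean, which Hadoop serves as JSON from its `/jmx` HTTP endpoint with capacities in bytes. The sketch below parses such a payload and converts capacities to GB. Whether the monitoring system collects them this way is an assumption, as are the sample payload, the helper name `hdfs_overview`, and the choice of 2**30 bytes per GB.

```python
import json

GIB = 1024 ** 3  # assumption: "GB" in the table means 2**30 bytes

# Hypothetical, heavily trimmed /jmx response from a NameNode.
SAMPLE = """{"beans": [{
  "name": "Hadoop:service=NameNode,name=FSNamesystem",
  "CapacityTotal": 107374182400,
  "CapacityUsed": 26843545600,
  "CapacityRemaining": 75161927680,
  "UnderReplicatedBlocks": 3
}]}"""


def hdfs_overview(jmx_text):
    """Extract a few overview metrics from a JMX JSON document."""
    beans = json.loads(jmx_text)["beans"]
    fs = next(b for b in beans if b.get("name", "").endswith("name=FSNamesystem"))
    overview = {key: fs[key] / GIB
                for key in ("CapacityTotal", "CapacityUsed", "CapacityRemaining")}
    overview["UnderReplicatedBlocks"] = fs["UnderReplicatedBlocks"]
    return overview


overview = hdfs_overview(SAMPLE)
print(overview)
```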

    HDFS - NameNode

    Metric Name Unit Description
    ReceivedBytes Bytes/s Data receiving rate
    SentBytes Bytes/s Data sending rate
    RpcQueueTimeNumOps Calls/s RPC call rate
    RpcQueueTimeAvgTime ms Average RPC delay
    RpcAuthenticationFailures - Number of RPC authentication failures
    RpcAuthenticationSuccesses - Number of RPC authentication successes
    RpcAuthorizationFailures - Number of RPC authorization failures
    RpcAuthorizationSuccesses - Number of RPC authorization successes
    NumOpenConnections - Number of current connections
    CallQueueLength - Length of current RPC processing queue
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory committed by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Committed size of JVM HeapMemory
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemMaxM MB Maximum size of memory available to JVM runtime
    BlockReportAvgTime ms Average delay in processing DataNode block reports
    FGC Operations/s Full GC count
    YGC Operations/s Young GC count
    YGCT ms Young GC time
    FGCT ms Full GC time
    GCT ms Garbage collection time
    ThreadsNew - Number of threads in new state
    ThreadsRunnable - Number of threads in runnable state
    ThreadsBlocked - Number of threads in blocked state
    ThreadsWaiting - Number of threads in WAITING state
    ThreadsTimedWaiting - Number of threads in TIMED WAITING state
    ThreadsTerminated - Number of threads in Terminated state
    LogFatal - Number of Fatal logs
    LogError - Number of Error logs
    LogWarn - Number of Warn logs
    LogInfo - Number of Info logs
    S0 % Percentage of used Survivor 0 memory
    S1 % Percentage of used Survivor 1 memory
    E % Percentage of used Eden memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    CCS % Percentage of memory used by compressed class space
    NumStaleStorages - Number of DataNode storages marked as stale due to heartbeat delay
    PendingDataNodeMessageCount Requests/s Number of DataNode requests queued on standby NameNode
    NumberOfMissingBlocks - Number of missing blocks
    NumberOfMissingBlocksWithReplicationFactorOne - Number of missing blocks (rf = 1)
    AllowSnapshotOps Operations/s Number of AllowSnapshot operations executed per second
    DisallowSnapshotOps Operations/s Number of DisallowSnapshot operations executed per second
    CreateSnapshotOps Operations/s Number of CreateSnapshot operations executed per second
    DeleteSnapshotOps Operations/s Number of DeleteSnapshot operations executed per second
    ListSnapshottableDirOps Operations/s Number of ListSnapshottableDir operations executed per second
    SnapshotDiffReportOps Operations/s Number of SnapshotDiffReport operations executed per second
    RenameSnapshotOps Operations/s Number of RenameSnapshot operations executed per second
    CreateFileOps Operations/s Number of CreateFile operations executed per second
    GetListingOps Operations/s Number of GetListing operations executed per second
    TotalFileOps Operations/s Total number of file operations executed per second
    DeleteFileOps Operations/s Number of DeleteFile operations executed per second
    FileInfoOps Operations/s Number of FileInfo operations executed per second
    GetAdditionalDatanodeOps Operations/s Number of GetAdditionalDatanode operations executed per second
    CreateSymlinkOps Operations/s Number of CreateSymlink operations executed per second
    GetLinkTargetOps Operations/s Number of GetLinkTarget operations executed per second
    FilesInGetListingOps Operations/s Number of FilesInGetListing operations executed per second
    TransactionsNumOps Operations/s Number of Journal transaction operations processed per second
    TransactionsBatchedInSync Operations/s Number of Journal transaction operations processed in batches per second
    GetEditNumOps Operations/s Number of GetEdit operations executed per second
    GetImageNumOps Operations/s Number of GetImage operations executed per second
    PutImageNumOps Operations/s Number of PutImage operations executed per second
    SyncsNumOps Operations/s Number of Journal syncs operations processed per second
    BlockReceivedAndDeletedOps Operations/s Number of BlockReceivedAndDeleted operations executed per second
    BlockOpsQueued Operations/s Number of DataNode block reporting operations waiting in queue
    CacheReportNumOps Operations/s Number of CacheReport operations processed per second
    BlockReportNumOps Operations/s Number of DataNode block reporting operations processed per second
    SyncsAvgTime ms Average delay in processing Journal syncs operations
    CacheReportAvgTime ms Average time of cache reporting operation
    GetEditAvgTime ms Average delay in reading Edit file
    GetImageAvgTime ms Average delay in reading image file
    PutImageAvgTime ms Average delay in writing image file
    TransactionsAvgTime ms Average delay in processing Journal transaction operations
    StartTime ms Process start time
    State - NN state
    PeakThreadCount - Peak number of threads
    ThreadCount - Number of threads
    DaemonThreadCount - Number of background threads

    HDFS - DataNode

    Metric Name Unit Description
    XceiverCount - Number of Xceivers
    BytesWrittenMB Bytes/s DN byte write rate
    BytesReadMB Bytes/s DN byte read rate
    RemoteBytesReadMB Bytes/s Rate of bytes read by remote client
    RemoteBytesWrittenMB Bytes/s Rate of bytes written by remote client
    WritesFromRemoteClient - QPS of write operations from remote client
    WritesFromLocalClient - QPS of write operations from local client
    ReadsFromRemoteClient - QPS of read operations from remote client
    ReadsFromLocalClient - QPS of read operations from local client
    BlockVerificationFailures Failures/s Number of block check failures
    VolumeFailures Failures/s Number of disk failures
    DatanodeNetworkErrors Errors/s Total number of network errors
    HeartbeatsAvgTime ms Average heartbeat API time
    HeartbeatsNumOps Queries/s Heartbeat API QPS
    SendDataPacketTransferNanosAvgTime ms Average data packet sending time
    ReadBlockOpNumOps Operations/s OPS of block reads from DataNode
    WriteBlockOpNumOps Operations/s OPS of block writes to DataNode
    BlockChecksumOpNumOps Operations/s OPS of Checksum operations by DataNode
    CopyBlockOpNumOps Operations/s OPS of block copying operations
    ReplaceBlockOpNumOps Operations/s OPS of Replace Block operations
    BlockReportsNumOps Operations/s OPS of block reporting operations
    IncrementalBlockReportsNumOps Reports/s OPS of incremental block reporting
    CacheReportsNumOps Reports/s OPS of cache reporting
    PacketAckRoundTripTimeNanosNumOps Operations/s Number of ACK ROUND TRIP operations processed per second
    FlushNanosNumOps Operations/s Number of Flush operations processed per second
    ReadBlockOpAvgTime ms Average block read time
    WriteBlockOpAvgTime ms Average block write operation time
    BlockChecksumOpAvgTime ms Average block check time
    CopyBlockOpAvgTime ms Average block copy time
    ReplaceBlockOpAvgTime ms Average Replace Block operation time
    BlockReportsAvgTime ms Average block reporting time
    IncrementalBlockReportsAvgTime ms Average time of incremental block reporting
    CacheReportsAvgTime ms Average time of cache reporting
    PacketAckRoundTripTimeNanosAvgTime ms Average time of processing ACK ROUND TRIP
    FlushNanosAvgTime ms Average Flush operation time
    FsyncNanosAvgTime ms Average Fsync operation time
    RamDiskBlocksWrite Blocks/s Total number of blocks written to memory
    RamDiskBlocksWriteFallback Blocks/s Total number of blocks that failed to be written to memory (fell back to disk)
    RamDiskBlocksDeletedBeforeLazyPersisted Blocks/s Total number of blocks deleted before being persisted to disk
    RamDiskBlocksReadHits Blocks/s Number of reads from blocks in memory
    RamDiskBlocksEvicted Blocks/s Total number of blocks evicted from memory
    RamDiskBlocksEvictedWithoutRead Blocks/s Total number of blocks evicted from memory without being read
    RamDiskBlocksLazyPersisted Blocks/s Number of disk writes by lazy writer
    RamDiskBytesLazyPersisted Bytes/s Total number of bytes written to disk by lazy writer
    RamDiskBytesWrite Bytes/s Total number of bytes written to memory
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory committed by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Committed size of JVM HeapMemory
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemMaxM MB Maximum size of memory available to JVM runtime
    ThreadsNew - Number of threads in new state
    ThreadsRunnable - Number of threads in runnable state
    ThreadsBlocked - Number of threads in blocked state
    ThreadsWaiting - Number of threads in WAITING state
    ThreadsTimedWaiting - Number of threads in TIMED WAITING state
    ThreadsTerminated - Number of threads in Terminated state
    LogFatal - Number of Fatal logs
    LogError - Number of Error logs
    LogWarn - Number of Warn logs
    LogInfo - Number of Info logs
    S0 % Percentage of used Survivor 0 memory
    S1 % Percentage of used Survivor 1 memory
    E % Percentage of used Eden memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    CCS % Percentage of memory used by compressed class space
    FGC - Full GC count
    YGC - Young GC count
    YGCT s Young GC time
    FGCT s Full GC time
    GCT s Garbage collection time
    ReceivedBytes Bytes/s Data receiving rate
    SentBytes Bytes/s Data sending rate
    RpcQueueTimeNumOps Calls/s RPC call rate
    RpcQueueTimeAvgTime ms Average RPC delay
    RpcAuthenticationFailures Failures/s Number of RPC authentication failures
    RpcAuthenticationSuccesses Successes/s Number of RPC authentication successes
    RpcAuthorizationFailures Failures/s Number of RPC authorization failures
    RpcAuthorizationSuccesses Successes/s Number of RPC authorization successes
    NumOpenConnections - Number of current connections
    CallQueueLength - Length of current RPC processing queue
    CurrentThreadSystemTime ms System time
    CurrentThreadUserTime ms User time
    StartTime s Process start time
    PeakThreadCount - Peak number of threads
    DaemonThreadCount - Number of background threads
    write MB/s Disk write rate
    read Queries/s QPS of read operations
    FsyncNanosOps Operations/s Average number of Fsync operations
    DataPacketOps Operations/s QPS of packet transmission operations

    HDFS - Journal Node

    Metric Name Unit Description
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory committed by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Committed size of JVM HeapMemory
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemMaxM MB Maximum size of memory available to JVM runtime
    ThreadsNew - Number of threads in new state
    ThreadsRunnable - Number of threads in runnable state
    ThreadsBlocked - Number of threads in blocked state
    ThreadsWaiting - Number of threads in WAITING state
    ThreadsTimedWaiting - Number of threads in TIMED WAITING state
    ThreadsTerminated - Number of threads in Terminated state
    LogFatal - Number of Fatal logs
    LogError - Number of Error logs
    LogWarn - Number of Warn logs
    LogInfo - Number of Info logs
    S0 % Percentage of used Survivor 0 memory
    S1 % Percentage of used Survivor 1 memory
    E % Percentage of used Eden memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    CCS % Percentage of memory used by compressed class space
    FGC - Full GC count
    YGC - Young GC count
    YGCT s Young GC time
    FGCT s Full GC time
    GCT s Garbage collection time
    ReceivedBytes Bytes/s Data receiving rate
    SentBytes Bytes/s Data sending rate
    RpcQueueTimeNumOps Calls/s RPC call rate
    RpcQueueTimeAvgTime ms Average RPC delay
    RpcAuthenticationFailures Failures/s Number of RPC authentication failures
    RpcAuthenticationSuccesses Successes/s Number of RPC authentication successes
    RpcAuthorizationFailures Failures/s Number of RPC authorization failures
    NumOpenConnections - Number of current connections
    CallQueueLength - Length of current RPC processing queue
    CurrentThreadSystemTime ms System time
    CurrentThreadUserTime ms User time
    StartTime s Process start time
    ThreadCount - Number of threads
    PeakThreadCount - Peak number of threads
    DaemonThreadCount - Number of background threads

    HDFS - ZKFC

    Metric Name Unit Description
    S0 % Percentage of used Survivor 0 memory
    S1 % Percentage of used Survivor 1 memory
    E % Percentage of used Eden memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    CCS % Percentage of memory used by compressed class space
    FGC - Full GC count
    YGC - Young GC count
    YGCT s Young GC time
    FGCT s Full GC time
    GCT s Garbage collection time
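
The GC metric names used throughout this page (S0, S1, E, O, M, CCS, YGC, YGCT, FGC, FGCT, GCT) match the column headers printed by `jstat -gcutil` on JDK 8 and later. The sketch below parses such output into a dictionary. That the agent obtains these values from `jstat` is an assumption, and the sample output and the helper name `parse_gcutil` are hypothetical.

```python
def parse_gcutil(output):
    """Map jstat -gcutil column names to the values of the last sample row."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    header, last = rows[0], rows[-1]
    metrics = {}
    for name, raw in zip(header, last):
        # A "-" placeholder means the value does not apply; keep it as None.
        metrics[name] = None if raw == "-" else float(raw)
    return metrics


# Hypothetical output of `jstat -gcutil <pid>`.
SAMPLE = """  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  45.12  63.40  21.75  97.10  94.33     12    0.184     2    0.096    0.280
"""
gc = parse_gcutil(SAMPLE)
print(gc["O"], gc["YGC"], gc["GCT"])
```

The utilization columns are percentages of each space's current capacity, the counts are cumulative since JVM start, and the times are in seconds.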

    YARN Monitoring Metrics

    YARN - overview

    Metric Name Unit Description
    NumActiveNMs - Number of active NodeManagers
    NumDecommissionedNMs - Number of decommissioned NodeManagers
    NumLostNMs - Number of lost NodeManagers
    NumUnhealthyNMs - Number of unhealthy NodeManagers
    AllocatedVCores - Number of allocated CPU cores
    ReservedVCores - Number of reserved CPU cores
    AvailableVCores - Number of available CPU cores
    PendingVCores - Number of requested CPU cores pending allocation
    AppsSubmitted - Total number of submitted applications
    AppsRunning - Total number of running applications
    AppsPending - Total number of pending applications
    AppsCompleted - Total number of completed applications
    AppsKilled - Total number of killed applications
    AppsFailed - Total number of failed applications
    ActiveApplications - Total number of active applications
    running_0 - Total number of applications that have been running for less than 60 minutes
    running_60 - Total number of applications that have been running for 60 to 300 minutes
    running_300 - Total number of applications that have been running for 300 to 1,440 minutes
    running_1440 - Total number of applications that have been running for more than 1,440 minutes
    AllocatedMB MB Allocated memory size
    AvailableMB MB Available memory size
    PendingMB MB Requested memory size pending allocation
    ReservedMB MB Reserved memory size
    AllocatedContainers - Total number of allocated containers
    PendingContainers - Total number of pending containers
    ReservedContainers - Total number of reserved containers
    AggregateContainersAllocated - Cumulative number of allocated containers
    AggregateContainersReleased - Cumulative number of released containers
    ActiveUsers - Number of active users

    YARN - ResourceManager

    Metric Name Unit Description
    RpcAuthenticationFailures - Number of RPC authentication failures
    RpcAuthenticationSuccesses - Number of RPC authentication successes
    RpcAuthorizationFailures - Number of RPC authorization failures
    RpcAuthorizationSuccesses - Number of RPC authorization successes
    ReceivedBytes Bytes/s Volume of data received by RPC
    SentBytes Bytes/s Volume of data sent by RPC
    NumOpenConnections - Number of open RPC connections
    RpcProcessingTimeNumOps - Number of RPC requests processed
    RpcQueueTimeNumOps - Number of RPC requests queued
    CallQueueLength - RPC queue length
    RpcProcessingTimeAvgTime s Average RPC processing time
    RpcQueueTimeAvgTime s Average RPC queue time
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    ThreadsNew - Number of threads in new state
    ThreadsRunnable - Number of threads in runnable state
    ThreadsBlocked - Number of threads in blocked state
    ThreadsWaiting - Number of threads in WAITING state
    ThreadsTimedWaiting - Number of threads in TIMED WAITING state
    ThreadsTerminated - Number of threads in Terminated state
    LogFatal - Number of Fatal logs
    LogError - Number of Error logs
    LogWarn - Number of Warn logs
    LogInfo - Number of Info logs
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory committed by JVM
    MemNonHeapMaxM MB Size of NonHeapMemory configured by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Committed size of JVM HeapMemory
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemMaxM MB Maximum size of memory available to JVM runtime
    ProcessCpuLoad % Process CPU utilization
    ProcessCpuTime ms Cumulative CPU usage time
    MaxFileDescriptorCount - Maximum number of file descriptors
    OpenFileDescriptorCount - Number of open file descriptors
    Uptime s Process run time
    DaemonThreadCount - Number of background threads
    ThreadCount - Number of threads

    YARN - JobHistoryServer

    Metric Name Unit Description
    ThreadsNew - Number of threads in new state
    ThreadsRunnable - Number of threads in runnable state
    ThreadsBlocked - Number of threads in blocked state
    ThreadsWaiting - Number of threads in WAITING state
    ThreadsTimedWaiting - Number of threads in TIMED WAITING state
    ThreadsTerminated - Number of threads in Terminated state
    LogFatal - Number of Fatal logs
    LogError - Number of Error logs
    LogWarn - Number of Warn logs
    LogInfo - Number of Info logs
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory committed by JVM
    MemNonHeapMaxM MB Size of NonHeapMemory configured by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Committed size of JVM HeapMemory
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemMaxM MB Maximum size of memory available to JVM runtime
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    ProcessCpuLoad % Process CPU utilization
    ProcessCpuTime ms Cumulative CPU usage time
    MaxFileDescriptorCount - Maximum number of file descriptors
    OpenFileDescriptorCount - Number of open file descriptors
    Uptime s Process run time
    DaemonThreadCount - Number of background threads
    ThreadCount - Number of threads

    YARN - NodeManager

    Metric Name Unit Description
    YGC - GC count
    FGC - GC count
    FGCT s GC time
    GCT s GC time
    YGCT s GC time
    S0 % Memory area proportion
    E % Memory area proportion
    CCS % Memory area proportion
    S1 % Memory area proportion
    O % Memory area proportion
    M % Memory area proportion
    ThreadsNew - Number of JVM threads
    ThreadsRunnable - Number of JVM threads
    ThreadsBlocked - Number of JVM threads
    ThreadsWaiting - Number of JVM threads
    ThreadsTimedWaiting - Number of JVM threads
    ThreadsTerminated - Number of JVM threads
    LogFatal - Number of JVM logs
    LogError - Number of JVM logs
    LogWarn - Number of JVM logs
    LogInfo - Number of JVM logs
    MemNonHeapUsedM MB JVM memory
    MemNonHeapCommittedM MB JVM memory
    MemNonHeapMaxM MB JVM memory
    MemHeapUsedM MB JVM memory
    MemHeapCommittedM MB JVM memory
    MemHeapMaxM MB JVM memory
    MemMaxM MB JVM memory
    ContainersLaunched - Total number of launched containers
    ContainersCompleted - Total number of completed containers
    ContainersFailed - Total number of failed containers
    ContainersKilled - Total number of killed containers
    ContainersIniting - Number of containers currently initializing
    ContainersRunning - Number of containers currently running
    AllocatedContainers - Number of containers currently allocated
    ContainerLaunchDurationAvgTime ms Average container launch time
    ContainerLaunchDurationNumOps - Number of container launches
    AvailableVCores - Number of available virtual CPU cores
    AllocatedVCores - Number of allocated virtual CPU cores
    AllocatedGB GB Allocated memory size
    AvailableGB GB Available memory size
    ProcessCpuLoad % CPU utilization
    ProcessCpuTime ms Cumulative CPU usage time
    MaxFileDescriptorCount - Maximum number of file descriptors
    OpenFileDescriptorCount - Number of opened file descriptors
    Uptime s Process run time
    DaemonThreadCount - Number of daemon threads
    ThreadCount - Total number of threads
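
    The used/maximum metric pairs in the table above can be combined into utilization percentages for alerting. The following is a minimal sketch, not part of the monitoring service: the helper name utilization_percent and the sample values are hypothetical, and it assumes a non-positive maximum means "undefined".

```python
from typing import Optional


def utilization_percent(used: float, limit: float) -> Optional[float]:
    """Return used/limit as a percentage, or None when the limit is unknown."""
    if limit <= 0:  # assumption: a non-positive maximum means "not defined"
        return None
    return 100.0 * used / limit


# Hypothetical sample values, keyed by the metric names from the table.
sample = {
    "MemHeapUsedM": 512.0,
    "MemHeapMaxM": 2048.0,
    "OpenFileDescriptorCount": 300,
    "MaxFileDescriptorCount": 65536,
}

heap_pct = utilization_percent(sample["MemHeapUsedM"], sample["MemHeapMaxM"])
fd_pct = utilization_percent(
    sample["OpenFileDescriptorCount"], sample["MaxFileDescriptorCount"]
)
print(f"heap used: {heap_pct:.1f}%  file descriptors used: {fd_pct:.2f}%")
```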

    HBase Monitoring Metrics

    HBase - overview

    Metric Name Unit Description
    ritCount - Number of cluster regions in transition (RIT)
    ritCountOverThreshold - Number of cluster regions that have been in RIT state longer than the threshold time
    ritOldestAge ms Age of the oldest region in RIT state
    averageLoad - Average number of regions per RegionServer (RS)
    numRegionServers - Number of live RSs in the cluster
    numDeadRegionServers - Number of dead RSs in the cluster
    receivedBytes Bytes/s Rate of data received by the cluster
    sentBytes Bytes/s Rate of data sent by the cluster
    clusterRequests Requests/s Total number of requests in cluster
    Assign_num_ops - Number of region assign operations by the cluster assignment manager
    BulkAssign_num_ops - Number of bulk region assign operations by the cluster assignment manager
    BalancerCluster_num_ops - Number of cluster load balancing operations
    mergePlanCount - Number of cluster region merge plans
    splitPlanCount - Number of cluster region split plans

    HBase - HMaster

    Metric Name Unit Description
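    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    LogFatal - Number of FATAL-level log entries
    LogError - Number of ERROR-level log entries
    LogWarn - Number of WARN-level log entries
    LogInfo - Number of INFO-level log entries
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory currently committed by JVM
    MemNonHeapMaxM MB Maximum size of NonHeapMemory configured by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Size of HeapMemory currently committed by JVM
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemMaxM MB Maximum memory size available to JVM
    ThreadsNew - Number of JVM threads in NEW state
    ThreadsRunnable - Number of JVM threads in RUNNABLE state
    ThreadsBlocked - Number of JVM threads in BLOCKED state
    ThreadsWaiting - Number of JVM threads in WAITING state
    ThreadsTimedWaiting - Number of JVM threads in TIMED_WAITING state
    ThreadsTerminated - Number of JVM threads in TERMINATED state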
    numOpenConnections - Number of RPC connections
    FailedSanityCheckException - Number of RPC exceptions
    NotServingRegionException - Number of RPC exceptions
    OutOfOrderScannerNextException - Number of RPC exceptions
    RegionMovedException - Number of RPC exceptions
    RegionTooBusyException - Number of RPC exceptions
    UnknownScannerException - Number of RPC exceptions
    numCallsInPriorityQueue - Number of RPC requests in the priority queue
    numCallsInReplicationQueue - Number of RPC requests in the replication queue
    masterActiveTime s Time at which the HMaster became active
    masterStartTime s HMaster process start time

    HBase - RegionServer

    Metric Name Unit Description
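    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    LogFatal - Number of FATAL-level log entries
    LogError - Number of ERROR-level log entries
    LogWarn - Number of WARN-level log entries
    LogInfo - Number of INFO-level log entries
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory currently committed by JVM
    MemNonHeapMaxM MB Maximum size of NonHeapMemory configured by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Size of HeapMemory currently committed by JVM
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemMaxM MB Maximum memory size available to JVM
    ThreadsNew - Number of JVM threads in NEW state
    ThreadsRunnable - Number of JVM threads in RUNNABLE state
    ThreadsBlocked - Number of JVM threads in BLOCKED state
    ThreadsWaiting - Number of JVM threads in WAITING state
    ThreadsTimedWaiting - Number of JVM threads in TIMED_WAITING state
    ThreadsTerminated - Number of JVM threads in TERMINATED state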
    averageRegionSize Bytes Average region size
    regionCount - Number of regions
    percentFilesLocalSecondaryRegions % Percentage of store files of secondary region replicas that are stored locally
    authenticationFailures - Number of failed RPC authentications
    authenticationSuccesses - Number of successful RPC authentications
    numOpenConnections - Number of RPC connections
    FailedSanityCheckException - Number of RPC exceptions
    NotServingRegionException - Number of RPC exceptions
    OutOfOrderScannerNextException - Number of RPC exceptions
    RegionMovedException - Number of RPC exceptions
    RegionTooBusyException - Number of RPC exceptions
    UnknownScannerException - Number of RPC exceptions
    numActiveHandler - Number of active RPC handlers
    numCallsInPriorityQueue - Number of RPC requests in the priority queue
    numCallsInReplicationQueue - Number of RPC requests in the replication queue
    numCallsInGeneralQueue - Number of RPC requests in the general queue
    hlogFileCount - Number of WAL files
    hlogFileSize Bytes WAL file size
    memStoreSize MB Memstore size
    storeCount - Number of stores
    storeFileCount - Number of Storefiles
    storeFileSize MB Storefile size
    flushedCellsSize Bytes/s Disk write rate
    Append_mean ms Average latency of Append operations
    Replay_mean ms Average latency of Replay operations
    Get_mean ms Average latency of Get operations
    updatesBlockedTime ms Time for which updates were blocked so that the memstore could be flushed
    FlushTime_num_ops - Number of RS memstore flushes to disk
    splitQueueLength - Number of requests in the split queue
    compactionQueueLength - Number of requests in the compaction queue
    flushQueueLength - Number of requests in the flush queue
    Replay_num_ops - Number of Replay operations
    slowAppendCount - Number of slow Append operations
    slowDeleteCount - Number of slow Delete operations
    slowGetCount - Number of slow Get operations
    slowIncrementCount - Number of slow Increment operations
    slowPutCount - Number of slow Put operations
    splitRequestCount - Number of split requests
    splitSuccessCount - Number of successful splits
    blockCacheCount - Number of blocks in the block cache
    blockCacheHitCount - Number of block cache hits
    blockCacheMissCount - Number of block cache misses
    blockCacheExpressHitPercent % Cache read hit rate
    blockCacheSize Bytes Memory size used by the block cache
    staticBloomSize Bytes Size of static Bloom filters
    staticIndexSize Bytes Size of static indexes
    storeFileIndexSize Bytes Size of Storefile indexes
    receivedBytes Bytes/s Rate of data received
    sentBytes Bytes/s Rate of data sent
    Total Requests/s Total number of read and write requests
    Read Requests/s Number of read requests
    Write Requests/s Number of write requests
    Append_num_ops Requests/s Number of Append requests
    mutationsWithoutWALCount - Number of mutations
    mutationsWithoutWALSize Bytes Mutation size
    regionServerStartTime s Process start time
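
    As an illustration of how blockCacheHitCount and blockCacheMissCount relate, the sketch below derives a hit percentage over the interval between two samples. It assumes both metrics are cumulative counters; the function name and sample values are hypothetical. The result is not necessarily equal to blockCacheExpressHitPercent, which may be computed over a different set of requests.

```python
from typing import Mapping, Optional


def hit_percent_between(prev: Mapping[str, int],
                        curr: Mapping[str, int]) -> Optional[float]:
    """Block cache hit percentage over the interval between two samples.

    Assumes blockCacheHitCount and blockCacheMissCount only ever increase.
    Returns None when there were no lookups in the interval, or when a
    counter went backwards (for example after a RegionServer restart).
    """
    hits = curr["blockCacheHitCount"] - prev["blockCacheHitCount"]
    misses = curr["blockCacheMissCount"] - prev["blockCacheMissCount"]
    if hits < 0 or misses < 0 or hits + misses == 0:
        return None
    return 100.0 * hits / (hits + misses)


# Hypothetical samples taken one collection interval apart.
previous = {"blockCacheHitCount": 900, "blockCacheMissCount": 100}
current = {"blockCacheHitCount": 1800, "blockCacheMissCount": 200}

interval_hit_percent = hit_percent_between(previous, current)
print(interval_hit_percent)
```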

    Hive Monitoring Metrics

    Hive - HiveMetaStore

    Metric Name Unit Description
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
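
    The metric names in this table match the column names printed by the JDK tool jstat -gcutil. Assuming the values come from that output, the sketch below parses one sample and derives the average young GC pause. The sample text and the helper names are illustrative only.

```python
from typing import Dict, Optional

# Illustrative jstat -gcutil output: one header line and one sample line.
SAMPLE = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  45.10  62.33  18.20  97.50  94.10     12    0.240     2    0.310    0.550
"""


def to_float(text: str) -> Optional[float]:
    """Convert one column value; jstat may print '-' when a value is absent."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gcutil(text: str) -> Dict[str, Optional[float]]:
    """Map each column name of the header to the value on the last sample line."""
    lines = [line for line in text.splitlines() if line.strip()]
    header, values = lines[0].split(), lines[-1].split()
    return {name: to_float(value) for name, value in zip(header, values)}


def average_pause(total_time_s: Optional[float],
                  count: Optional[float]) -> Optional[float]:
    """Average seconds per collection, or None if nothing was collected."""
    if total_time_s is None or not count:
        return None
    return total_time_s / count


gc = parse_gcutil(SAMPLE)
avg_young_gc_s = average_pause(gc["YGCT"], gc["YGC"])
print(f"average young GC pause: {avg_young_gc_s:.3f} s")
```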

    Hive - HiveServer2

    Metric Name Unit Description
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory currently committed by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Size of HeapMemory currently committed by JVM
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemHeapInitM MB Size of initial JVM HeapMem
    MemNonHeapInitM MB Size of initial JVM NonHeapMem
    ProcessCpuLoad % CPU utilization
    MaxFileDescriptorCount - Maximum number of file descriptors
    OpenFileDescriptorCount - Number of opened file descriptors
    ProcessCpuTime ms Cumulative CPU usage time
    Uptime s Process run time
    DaemonThreadCount - Number of Daemon threads
    ThreadCount - Total number of threads

    Hive - HiveWebHcat

    Metric Name Unit Description
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory

    ZooKeeper Monitoring Metrics

    ZooKeeper

    Metric Name Unit Description
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory currently committed by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Size of HeapMemory currently committed by JVM
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemHeapInitM MB Size of initial JVM HeapMem
    MemNonHeapInitM MB Size of initial JVM NonHeapMem
    ProcessCpuLoad % CPU utilization
    MaxFileDescriptorCount - Maximum number of file descriptors
    OpenFileDescriptorCount - Number of opened file descriptors
    zk_max_file_descriptor_count - Maximum number of file descriptors
    zk_open_file_descriptor_count - Number of opened file descriptors
    ProcessCpuTime ms Cumulative CPU usage time
    Uptime s Process run time
    DaemonThreadCount - Number of Daemon threads
    ThreadCount - Total number of threads
    zk_num_alive_connections - Number of current connections
    zk_avg_latency ms Average delay in ZooKeeper processing
    zk_max_latency ms Maximum delay in ZooKeeper processing
    zk_min_latency ms Minimum delay in ZooKeeper processing
    zk_watch_count - Number of ZooKeeper watches
    zk_znode_count - Number of ZooKeeper znodes
    zk_ephemerals_count - Number of ephemeral ZooKeeper znodes
    zk_approximate_data_size Bytes Volume of data stored in ZooKeeper
    zk_server_state - ZooKeeper node type (1: master, 0: slave, 2: single server)
    zk_packets_received Packets/s ZooKeeper packet receiving rate
    zk_packets_sent Packets/s ZooKeeper packet sending rate
    zk_outstanding_requests - Number of waiting requests
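
    The zk_-prefixed names match the keys printed by ZooKeeper's mntr four-letter command, which emits one tab-separated key/value pair per line. The sketch below parses such text into a dictionary. It assumes that the numeric zk_server_state codes in the table correspond to the strings leader (1), follower (0), and standalone (2) reported by mntr; that mapping, the sample text, and the function name are assumptions for illustration.

```python
from typing import Dict, Union

Value = Union[int, float, str]

# Assumed mapping from mntr state strings to the codes listed in the table.
STATE_CODES = {"leader": 1, "follower": 0, "standalone": 2}

# Illustrative mntr-style output (key<TAB>value per line).
SAMPLE = "\n".join([
    "zk_avg_latency\t0",
    "zk_max_latency\t12",
    "zk_min_latency\t0",
    "zk_packets_received\t1024",
    "zk_packets_sent\t1030",
    "zk_num_alive_connections\t3",
    "zk_outstanding_requests\t0",
    "zk_server_state\tfollower",
    "zk_znode_count\t42",
])


def parse_mntr(text: str) -> Dict[str, Value]:
    """Parse key/value lines, converting numbers and the server state."""
    metrics: Dict[str, Value] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, raw = parts[0], parts[1].strip()
        if key == "zk_server_state":
            metrics[key] = STATE_CODES.get(raw, raw)
            continue
        for cast in (int, float):
            try:
                metrics[key] = cast(raw)
                break
            except ValueError:
                continue
        else:
            metrics[key] = raw  # keep non-numeric values as strings
    return metrics


metrics = parse_mntr(SAMPLE)
print(metrics["zk_server_state"], metrics["zk_znode_count"])
```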

    Spark Monitoring Metrics

    Spark - SparkJobHistory

    Metric Name Unit Description
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory

    Presto Monitoring Metrics

    Presto - overview

    Metric Name Unit Description
    Active - Number of active nodes
    Total - Total number of nodes
    Failed - Number of failed nodes
    RunningQueries - Total number of running queries
    QueuedQueries - Total number of waiting queries
    FailedQueries.OneMinute.Count Queries/min Total number of failed queries
    AbandonedQueries.OneMinute.Count Queries/min Total number of aborted queries
    CanceledQueries.OneMinute.Count Queries/min Total number of cancelled queries
    CompletedQueries.OneMinute.Count Queries/min Total number of completed queries
    StartedQueries.OneMinute.Count Queries/min Total number of started queries
    SubmittedQueries.OneMinute.Count Queries/min Total number of submitted queries
    InputDataSize.OneMinute.Rate GB/min Data input rate
    OutputDataSize.OneMinute.Rate GB/min Data output rate

    Presto - Worker

    Metric Name Unit Description
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory currently committed by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Size of HeapMemory currently committed by JVM
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemHeapInitM MB Size of initial JVM HeapMem
    MemNonHeapInitM MB Size of initial JVM NonHeapMem
    InputDataSize.OneMinute.Rate GB/min Data input rate
    OutputDataSize.OneMinute.Rate GB/min Data output rate
    PeakThreadCount - Peak number of threads
    ThreadCount - Number of threads
    DaemonThreadCount - Number of background threads
    Uptime s Process run time
    StartTime s Process start time
    MaxFileDescriptorCount - Maximum number of file descriptors
    OpenFileDescriptorCount - Number of opened file descriptors

    Presto - Coordinator

    Metric Name Unit Description
    YGC - Young GC count
    FGC - Full GC count
    FGCT s Full GC time
    GCT s Garbage collection time
    YGCT s Young GC time
    S0 % Percentage of used Survivor 0 memory
    E % Percentage of used Eden memory
    CCS % Percentage of memory used by compressed class space
    S1 % Percentage of used Survivor 1 memory
    O % Percentage of used Old memory
    M % Percentage of used Metaspace memory
    MemNonHeapUsedM MB Size of NonHeapMemory currently used by JVM
    MemNonHeapCommittedM MB Size of NonHeapMemory currently committed by JVM
    MemHeapUsedM MB Size of HeapMemory currently used by JVM
    MemHeapCommittedM MB Size of HeapMemory currently committed by JVM
    MemHeapMaxM MB Size of HeapMemory configured by JVM
    MemHeapInitM MB Size of initial JVM HeapMem
    MemNonHeapInitM MB Size of initial JVM NonHeapMem
    PeakThreadCount - Peak number of threads
    ThreadCount - Number of threads
    DaemonThreadCount - Number of background threads
    Uptime s Process run time
    StartTime s Process start time
    MaxFileDescriptorCount - Maximum number of file descriptors
    OpenFileDescriptorCount - Number of opened file descriptors
