Apache NiFi Cluster in Detail

What’s a cluster?

To understand an Apache NiFi cluster, you first need to understand what a cluster is. A cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system.


Why do we need a cluster system?

  1. To understand why we need a cluster, you first have to understand the limitations of a single server or system.
  2. A single server is a single point of failure. If you lose the server, you lose everything, which means lost business.
  3. One server may not be big enough to process the amount of data you have. If you want to move a huge volume of data continuously at a very high rate, one system is not enough.
  4. Administering multiple standalone systems is more complex than managing a cluster. For example, when your code changes, you have to deploy it to each server individually. This increases the chance of missing something important and makes the system error-prone.

What are the benefits of a NiFi cluster?

  1. More system resources, which means more processing power for complex and continuous tasks.
  2. The option to scale when you need more processing power: you can add nodes to make the NiFi cluster more powerful.
  3. A single user interface to manage your data flows, which makes the job of a data flow manager much easier.
  4. Monitoring multiple servers from a single interface makes troubleshooting easier.

Key Terminologies:

Nodes

  • Each cluster is made up of one or more nodes. The nodes do the actual data processing.

Cluster Coordinator

  • An instance of NiFi that provides a single management point for the cluster.
  • Receives health and status information from the nodes.
  • Communicates dataflow changes to the nodes.
  • Maintains uniformity of dataflow across the cluster.
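
The Cluster Coordinator's role described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not NiFi's implementation: the `Node` and `Coordinator` classes, the `flow_version` counter, and the method names are all hypothetical and exist only to show the idea of one management point keeping every node on the same dataflow.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one NiFi node holding a copy of the flow."""
    name: str
    flow_version: int = 0


@dataclass
class Coordinator:
    """Toy coordinator: keeps the reference flow and pushes changes to nodes."""
    flow_version: int = 0
    nodes: list = field(default_factory=list)

    def join(self, node: Node) -> None:
        # A joining node first takes the coordinator's current flow.
        node.flow_version = self.flow_version
        self.nodes.append(node)

    def apply_change(self) -> None:
        # A dataflow change bumps the version and is sent to every node.
        self.flow_version += 1
        for node in self.nodes:
            node.flow_version = self.flow_version

    def is_uniform(self) -> bool:
        # True when every node holds the same flow as the coordinator.
        return all(n.flow_version == self.flow_version for n in self.nodes)


coordinator = Coordinator()
coordinator.join(Node("node-1"))
coordinator.join(Node("node-2"))
coordinator.apply_change()        # both nodes move to version 1
late_joiner = Node("node-3")
coordinator.join(late_joiner)     # the late joiner receives version 1 on joining
```

In this model the flow is reduced to a single version number; a real flow is a full definition of processors and connections, but the uniformity check works the same way in spirit.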

Heartbeats

  • The nodes communicate their health and status to the currently elected Cluster Coordinator via “heartbeats”, which let the Coordinator know they are still connected to the cluster and working properly.
  • By default, nodes send heartbeats every 5 seconds. This interval is configurable and can be changed in nifi.properties.
  • If the Cluster Coordinator stops receiving heartbeats from a node, that node is disconnected. If the node starts sending heartbeats again, the Cluster Coordinator sends it a request to rejoin the cluster.
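
The heartbeat behaviour above can be sketched as a small monitor that records when each node was last heard from. This is a simplified illustration, not NiFi code: the `HeartbeatMonitor` class is hypothetical, the number of missed heartbeats tolerated before a disconnect (`MISSED_BEATS_ALLOWED`) is an assumption made for the sketch, and the rejoin request is folded into a single step for brevity.

```python
HEARTBEAT_INTERVAL_S = 5   # default interval mentioned in the text
MISSED_BEATS_ALLOWED = 8   # assumption for this sketch, not taken from the article


class HeartbeatMonitor:
    """Toy coordinator-side bookkeeping of node heartbeats."""

    def __init__(self, interval_s=HEARTBEAT_INTERVAL_S,
                 missed_allowed=MISSED_BEATS_ALLOWED):
        self.timeout_s = interval_s * missed_allowed
        self.last_seen = {}      # node name -> time of last heartbeat (seconds)
        self.connected = set()

    def heartbeat(self, node, now):
        # Record the heartbeat; a node that had been dropped is reconnected.
        self.last_seen[node] = now
        self.connected.add(node)

    def sweep(self, now):
        """Disconnect nodes whose last heartbeat is older than the timeout."""
        stale = {n for n in self.connected
                 if now - self.last_seen[n] > self.timeout_s}
        self.connected -= stale
        return stale


monitor = HeartbeatMonitor()
monitor.heartbeat("node-1", now=0)
monitor.heartbeat("node-2", now=0)
monitor.heartbeat("node-1", now=40)   # node-2 has gone silent
dropped = monitor.sweep(now=45)       # node-2 was last seen 45 s ago
monitor.heartbeat("node-2", now=50)   # node-2 resumes and is reconnected
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to step through by hand.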

Isolated Processors

  • An Isolated Processor is a processor that runs only on the Primary Node. It is typically used in GetSFTP-style scenarios, where having every node pull from the same source would fetch the same data more than once.

Primary Node

  • Every cluster has one and only one Primary Node.
  • On this node, it is possible to run “Isolated Processors” (see above).
  • ZooKeeper is used to automatically elect a Primary Node. If that node disconnects from the cluster for any reason, a new Primary Node will automatically be elected.
  • Users can determine which node is currently elected as the Primary Node by looking at the Cluster Management page of the User Interface.
Figure: Primary Node in Cluster Management UI
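
To make the relationship between the Primary Node and Isolated Processors concrete, here is a minimal sketch of where a processor would be scheduled before and after the Primary Node disconnects. Everything in it is illustrative: `elect_primary` uses a placeholder rule (lowest node name wins) that is not how ZooKeeper actually conducts an election, and the `"primary"` / `"all"` execution flag is a hypothetical name for this example.

```python
def elect_primary(connected_nodes):
    """Placeholder election: lowest name wins. ZooKeeper's real election differs."""
    return min(connected_nodes) if connected_nodes else None


def nodes_running(processor_execution, connected_nodes, primary):
    """Return the nodes on which a processor is scheduled.

    processor_execution is a hypothetical flag: "primary" for an Isolated
    Processor, "all" for a processor that runs on every node.
    """
    if processor_execution == "primary":
        return {primary} if primary in connected_nodes else set()
    return set(connected_nodes)


nodes = {"node-1", "node-2", "node-3"}
primary = elect_primary(nodes)
before = nodes_running("primary", nodes, primary)

nodes.discard(primary)                # the Primary Node disconnects
new_primary = elect_primary(nodes)    # a new Primary Node is elected
after = nodes_running("primary", nodes, new_primary)
everywhere = nodes_running("all", nodes, new_primary)
```

The point of the sketch is that an Isolated Processor always has exactly one home: when the Primary Node changes, the processor follows it, while ordinary processors keep running on every connected node.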

Short and sweet overview of NiFi Cluster

NiFi Cluster Key Features

  • NiFi employs a Zero-Master Clustering paradigm.
  • Each node performs the same task but operates on a different set of data. For example, if you use the NiFi CountText processor to count the number of lines in a file, the processor runs on every node of the cluster, but each node operates on the files it has received.
  • ZooKeeper elects one node as the Cluster Coordinator. This election is handled automatically by ZooKeeper.
  • When a new node joins the cluster, it must first connect to the Cluster Coordinator to get the latest data flow.
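
The "same task, different data" idea from the list above can be shown in a few lines. This is a plain Python illustration rather than NiFi code: `count_lines` loosely mirrors the CountText example, and the node names and file contents are made up for the sketch.

```python
def count_lines(text):
    """The shared task: count lines, loosely mirroring the CountText example."""
    return len(text.splitlines())


# Hypothetical files, keyed by the node that received them.
files_by_node = {
    "node-1": "a\nb\nc\n",
    "node-2": "x\ny\n",
    "node-3": "only one line",
}

# Every node applies the same function, each to its own data.
line_counts = {node: count_lines(text) for node, text in files_by_node.items()}
```

No node sees another node's file, so adding a node adds capacity without changing the task itself.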
