Apache Nifi for Real-time scenarios in IoT – Part 1

In this two-part series, I will try to share why we have chosen Apache Nifi as a choice for an IoT Implementation for handling real-time data flow.

There was a need for near real-time data processing requirements for one of IoT project which has multiple integration touch points that’s when I was evaluating different options such as Kapacitor, Apache Storm, Apache Kafka. When I encountered Apache Nifi as a data flow engine which was used NSA was curious to explore. Initially thought it might be a complex attempt but seems to be an easy go once we started exploring. So before trying to share my use cases on when to use Apache Nifi from my own context, would try to quickly have an understanding what is a real-time data processing.

Real-Time Data Processing:

Typically stream of data flowing at very high response rate which needs to be processed for gaining insights. Though the term “real-time” itself would be subjective based on the context or usage. Typically, we need to process the data with zero latency.

Challenges:

The following were some of the challenges we were encountering in a typical IoT Implementation:

  1. Need to track the flow of data across the information value chain
  2. Once data is ingested into the processing flow there could be different data processing requirements such as:
    1. Validation
    2. Threshold Checks
    3. Initiating business events
    4. Alerting
    5. Aggregation
  3. Need to make sure that data flow is seamless and if there are problems it could be isolated without impacting each other
  4. Enable handle different protocol such as MQTT, JSON, HTTP
  5. Integration requirements through API, validation needs with Regular Expression
  6. Need to handle DB Operations on the way of data flow was also a key requirement
  7. The performance needs to be optimal to manage flow requirements.
  8. Need for parallelism required across different data flow points was also a key aspect into our considerations

Needless to say that there were other constraints on resources such as Time and People. In the next part we will discuss what is Apache Nifi and how it handles these challenges.

Text Mining: Intro, Tools and References

What is it?

In simple terms retrieve quality information from the text for analysis.

Where it can be used?

  1. Analysis of emails, messages, etc.,
  2. Analysis of open-ended surveys
  3. Analysis of claims for fraud detection
  4. Investigation by crawling
  5. Spam filtering
  6. Labeling for Machine learning
  7. Recommendations engine

Various Stages of Text Mining:

Good tools for Text Mining (free J):

  • R Programming (refer to the tm package)
  • Gensim (Python library for analyzing plain text)
  • Gate (Open Source library for Text Processing 15-Year old)

Good References:

Where to get started: http://tedunderwood.com/2012/08/14/where-to-start-with-text-mining/

http://www.statsoft.com/textbook/text-mining/

http://rapid-i.com/component/option,com_myblog/show,Great-Video-Series-about-Text-Mining.html/Itemid,172/