While researching chatbots, I ended up spending time on NLTK, the Natural Language Toolkit for Python. The toolkit simplifies much of the work involved in natural language processing (NLP). There are excellent YouTube tutorials on NLTK. One series I can recommend is by sentdex, which starts right from installation and covers sentiment analysis in depth.
In this post, I share my thoughts on how NLTK can be used to solve different use cases.
Recommendation: Content can be recommended based on similarity. Similarity, in turn, can be calculated as semantic similarity or lexical similarity, and there is much more to explore, such as cosine similarity.
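To make the lexical side concrete, here is a minimal sketch of content recommendation using cosine similarity over bag-of-words counts. It uses only the Python standard library so it runs without downloading NLTK data. The regex tokenizer is a simplified stand-in for `nltk.word_tokenize`, and the catalog titles and texts are hypothetical sample data. Semantic similarity is not shown; in NLTK that would typically involve WordNet, for example the `path_similarity` method on synsets.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic words only (a simple stand-in for nltk.word_tokenize).
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over the shared vocabulary divided by the product of the vector lengths.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query: str, catalog: dict[str, str], top_n: int = 2) -> list[tuple[str, float]]:
    # Score every catalog item against the query and return the best matches first.
    q = Counter(tokenize(query))
    scored = [(title, cosine_similarity(q, Counter(tokenize(body))))
              for title, body in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    catalog = {  # hypothetical sample content
        "intro-to-nltk": "tokenizing text and tagging words with nltk in python",
        "iot-dataflow": "real time data flow for iot devices",
        "chatbot-basics": "building a chatbot in python with nltk",
    }
    # prints "chatbot-basics: 0.76", then "intro-to-nltk: 0.50"
    for title, score in recommend("python chatbot with nltk", catalog):
        print(f"{title}: {score:.2f}")
```

Raw counts favor long documents that repeat common words; weighting terms with TF-IDF and removing stopwords are the usual next refinements.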
Sentiment Analysis: Sentiment analysis determines the author's attitude toward the content, for example by scoring the text against a dictionary of positive and negative words using simple scoring techniques.
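A dictionary-based score can be sketched in a few lines: add one point for every positive word and subtract one for every negative word. The two word sets below are hypothetical and deliberately tiny; in practice you would use a curated lexicon, such as the VADER lexicon behind NLTK's `SentimentIntensityAnalyzer` in `nltk.sentiment.vader` (fetched with `nltk.download('vader_lexicon')`).

```python
import re

# Hypothetical miniature lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "helpful"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "confusing"}


def sentiment_score(text: str) -> int:
    # +1 for each positive word, -1 for each negative word (booleans subtract as ints).
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def label(text: str) -> str:
    # Map the numeric score onto a coarse sentiment label.
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = "The tutorial was great and really helpful, but the setup was confusing."
    print(sentiment_score(review), label(review))  # prints "1 positive"
```

This simple counting ignores negation ("not good") and intensity ("very good"), which is exactly what lexicons like VADER add rules for.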
N-Gram Analysis: By tokenizing content into sequences of adjacent words (n-grams), we can analyze large bodies of text, for example to find the most frequent phrases.
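The idea behind n-gram analysis is a sliding window over the token list followed by a frequency count. The sketch below implements that with the standard library; NLTK offers the same building blocks as `nltk.util.ngrams` and `nltk.FreqDist`. The sample sentence is made up for illustration.

```python
import re
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # Slide a window of size n over the tokens (same idea as nltk.util.ngrams).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(text: str, n: int = 2, k: int = 3) -> list[tuple[tuple[str, ...], int]]:
    # Tokenize, build n-grams, and return the k most frequent with their counts.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(ngrams(tokens, n)).most_common(k)


if __name__ == "__main__":
    text = "real time data flow needs real time processing and real time data insights"
    # prints [(('real', 'time'), 3), (('time', 'data'), 2)]
    print(top_ngrams(text, n=2, k=2))
```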
NLTK is a powerful toolkit for extensive programming with natural-language text. It also includes a package, nltk.chat, that can be used to build chatbots.
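The chatbots in nltk.chat are pattern-based: `nltk.chat.util.Chat` takes pairs of regular expressions and candidate responses and answers with the first pattern that matches. The following is not NLTK code; it is a standard-library sketch of that same pattern/response idea so it runs on its own. The rules, the `{0}` placeholder convention, and the fallback reply are hypothetical choices for illustration (NLTK itself uses `%1`-style placeholders).

```python
import random
import re

# Hypothetical rules: (regex, response templates). "{0}" is filled with the first captured group.
RULES = [
    (r"hi|hello|hey", ["Hello! How can I help you?"]),
    (r"my name is (.*)", ["Nice to meet you, {0}."]),
    (r"what is (.*?)\??", ["I'm not sure what {0} is yet. Could you tell me more?"]),
    (r"bye|quit", ["Goodbye!"]),
]
FALLBACK = ["Could you rephrase that?"]


def respond(message: str) -> str:
    # Return a reply from the first rule whose pattern matches the whole message.
    text = message.strip()
    for pattern, responses in RULES:
        match = re.fullmatch(pattern, text, re.IGNORECASE)
        if match:
            return random.choice(responses).format(*match.groups())
    return random.choice(FALLBACK)


if __name__ == "__main__":
    for line in ["Hello", "My name is Alice", "what is NLTK?", "bye"]:
        print(f"> {line}\n{respond(line)}")
```

Rule order matters: the first matching pattern wins, so more specific patterns should come before general ones.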
In this two-part series, I explain why we chose Apache NiFi to handle real-time data flow in an IoT implementation.
One of our IoT projects, which had multiple integration touch points, required near real-time data processing, so I evaluated different options such as Kapacitor, Apache Storm and Apache Kafka. That is when I came across Apache NiFi, a data flow engine originally developed at the NSA, and was curious to explore it. I initially thought it would be complex, but it turned out to be easy to get going once we started exploring. Before sharing my own use cases for when to use Apache NiFi, let's quickly look at what real-time data processing is.
Real-Time Data Processing:
Real-time data processing typically means handling a stream of data that arrives at a very high rate and must be processed quickly to gain insights. The term “real-time” is itself subjective and depends on the context or usage, but the goal is usually to process data with as little latency as possible.
The following were some of the challenges we encountered in a typical IoT implementation:
- Need to track the flow of data across the information value chain
- Once data is ingested into the processing flow, there could be different data processing requirements, such as:
  - Threshold checks
  - Initiating business events
- Need to keep the data flow seamless and, when problems occur, isolate them so that flows do not impact each other
- Need to handle different protocols and formats, such as MQTT, HTTP and JSON
- Need to integrate with other systems through APIs and validate data with regular expressions
- Need to perform database operations as data moves through the flow
- Need performance high enough to meet the flow's throughput requirements
- Need to support parallelism at different points in the data flow
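To picture what several of these requirements mean for a single stream of messages, here is a plain-Python sketch of a staged flow: parse JSON, validate a field with a regular expression, run a threshold check that raises a business event, and isolate bad messages so they do not stop the rest. This is not how Apache NiFi is used (NiFi flows are assembled from processors in its web UI); the device-ID format, the temperature field, and the threshold value are all hypothetical.

```python
import json
import re
from dataclasses import dataclass, field

# Hypothetical device-ID format and threshold, for illustration only.
DEVICE_ID = re.compile(r"sensor-\d{3}")
TEMP_THRESHOLD = 75.0


@dataclass
class FlowResult:
    events: list = field(default_factory=list)    # business events raised by threshold checks
    rejected: list = field(default_factory=list)  # (raw message, reason) pairs


def process(raw_messages: list[str]) -> FlowResult:
    result = FlowResult()
    for raw in raw_messages:
        try:
            # Stage 1: parse the JSON payload.
            reading = json.loads(raw)
            # Stage 2: validate the device ID with a regular expression.
            if not DEVICE_ID.fullmatch(str(reading.get("device", ""))):
                result.rejected.append((raw, "invalid device id"))
                continue
            # Stage 3: threshold check; a breach becomes a business event.
            temp = float(reading["temp"])
            if temp > TEMP_THRESHOLD:
                result.events.append(
                    {"device": reading["device"], "alert": "over-temperature", "temp": temp})
        except (ValueError, KeyError, TypeError, AttributeError) as exc:
            # Isolate the failure: record it and keep processing the remaining messages.
            result.rejected.append((raw, type(exc).__name__))
    return result


if __name__ == "__main__":
    messages = [
        '{"device": "sensor-001", "temp": 80.5}',
        '{"device": "sensor-002", "temp": 70}',
        '{"device": "bad-id", "temp": 90}',
        "not json",
        '{"device": "sensor-003"}',
    ]
    outcome = process(messages)
    print(outcome.events)
    print(outcome.rejected)
```

A single loop like this has no back-pressure, parallelism, provenance tracking, or database stages, which is the kind of gap a dedicated data flow engine is meant to fill.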
Needless to say, there were also resource constraints, such as time and people. In the next part, we will discuss what Apache NiFi is and how it handles these challenges.