Step by Step Sentiment analysis on Twitter data using R with Airtel Tweets: Part – III

After a lot of difficulties, here is my third post on this topic this weekend. In my first post we saw what sentiment analysis is and what steps it involves. In my previous post we saw, step by step, how to retrieve the tweets and store them in a file. Now we move on to the sentiment analysis itself.

Goal: To do sentiment analysis on Airtel Customer support via Twitter in India.

In this Post: We will load the tweets that were retrieved and stored in the previous post and start the analysis. I'm going to use the simple algorithm used by Jeffrey Breen to determine the scores/moods for a particular brand on Twitter.

We will use the opinion lexicon he provides, which is primarily based on the papers by Hu and Liu. You can visit their site for a lot of useful information on sentiment analysis. With the lexicon we can identify the positive and negative words in each tweet, and the scoring is based on those matches.

Step 1: We import the CSV file into R using read.csv. You can then call summary to display a summary of the data frame.
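A minimal sketch of this step. The file created here is a temporary stand-in with two made-up tweets so the snippet runs on its own; in practice you would point read.csv at the file saved in the previous post. The data frame name airteltweetdata matches the one used in Step 5.

```r
# Stand-in for the CSV saved in the previous post (hypothetical sample rows)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(text = c("Airtel network is down again",
                      "Thanks Airtel for the quick fix"),
             stringsAsFactors = FALSE),
  csv_path, row.names = FALSE
)

# Import the tweets; keep the text column as character, not factor
airteltweetdata <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick overview of the columns
summary(airteltweetdata)
```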

Step 2: Download the positive and negative word lists, store them locally, and import them with the scan function.
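A sketch of the import, assuming the word lists are plain text files with one word per line and header comment lines starting with ";", which is how the Hu and Liu files positive-words.txt and negative-words.txt are laid out. The two tiny temporary files below are stand-ins so the snippet runs without the real lexicon.

```r
# Tiny stand-in lexicon files (the real ones are positive-words.txt
# and negative-words.txt from the Hu and Liu opinion lexicon)
pos_path <- tempfile(fileext = ".txt")
neg_path <- tempfile(fileext = ".txt")
writeLines(c("; header comment", "good", "great"), pos_path)
writeLines(c("; header comment", "bad", "terrible"), neg_path)

# scan reads whitespace-separated tokens; comment.char skips the header lines
pos_words <- scan(pos_path, what = "character", comment.char = ";", quiet = TRUE)
neg_words <- scan(neg_path, what = "character", comment.char = ";", quiet = TRUE)
```

With the real lexicon you would replace pos_path and neg_path with the paths to the downloaded files.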


Step 3:

Now we will look at the code for evaluating the sentiment. It is taken from http://jeffreybreen.wordpress.com/2011/07/04/twitter-text-mining-r-slides/. Thanks to Jeffrey for the source code.
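For readers who cannot open the slides, here is a sketch of a scoring function along the lines described there. Treat it as a reconstruction of the approach rather than a verbatim copy of Jeffrey's code. It assumes the plyr and stringr packages are installed.

```r
library(plyr)     # laply
library(stringr)  # str_split

score.sentiment <- function(sentences, pos.words, neg.words, .progress = "none") {
  scores <- laply(sentences, function(sentence, pos.words, neg.words) {
    # Clean up: drop punctuation, control characters and digits
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    sentence <- tolower(sentence)

    # Split into words on whitespace
    words <- unlist(str_split(sentence, "\\s+"))

    # TRUE where a word appears in the respective lexicon
    pos.matches <- !is.na(match(words, pos.words))
    neg.matches <- !is.na(match(words, neg.words))

    # Score = number of positive words minus number of negative words
    sum(pos.matches) - sum(neg.matches)
  }, pos.words, neg.words, .progress = .progress)

  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}
```

The result is a data frame with one row per sentence: the score and the original text.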


Step 4:

We will test the score.sentiment function with some sample data.


In this step we create Test and add three sentences to it. They contain different words, some positive and some negative. Pass Test to the score.sentiment function together with pos_words and neg_words, which we loaded in the previous steps. The function returns a score for each sentence.
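A small sketch of such a test. The three sentences and the short word lists are made up for illustration, and score_simple is a hypothetical, condensed base-R stand-in for score.sentiment, defined here only so the snippet runs on its own.

```r
# Condensed base-R stand-in for score.sentiment (hypothetical helper)
score_simple <- function(sentences, pos_words, neg_words) {
  vapply(sentences, function(s) {
    s <- tolower(gsub("[[:punct:][:cntrl:]]|\\d+", "", s))
    words <- unlist(strsplit(s, "\\s+"))
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, numeric(1), USE.NAMES = FALSE)
}

# Illustrative word lists and sample sentences (not the real lexicon)
pos_words <- c("quick", "helpful", "good")
neg_words <- c("worst", "terrible", "bad")
Test <- c("Airtel support was quick and helpful",
          "Worst network, terrible service",
          "I called customer care today")

result <- score_simple(Test, pos_words, neg_words)
print(data.frame(score = result, text = Test))
```

The first sentence has two positive words, the second two negative words, and the third none from either list.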

We will also try to understand a little more about this function and what it does:

a. Two libraries are loaded: plyr and stringr. Both were written by Hadley Wickham, one of the great contributors to R. plyr implements the split-apply-combine strategy, and Hadley Wickham's own material on it is the best place to start. You can think of it by analogy with Google's Map-Reduce, which is used more in the context of parallelism. stringr makes string handling easier.

b. Next, laply is used. Like the other apply functions, it applies a function to each element of its input. In our case we pass the sentences vector to laply. In simple terms, it takes each tweet, passes it to the scoring function along with the positive and negative words, and combines the results.

c. Next, gsub(pattern, replacement, x) replaces every match of pattern in x with replacement. Here it is used to strip unwanted characters from each tweet.

d. Then the sentence is converted to lowercase.

e. Finally, the sentence is split into words, the words are matched against the positive and negative lists, and the score is the number of positive matches minus the number of negative matches.
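To make steps c to e concrete, here is one made-up tweet traced through the same transformations. This is an illustration only: it uses base R's strsplit in place of stringr's str_split, and the two short word lists are hypothetical.

```r
sentence <- "Airtel's 3G is AWESOME, but support was bad!!"

# c. gsub strips punctuation and digits
no_punct  <- gsub("[[:punct:]]", "", sentence)
no_digits <- gsub("\\d+", "", no_punct)

# d. lowercase
lowered <- tolower(no_digits)

# e. split on whitespace and match against the word lists
words    <- unlist(strsplit(lowered, "\\s+"))
pos_hits <- !is.na(match(words, c("awesome", "good")))
neg_hits <- !is.na(match(words, c("bad", "poor")))

# one positive word ("awesome") and one negative word ("bad") cancel out
score <- sum(pos_hits) - sum(neg_hits)
print(words)
print(score)
```

Note that removing digits turns "3G" into "g", which is harmless here because "g" is in neither list.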

Step 5: Now we pass tweetsofaritel, taken from airteltweetdata$text, to the score.sentiment function to retrieve the scores.

Step 6: We will see the summary of the scores and its histogram:
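A sketch of this step. The scores data frame below holds a handful of made-up values purely so the snippet runs; it is not the Airtel result. In the real analysis it would be the data frame returned in Step 5.

```r
# Hypothetical scores, standing in for the output of Step 5
scores <- data.frame(score = c(-3, -1, -1, 0, 0, 1, 2))

# Five-number summary plus mean of the scores
summary(scores$score)

# Distribution of the scores
hist(scores$score,
     main = "Sentiment scores",
     xlab = "Score (positive minus negative words)")

# Share of tweets with a negative score
share_negative <- mean(scores$score < 0)
print(share_negative)
```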

The histogram outcome:

The histogram shows that most of the 1499 responses are negative about Airtel.

Disclaimer: Please note that this is only sample data, analyzed purely for educational and learning purposes. It is not meant to target or influence any brand.


5 thoughts on “Step by Step Sentiment analysis on Twitter data using R with Airtel Tweets: Part – III”

  1. Hey, nice blog. I have also been doing similar work with Twitter data. One limitation I have come across is that, when gathering data with the twitteR package, the number of tweets one can retrieve is not reliable. At times I get 6000 tweets, at other times only 200, even for the same query. This makes ongoing analysis problematic, especially for trending topics (I have tried to do sentiment analysis of movies). Do you have any idea how to get around this issue?

  2. Hi,

    I recently started following your blog and find it very interesting and really helpful. Nice and amazing work.
    Some help:
    I am performing the same procedure on old historical data from 2007 to 2011 and I am getting an error.

    Actually, when I load the data, the summary already looks different, which may be the cause. Do you have any idea why my summary does not look like yours?

    P.S I used R to extract the historical data.

    Thanks,

    Karthik
