Step by Step connecting to MySQL from R

Many of you would like to explore the data from the MySql database with R. This would help to analyze data with relationship which exists in R. This post will talk about the steps to connect to MySQL from R. This post is a step by step approach similar to the to the steps outlined in this PDF Document, though there are some additional information and easy to follow sequence.

Assumptions:

  1. MySQL is already installed on your System
  2. Operating System is Windows 7 (64-Bit)

Dataset:

We will use the Employee dataset which is available in the http://www.eclipse.org/birt/phoenix/db/ which can be used for testing applications and database with the fictious names “classicmodels”. (approximately 3.1 MB). Make sure you import the dataset to MySQL as outlined here. It has the following tables Customers, Employees, Orders, OrderDetails, Payments, Products and Product Lines. You can look at what is available in the database using the SQLYOG Community Edition.

Various ways to connect to MySQL from R:

There following are the ways in which we can connect to MySQL from R:

  • Using RODBC Library
  • Using RMySQL Library

In this post we will see the steps for connecting to MySQL from R using ODBC. Also refer to this PDF Attachment on the ODBC connectivity with R.

Step 1:

Download the ODBC Driver from the site https://dev.mysql.com/downloads/connector/odbc/ to make sure if you have the driven if you don’t have one. I have downloaded mysql-connector-odbc-5.3.2-winx64.msi as My operating and System is 64-bit. Please proceed with the installation of the same.

ODBC Setup:

Going forward we will do the ODBC setup related steps.

Step 2:

Goto Control Panel->Administrative Tools->ODBC


Step 3: Click Add to add a new ODBC Setup for MySQL


 

Step 4: Once you click finish you will get the below given screen where in you need to feed the IP Address/Hostname of the MySQL Server, Username and Password credentials and database and click on the Test button to make sure you are able to connect to the database without any problems. Once the Test is successful click OK to added to the List of ODBC connections.


After the MySQL ODBC connection is added:


 

Now we will move to R to invoke this Datasource and try to access any one of the table in R

Step 5: Now load the library RODBC using the following command, if its not getting loaded you can look at the instructions in the link to install the RODBC Library.

library(RODBC)


Step 6: Now having the RODBC installed now we can connect to the ClassicModels database in the MySQL and test the RODBC Library

Hope now we have got an idea of how to bring on the MySQL database to R for further processing, in the next post we will try to attempt of using RMySQL Library. Thanks, please share your feedback.

Data Visualization: Scatter Plot step by step with R

In continuation to my previous post on scatter plot which was more of an introduction, in this post we will see step by step approach on doing scatter plot with R. Our objective in this post would be to draw a scatter plot using R step by step.

Background:

The following table provides the data which will be used in this post. This has two columns one is “Year” and another one is “Total Telephones” which is in millions.

 Source: http://mospi.nic.in/Mospi_New/upload/SYB2014/ch31.html

 Source Data:


Steps:

Step 1: We have this data in a CSV file named “ScatterPlotData.csv”

Step 2: We will load this data using load.csv method using mydata=read.table(“E:\\Personal\\Learning\\Data Visualization\\ScatterPlotData.csv”,header=TRUE). This will handle the header and load the data appropriately.

Step 3: Now we have the data in mydata

Step 4: Now we can plot the Graph using the following statement :

mydata=read.table("E:\\Personal\\Learning\\Data Visualization\\ScatterPlotData.csv",header=TRUE)
plot(mydata$Year,mydata$TotalTelephones,main="Year Vs Sale of Telephones",xlab="Year",ylab="Sale of Telephones(Millions)",xaxt="n",ann="True")
axis(1, at=1:length(mydata$Year), lab=c(2004,2005,2006,2007,2008,2009,2010,2011,2012,2013))

Step 5: The Result

Step by Step Sentiment analysis on Twitter data using R with Airtel Tweets: Part – III

After lot of difficulties my 3rd post on this topic in this weekend. In my first post we saw what is sentiment analysis and what are the steps involved in it. In my previous post we saw how to retrieve the tweets and store it in the File step by step. Now we will move on to the step of Sentiment analysis.

Goal: To do sentiment analysis on Airtel Customer support via Twitter in India.

In this Post: We will retrieve the Tweets which are retrieved and stored in the previous post and start doing the analysis. In this post I’m going to use the simple algorithm as used by Jeffrey Breen to determine the scores/moods of the particular brand in twitter.

We will use the opinion lexicon provided by him which is primarily based on Hu and Liu papers. You can visit their site for lot of useful information on sentiment analysis. We can determine the positive and negative words in the tweets, based on which scoring will happen.

Step 1: We will import the CSV file into R using read.csv and you can use the summary to display the summary of the dataframe.

Step 2: We can load the Positive words and Negative words and store it locally and can import using Scan function as given below:


Step 3:

Now we will look at the code for evaluating the Sentiment. This has been taken from http://jeffreybreen.wordpress.com/2011/07/04/twitter-text-mining-r-slides/. Thanks for the source code by Jeffrey.


Step 4:

We will test this sentiment.score function with some sample data.


In this step we have created test and added 3 sentences to it. This contains different words which may be positive or negative. Pass this “Test” to the score.sentiment function with pos_words and neg_words which we have loaded in the previous tests. Now you get the result score from the score.sentiment function against each sentence.

we will also try understand little more about this function and what it does:

a. Two libraries are loaded they are plyr and stringr. Both written by Hadley Wickham one of the great contributor to R. You can also learn more about plyr using this page or tutorial. You can also get more insights on split-apply-combine details here best place to start according to Hadley Wickham. You can think of it on analogy with Map-Reduce algorithm by Google which is used more in terms of Parallelism. stringr makes the string handling easier.

b. Next laply being used. You can learn more on what apply functions do here. In our case we pass on the sentences vector to the laply method. In simple terms this method takes each tweet and pass on to the function along with Positive and negative words and combines the result.

c. Next gsub helps to handle the replacements with the help using gsub(pattern, replacement, x).

d. Then convert the sentence to lowercase

e. Convert the sentences to words using the split methods and retrieve the appropriate scores using score methods.

Step 5: Now we will give the tweetsofaritel from airteltweetdata$text to the sentiment function to retrieve the score.

Step 6: We will see the summary of the scores and its histogram:

The histogram outcome:

It shows the most of the response out of 1499 is negative about airtel.

Disclaimer: Please note that this is only sample data which is analyzed only for the purpose of educational and learning purpose. It’s not to target any brand or influence any brand.

Step by Step Sentiment analysis on Twitter data using R with Airtel Tweets: Part – II

In the previous post we saw what is sentiment analysis and what are the steps involved in it. In this post we will go through step by step instruction on doing Sentiment Analysis on the Micro blogging site “Twitter”. We will have specific objective to do so. I came across an interesting post by Chetan S on the DTH operators involvement in using Social Media for providing customer support. It triggered me the idea for this post.

Goal: To do sentiment analysis on Airtel Customer support via Twitter in India.

In this Post: We will retrieve the Tweets, look at how to access the Twitter API and make best use of the TwitteR R package and write these tweets to a file.

Important Note:

1. when you would like to use the searchTwitter, go to dev.twitter.com and your application go to the “Settings” tab and select “Read, Write and Access direct messages”. Make sure to click on the save button after doing this.”

Refer to this link http://stackoverflow.com/questions/15713073/twitter-help-unable-to-authorize-even-with-registering

2. When you are trying to search using searchTwitter after the above step if you get ssl problem make sure you have enable rCurl and do the steps outline here: http://stackoverflow.com/questions/15347233/ssl-certificate-failed-for-twitter-in-r.

options(RCurlOptions = list(cainfo = system.file(“CurlSSL”, “cacert.pem”, package = “RCurl”))) also make sure you have loaded the Necessary Packages like ROAuth,

Step 1: Make sure you have done the OAuth authentication with Twitter using the Previous post and the steps outlined above, you can also check the library loaded with sessionInfo(). Step 2: Make sure you load the tweets from the Twitter from the Twitter Handle accordingly > airtel.tweets=searchTwitter(“@airtel_presence”,n=1500) Now we have loaded the 1499 tweets which was responded by the Twitter API in to airtel.tweets. Now what we will do is to save these to a file for future processing. Step 3: Before we write these tweets to a file, for better understanding we will try to look at some of the tweets and data collected so far. head(airtel.tweets) provides the top 6 tweets. Further to our analysis, we try to get the length of the tweets, what kind of class it is and how can we access the tweets. Look at the below given screenshot. Step 4: We will look at some examples of How to access the twitter data in a better fashion with respect to the Twitter API using TwitteR library by accessing one tweets from the 1499 available. In this above given example we have selected the 3rd item from the list and we have tried to get till the user information, how many friends he has and how many followers he has, etc., These are the things which are vital to understand as these factors can become viral and impact the image of a particular brand. Now will go to the next step of identifying the steps to store these tweets for further analysis. Step 5: We will store these tweets we collected in airtel.tweets to a file for future analysis and reference. We are going to convert the list of tweets to separate data using apply functions and write to a file. We are going to use the library plyr for the same. Plyr allows the user to split a data set apart into smaller subsets, apply methods to the subsets, and combine the results. Please click here for detailed introduction on plyr. So we are converting the list to data frame for preparing it to be written to a file. Now the tweets and all the necessary information is available in the tweets.df data frame. You can look at the below screenshots for its summary. Step 6: Setup the Working directory and write the tweets.df data frame to the file airteltweets.csv. You can verify the data available in this file using Notepad++ or Excel In the next post we will look at how to do sentiment analysis with this file data.

Sentiment Analysis on Twitter data using R: Part – I

Now, the past posts we have understood the importance of using Twitter API, Basics of Twitter API and how we can access the Twitter API using R. Now we will get into analytics of how to do sentiment analysis with R with the library TwitteR. Before we do that we will try to do little understanding of Sentiment Analysis(some times also called as opinion mining) in a Q & A Format.

What is Sentiment Analysis ?

In simple words, Sentiment analysis is the task of identifying whether the opinion expressed in a text is positive or negative in general about a particular topic or context.

Can we have some examples?

a. I’m in a happy mood today, I go to beach. – Positive

b. I very much like R and its capabilities – Positive

c. I don’t like SPSS, its very complex to use – Negative

d. I feel Rapid Miner is easy to use and has good interface – Positive

Where it is being used or what are its applications?

With the lot of micro-blogging platforms available and business are well placed there, it’s important to understand that sentiment analysis on those platforms help understand the problems and feel of the customers.

  • Understanding customer feedback received
  • To arrive at happiness index of the customers
  • Determining product recommendations
  • Predicting Stock market moods
  • As a competitive marketing tool

Steps for Sentiment Analysis?

How do we do the Practical implementation?

http://www.slideshare.net/jeffreybreen/r-by-example-mining-twitter-for (The one I Like the most).

http://stackoverflow.com/questions/10233087/sentiment-analysis-using-r

https://sites.google.com/site/miningtwitter/questions/sentiment

http://trestletechnology.net/2011/11/sermon-sentiment-analysis/

You can look at the above given references for the practical implementation of Sentiment Analysis. Some of them may be outdated, in the next post we will do a practical step by step implementation of Sentiment Analysis with Twitter data using R.

Getting started with TwitteR Package

The intention of this blog post is to give you start on using the TwitteR Package of R. Using this package you can do lot of analysis on social media “Twitter”. I have written an post on analyzing a Cricketer’s Century Tweets and also the need for analyzing tweets already in my blog.

Pre-Requisite tools & Environment:

We are going to explore this completely with Windows 7 and R.

Steps to follow:

Step 1:We need to use the TwitteR package and ROAuth package for accessing the tweets. As per the recommendation from Twitter its always safe to access the tweets via SSL. First we will see the code for the same.

#install the basic packages

install.packages(“ROAuth”)

install.packages(“twitteR”)

#Initiate/Invoke the libraries

library(“ROAuth”)

library(“twitteR”)

#necessary step for Windows to handle the SSL Part

download.file(url=”http://curl.haxx.se/ca/cacert.pem”, destfile=”cacert.pem”)

Step 2: Use the OAuthFactory to setup the Credentials and start accessing data in the following way

cred <- OAuthFactory$new(consumerKey=’azbiz8LbVeA0lBUVh3c6lA‘,

consumerSecret=’Sq5kNMbdYoxNc616urV1Ayi0rKizwePRg2tDkIUEk‘,

requestURL=’https://api.twitter.com/oauth/request_token&#8217;,

accessURL=’http://api.twitter.com/oauth/access_token&#8217;,

authURL=’http://api.twitter.com/oauth/authorize&#8217;)


After this you can notice that “handshakeComplete” is FALSE. We need to complete the handshake to get access to the TwitterAPI and its data.

Step 3: Create a handshake with twitter, for which you will get a message like the following:

To enable the connection, please direct your web browser to:
http://api.twitter.com/oauth/authorize?oauth_token=ZS2khFL8LZmd4XZ92yeCjcchX08E80g3uzUucv6ds
When complete, record the PIN given to you and provide it here: install.packages("ROAuth")
Error: Unauthorized

Once you naviage to the URL you will get a PIN which you should type in the R Console. Now you can see that we have enterered the PIN from the browser after authorizing the Application.

You can also realize that now the “handshakeComplete” has become TRUE.

Step 4: Verify the status of OAuth authentication using the following command and it should return TRUE.

registerTwitterOAuth(cred)


Step 5: Now the next step is to start accessing the data using TwitterAPI. Let’s try to get started with accessing the User Information.

userInfo<-getUser(“seesiva”, cainfo=”cacert.pem”)

You need to make sure that you also pass the cainfo otherwise you will get an SSL Error.


Hope now we understand the steps required for accessing the Twitter data using the TwitteR Package. In this example we had shown the various attributes of the User Object retrieval. In the next post we will try to analyze some data.

Similar posts for your reference:

http://yourwhatyourepeatedlydo.blogspot.in/2013/04/downloading-twitter-data-using-r.html

http://davetang.org/muse/2013/04/06/using-the-r_twitter-package/

Quick Intro to Twitter API by Question and Answers

First of all what is API?

API is Application Programming Interface.

What is Twitter API?

Twitter API is an API which provides access or be the gateway to the Twitter Data through RESTful service. Using this Twitter API developers or programmers will be able to develop applications which can interact with twitter data.

Where will I get more information about this API?

Please navigate to dev.twitter.com

Where can I find the API FAQ or any other useful Info?

https://dev.twitter.com/docs/faq

What are the different ways we can implement twitter API for analytics?

There are many references provided by twitter, but In my blog posts primarily we will deal with the only following ways:

using JUNG – Java universal Network Graph Framework

using TwitterR – Twitter Analytics Package for R