Step by Step connecting to MySQL from R

Many of you would like to explore the data from the MySql database with R. This would help to analyze data with relationship which exists in R. This post will talk about the steps to connect to MySQL from R. This post is a step by step approach similar to the to the steps outlined in this PDF Document, though there are some additional information and easy to follow sequence.

Assumptions:

  1. MySQL is already installed on your System
  2. Operating System is Windows 7 (64-Bit)

Dataset:

We will use the Employee dataset which is available in the http://www.eclipse.org/birt/phoenix/db/ which can be used for testing applications and database with the fictious names “classicmodels”. (approximately 3.1 MB). Make sure you import the dataset to MySQL as outlined here. It has the following tables Customers, Employees, Orders, OrderDetails, Payments, Products and Product Lines. You can look at what is available in the database using the SQLYOG Community Edition.

Various ways to connect to MySQL from R:

There following are the ways in which we can connect to MySQL from R:

  • Using RODBC Library
  • Using RMySQL Library

In this post we will see the steps for connecting to MySQL from R using ODBC. Also refer to this PDF Attachment on the ODBC connectivity with R.

Step 1:

Download the ODBC Driver from the site https://dev.mysql.com/downloads/connector/odbc/ to make sure if you have the driven if you don’t have one. I have downloaded mysql-connector-odbc-5.3.2-winx64.msi as My operating and System is 64-bit. Please proceed with the installation of the same.

ODBC Setup:

Going forward we will do the ODBC setup related steps.

Step 2:

Goto Control Panel->Administrative Tools->ODBC


Step 3: Click Add to add a new ODBC Setup for MySQL


 

Step 4: Once you click finish you will get the below given screen where in you need to feed the IP Address/Hostname of the MySQL Server, Username and Password credentials and database and click on the Test button to make sure you are able to connect to the database without any problems. Once the Test is successful click OK to added to the List of ODBC connections.


After the MySQL ODBC connection is added:


 

Now we will move to R to invoke this Datasource and try to access any one of the table in R

Step 5: Now load the library RODBC using the following command, if its not getting loaded you can look at the instructions in the link to install the RODBC Library.

library(RODBC)


Step 6: Now having the RODBC installed now we can connect to the ClassicModels database in the MySQL and test the RODBC Library

Hope now we have got an idea of how to bring on the MySQL database to R for further processing, in the next post we will try to attempt of using RMySQL Library. Thanks, please share your feedback.

Market Basket Analysis with Hadoop: Importing mysql data to Hive using SQOOP

Now we have an existing data warehouse which is in MySql now we will need the following tables, which are Product and sales fact tables for the year 1997 and 1998. We will take steps to import this to HDFS for the further analysis using Hive. Please go through previous blog post of understanding how to establish connectivity with MySql using Sqoop.

We can start import using the following statement:

sqoop import –connect jdbc:mysql://192.168.1.10/foodmart –table product –username root

  • Now you can see that it has imported the data to HDFS in 50.2663 seconds which is at 3.0000 KB/sec. If you issue the command hadoop dfs –ls it will show a item added /user/hduser/product

Subsequent query with hadoop dfs –ls /user/hduser/product reveals the following:

Since we will use hive to analyze the data, we will import the data again to hive using –hive-import option, but if we do that the following sequence of things will happen:

  1. First step is the data will be imported to HDFS
  2. Sqoop generates hive scripts to load the data from the hdfs to hive.

So, we would need to remove the product folder which is imported to HDFS through the Sqoop as it will find the folder exists while its trying to import to hive. So we will remove the same using the following statement:

hadoop dfs -rmr /user/hduser/product

Now we will import the data using Sqoop using the hive option:

sqoop import –connect jdbc:mysql://192.168.1.10/foodmart –table product –username root –hive-import

Once the import is complete you will see something like the below:

Now we will go ahead and check the data in hive by using show tables and describe product:

In my next post we will import the remaining table to be used for market basket analysis and start querying with hive.

Connecting to Mysql via SQOOP – List Tables

In continuation to my posts on the Market basket analysis, I would continue my next steps towards the analytics using the data available in the FoodMart Dataset which you can download from this url https://sites.google.com/a/dlpage.phi-integration.com/pentaho/mondrian/mysql-foodmart-database.  Before moving on to next steps its important that we understand certain things with respect to connecting to mysql from Sqoop as we are focusing on big data as retail is always big. Here are the steps..

    1. Please download the JDBC driver for Sqoop to interact with MySQL using the following URL : http://dev.mysql.com/downloads/connector/j/
    2. Make sure you downloaded the mysql-connector-java-5.1.25.tar.gz either using wget or you can download it from your windows machine if your connected with VirtualBox or VMWare.
    3. Then extract the files to get the mysql-connector-java-5.1.25-bin.jar file and place under sqoop/lib folder
    4. Make sure you have the necessary mysql server information like hostname, username and password with necessary access.
    5. Once you have got that make you have provided necessary privileges for other host to access the mysql server using the following statement:

grant all privileges on *.* to ‘username’@’%’ identified by ‘userpassword’;

  • Then you can get the list of tables from the mysql database foodmart using the following command:

sqoop list-tables –connect jdbc:mysql://192.168.1.32:3306/foodmart -username root

Note: I have done this experiment with Sqoop version 1.4.3, Ubuntu 12.0.4 LTS on Virtualbox and mysql 5.5.24 with WAMP.

Caution: In my example I have used root as the username please don’t use the root username.

Other Links for your references:

http://www.devx.com/Java/hadoop-sqoop-to-import-data-from-mysql.html

http://www.datastax.com/docs/datastax_enterprise2.0/sqoop/sqoop_demo