SVM Implementation step by step with R: Data Preparation

In this post, we will try to implement SVM with the e1071 package for an ice-cream shop that has recorded the following attributes on sales:

  • The temperature in the city
  • Sales on a particular day
  • A label indicating whether the sales were “Good” or “Bad”.

Steps:

  1. Let's install the necessary packages using the command:

    install.packages("e1071", dependencies=TRUE)

  2. It will ask for a CRAN mirror; choose the one nearest to your country. You will then see a message that the binary packages have been installed in a specific path. Please note that I’m using RGui (32-bit) on Windows XP.
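
If you would rather skip the mirror prompt, the repos argument of install.packages() can name a mirror directly. The following is a minimal sketch and not part of the original steps: the helper name ensure_package is hypothetical, and https://cloud.r-project.org is just one possible mirror.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Passing repos explicitly avoids the interactive mirror prompt.
    install.packages(pkg, dependencies = TRUE, repos = repos)
  }
  # Report whether the package can now be loaded.
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (commented out so nothing is installed by accident):
# ensure_package("e1071")
```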

3.  To start using this library, issue the command below. I got a warning that the package was built under R version 2.15.3; upgrading R avoids this message:

    library(e1071)

4. I have the ice-cream parlor sales data in an Excel workbook. You can refer to my earlier post on importing an Excel workbook with R, or you can convert the workbook to CSV format and read it using the read.csv() function.

5. I will use the latter approach, as given below:

6. Now we have the necessary data and you can see the columns read as “SalesRating”, “CityTemperature”, “IceCreamSales”.

7. I’m assigning the data from the CSV file to a dataset as follows:

    dataset<-read.csv("data.csv")
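
One thing to watch for: e1071's svm() defaults to classification only when the response is a factor, and since R 4.0.0 read.csv() no longer converts strings to factors by default. On a recent R version it may therefore be worth requesting the conversion explicitly. The sketch below writes a few made-up rows to a temporary file (the values are illustrative only, not the shop's data) and reads them back; only the column names come from this post.

```r
# Illustrative rows only; these are NOT the real sales records.
toy <- data.frame(
  SalesRating     = c("Good", "Bad", "Good", "Bad"),
  CityTemperature = c(35, 18, 32, 20),
  IceCreamSales   = c(410, 120, 380, 150)
)

# Write the toy data to a temporary CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE turns the text label column into a factor.
dataset <- read.csv(csv_path, stringsAsFactors = TRUE)

str(dataset)                  # column types at a glance
levels(dataset$SalesRating)   # the class labels
```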

8. We will use 70% of the data for the training dataset and 30% for the testing dataset. In effect, we are going to subset the larger dataset. The first step towards that is creating an index, like the one given below, that runs from the 1st to the nth row of the dataset:

    index<-1:nrow(dataset)

9. If you would like to see what the index contains, just print it at the console. Next we are going to create testindex to sample out 30% of the dataset using the following command:

    testindex<-sample(index,trunc(length(index)*30/100))
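
Because sample() draws at random, the rows chosen will differ from run to run. Fixing the seed first makes the split repeatable. In the sketch below, the seed value 42 is an arbitrary choice, and the row count of 149 is inferred from the 44 test and 105 training records reported at the end of this post.

```r
set.seed(42)                  # arbitrary seed, only for repeatability

n_rows <- 149                 # assumed: 44 test + 105 training records
index <- 1:n_rows

# trunc() drops the fractional part: 149 * 30 / 100 = 44.7 becomes 44.
testindex <- sample(index, trunc(length(index) * 30 / 100))

length(testindex)             # 44
length(unique(testindex))     # also 44: sample() draws without replacement
```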

10. Now we need to separate the test dataset and the training dataset using the testindex we have created, as given below:
[Image: 061513_0022_SVMImplemen3.png]
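
The commands for this step were shown as an image. A common way to do such a split, which may or may not match the original exactly, is negative indexing: dataset[testindex, ] keeps the sampled rows and dataset[-testindex, ] keeps everything else. The names testset and trainingset are taken from the next step, and the data frame below is a made-up stand-in with the same column names.

```r
set.seed(1)  # arbitrary seed for a repeatable example

# Stand-in data with the same column names; values are illustrative only.
dataset <- data.frame(
  SalesRating     = factor(rep(c("Good", "Bad"), length.out = 149)),
  CityTemperature = round(runif(149, min = 10, max = 40)),
  IceCreamSales   = round(runif(149, min = 50, max = 500))
)

index <- 1:nrow(dataset)
testindex <- sample(index, trunc(length(index) * 30 / 100))

testset     <- dataset[testindex, ]    # the sampled 30%
trainingset <- dataset[-testindex, ]   # every row not in testindex

nrow(testset)       # 44
nrow(trainingset)   # 105
```

Since the two subsets are built from complementary row indices, no record ends up in both.
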
11. Now we will output the summaries of testset and trainingset, which give an idea of how the data has been split:
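
The summary output itself was shown as an image. As a sketch of what this step might look like, summary() can be called on each subset, and table() on the label column shows how the classes are distributed. The built-in iris data is used as a stand-in here, with Species playing the role of SalesRating.

```r
set.seed(1)  # arbitrary seed for a repeatable example

# Stand-in data: the built-in iris data frame, not the ice-cream sales.
dataset <- iris
index <- 1:nrow(dataset)
testindex <- sample(index, trunc(length(index) * 30 / 100))
testset     <- dataset[testindex, ]
trainingset <- dataset[-testindex, ]

summary(testset)       # per-column summary of the test rows
summary(trainingset)   # per-column summary of the training rows

# Class counts in each subset; a badly skewed split would show up here.
table(testset$Species)
table(trainingset$Species)
```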

So far we have seen the steps for preparing the data for analysis using SVM, which leaves us with 44 TestSet records and 105 TrainingSet records. In the next post we will see the SVM process.
