SVM Implementation Step by Step with R: Ice-cream sales prediction

In the previous blog post we covered the steps from reading the data from the CSV file to setting aside 30% of the data for testing and the remaining 70% for training. In this post we continue by building the SVM model and arriving at the confusion matrix, which ends this series on SVM.

  1. Let us build the SVM model using model<-svm(SalesRating ~ .,data=dataset)
  2. The above statement will work only if you have loaded the e1071 library. We specify SalesRating because it is the class label that the SVM learns to predict; the dot (.) on the right-hand side tells svm to use all remaining columns as predictors.

  3. Printing the model shows the SVM type as C-classification and the kernel as radial basis function, with a cost of 1 and a gamma of 0.5. The number of support vectors is 40.
  4. The summary function additionally reports the class and level information, as shown in the picture below.

  5. Now we plot the model against the data using the command plot(model,dataset), so that we can see the relationship between sales and the temperature in the city:

  6. The next step is the actual prediction. For that we need the data split into a training data set and a test data set, which we already did in the previous blog post.
  7. We now train the SVM on the training set using the commands shown on the screen.

  8. The plot output looks like this on the trainingset data:

  9. Now we move on to prediction using the model trained on the trainingset. We use prediction<-predict(model,testset[,-1]), which passes the testset without its first column, the class column SalesRating.
  10. We generate the confusion matrix using the table function with the following command: tab<-table(pred=prediction,true=testset[,1])
  11. When you print tab you will get the confusion matrix, as in the following:

  12. The confusion matrix looks like the one below:

  13. Now we find the accuracy of our prediction. Accuracy is the number of correct predictions (the diagonal of the confusion matrix) divided by the total number of predictions:

  14. Applying the above formula, the accuracy is 89.79%.
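
The steps above can be sketched end to end as follows. This is a minimal sketch, not the exact commands from the screenshots: the original CSV is not available here, so the data frame is a hypothetical stand-in with the same three column names, and the seed, the row count and the way SalesRating is derived are assumptions made only so the code runs. The accuracy it prints will therefore differ from the figure quoted above.

```r
library(e1071)  # provides svm(); run install.packages("e1071") first if it is missing

# Hypothetical stand-in for the ice-cream data (assumption: not the original file).
set.seed(42)
n <- 149
temperature <- runif(n, 15, 40)
sales <- 20 * temperature + rnorm(n, sd = 60)
dataset <- data.frame(
  SalesRating = factor(ifelse(sales > median(sales), "Good", "Bad")),
  CityTemperature = temperature,
  IceCreamSales = sales
)

# 30% of the rows for testing, the remaining 70% for training.
testindex <- sample(1:nrow(dataset), trunc(nrow(dataset) * 30 / 100))
testset <- dataset[testindex, ]
trainingset <- dataset[-testindex, ]

# Train on the training set only; SalesRating is the class, "." means all other columns.
model <- svm(SalesRating ~ ., data = trainingset)
print(model)      # SVM type, kernel, cost and number of support vectors
summary(model)    # additionally lists the classes and levels
# plot(model, trainingset)  # decision regions; uncomment in an interactive session

# Predict on the test set without the class column (column 1).
prediction <- predict(model, testset[, -1])

# Confusion matrix: rows are predicted classes, columns are true classes.
tab <- table(pred = prediction, true = testset[, 1])
print(tab)

# Accuracy = correct predictions (the diagonal) / all predictions.
accuracy <- sum(diag(tab)) / sum(tab)
cat(sprintf("Accuracy: %.2f%%\n", 100 * accuracy))
```

Computing the accuracy as sum(diag(tab))/sum(tab) works because both the predictions and the true labels are factors with the same levels, so the table is square and its diagonal holds the correctly classified counts.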

In a future post we will explore the scenario of multi-class classification.

Additional References:

SVM Implementation step by step with R: Data Preparation

In this post, we will implement SVM with the e1071 package for an ice-cream shop that has recorded the following attributes on sales:

  • The temperature in the city
  • Sales on a particular day
  • A label indicating whether the sales were “Good” or “Bad”.


  1. Let's install the necessary packages using the command


  2. It will ask for the CRAN mirror; choose the one nearest to your country. Subsequently you will see a message that the binary packages have been installed in the specific path. Please note that I’m using RGui (32-bit) on Windows XP.

3.  To start using this library you can issue the following command. I got a warning message that the package was built under R version 2.15.3; you can upgrade R to avoid this message:


4. I have the ice-cream parlor sales data in an Excel workbook. You can check my earlier post on importing an Excel workbook with R, or you can convert the workbook to CSV format and read it using the read.csv() function.

5. I will use the latter approach, as given below:

6. Now we have the necessary data and you can see the columns read as “SalesRating”, “CityTemperature”, “IceCreamSales”.

7. I’m assigning the data from the CSV file to a dataset like the following


8. We will use 70% of the data as the training dataset and 30% as the testing dataset. In practice you would usually be subsetting a larger dataset. The first step is to create an index, like the one given below, that runs from the 1st to the nth row of the dataset:


9. If you would like to see what the index contains, just print it to the console. Next we create testindex to sample out 30% of the dataset using the following commands:


10. Now we need to separate the testset and the trainingset using the testindex we have created, as given below:
11. Now we will output the testset and trainingset summaries, which give you an idea of how the data was split:
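
Since the screenshots for the steps above are not reproduced here, the following is a minimal sketch of what the commands could look like. The file name in the commented read.csv() line and the generated stand-in data are assumptions for illustration only; the stand-in uses 149 rows so that the split sizes match the counts mentioned in this post.

```r
# One-time setup (commented out so the sketch runs without network access):
# install.packages("e1071")   # asks for a CRAN mirror on first use
# library(e1071)              # load the package before calling svm()

# With a real file this would be something like (file name is hypothetical):
# dataset <- read.csv("IceCreamSales.csv", stringsAsFactors = TRUE)
# stringsAsFactors = TRUE keeps SalesRating a factor, which svm() needs for classification.

# Hypothetical stand-in data with the same three columns (assumption).
set.seed(42)
n <- 149
temperature <- round(runif(n, 15, 40), 1)
sales <- round(20 * temperature + rnorm(n, sd = 60))
dataset <- data.frame(
  SalesRating = factor(ifelse(sales > median(sales), "Good", "Bad")),
  CityTemperature = temperature,
  IceCreamSales = sales
)

# Index from the 1st to the nth row of the dataset.
index <- 1:nrow(dataset)

# Randomly pick 30% of the row numbers for the test set.
testindex <- sample(index, trunc(length(index) * 30 / 100))

# Rows in testindex form the test set; all other rows form the training set.
testset <- dataset[testindex, ]
trainingset <- dataset[-testindex, ]

# Inspect how the data was split.
summary(testset)
summary(trainingset)
```

Because sample() draws without replacement by default, no row can land in both sets, and trunc() rounds the 30% share down to a whole number of rows.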

So far we have seen the steps in preparing the data for analysis using SVM, which leaves us with 44 testset records and 105 trainingset records. In the next post we will see the SVM process.

SVM: Support Vector Machines and Multi Class Classification

Most of the classification examples on the internet talk about binary classification. We must also understand that the learning applies widely to multi-class classification. A good example would be rating risk as High/Medium/Low.

According to Wikipedia, in machine learning, multiclass or multinomial classification is the problem of classifying instances into more than two classes.


We would need to test the hypothesis that a patient is at risk of a heart attack.

Data Available:

A set of various attributes with a pre-defined risk label of High/Medium/Low, available as a training set of 500 records.

Before we get into the solution we need to understand a little bit about SVM.

History of SVM:

Invented by: Vladimir N. Vapnik

Current standard proposed by: Vapnik and Corinna Cortes

Year: 1995

Before we even start to understand what SVM is, we need to understand what a hyperplane is. A hyperplane is a concept from geometry that we may recollect from our school days. In an n-dimensional space, a hyperplane is a flat subspace of dimension n-1: a line in a plane, or a plane in three-dimensional space. To understand more with examples on hyperplanes look at this link.
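
To make the idea concrete, here is a small sketch in two dimensions, where a hyperplane is simply a line. The weight vector w, the offset b and the three points are made-up values chosen for illustration; the sign of w . x + b tells us on which side of the hyperplane a point lies, which is the basic way a linear classifier separates two classes.

```r
# Hypothetical hyperplane w . x + b = 0 in 2-D (here: the line x1 = x2).
w <- c(1, -1)
b <- 0

# Three made-up points, one per row.
points <- rbind(c(2, 1), c(1, 3), c(4, 4))

# Signed score of each point relative to the hyperplane.
score <- as.vector(points %*% w + b)

# Positive and negative scores fall on opposite sides; zero lies on the hyperplane.
side <- ifelse(score > 0, "positive side",
        ifelse(score < 0, "negative side", "on the hyperplane"))

print(data.frame(x1 = points[, 1], x2 = points[, 2], score = score, side = side))
```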

Similar to other machine learning techniques, SVMs take some data that is already classified (the training set) and try to predict the classes of a set of unclassified data (the testing set).
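
As a rough illustration of this train-then-predict flow with more than two classes, here is a minimal sketch using the e1071 package and R's built-in iris data (three species). The iris data is only a stand-in, not the heart-attack risk data described above, and the seed and the 30% test share are arbitrary choices. For more than two classes, svm() in e1071 uses the one-against-one approach internally.

```r
library(e1071)  # provides svm(); run install.packages("e1071") first if it is missing

# Built-in three-class dataset used as a stand-in (assumption: not the risk data).
set.seed(1)
testindex <- sample(1:nrow(iris), trunc(nrow(iris) * 30 / 100))
testset <- iris[testindex, ]       # treated as the "unclassified" data
trainingset <- iris[-testindex, ]  # already classified data

# Train on the labelled rows; Species has three levels.
model <- svm(Species ~ ., data = trainingset)

# Predict the class of the held-out rows without their Species column (column 5).
prediction <- predict(model, testset[, -5])

# 3 x 3 confusion matrix: predicted class against true class.
tab <- table(pred = prediction, true = testset$Species)
print(tab)
```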

We will explore SVM with a simple example of multi-class classification in my next post.