Step-by-Step Correlation Matrix Using RapidMiner on the Fuel Consumption Data of Cars in Canada

A correlation matrix helps you understand the correlation between pairs of variables. It is a symmetric matrix in which the (i, j) element is equal to the correlation coefficient between variables i and j. The diagonal elements are always equal to 1.
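In symbols, and assuming the Pearson coefficient is the one meant here (the post does not name it), the definition above can be written as follows, where X_i is the i-th variable and \sigma_i its standard deviation:

```latex
R_{ij} = \frac{\operatorname{cov}(X_i, X_j)}{\sigma_i \, \sigma_j},
\qquad R_{ij} = R_{ji},
\qquad R_{ii} = \frac{\operatorname{cov}(X_i, X_i)}{\sigma_i^2} = 1
```

The symmetry follows from cov(X_i, X_j) = cov(X_j, X_i), and the diagonal is 1 because the covariance of a variable with itself is its variance.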

Purpose of using a correlation matrix:

  • To identify outliers
  • To identify collinearity between the variables
  • To support regression analysis

Simple understanding:

Correlation is a number between -1 and +1 that measures the strength of the linear relationship between two variables. When two variables rise together (e.g., the higher the income, the higher the tax), the correlation is positive, up to +1. When one falls as the other rises (e.g., every item sold reduces your inventory), the correlation is negative, down to -1. If it is near zero, there is no linear correlation (e.g., average temperature in summer and average sales of news magazines), which indicates no linear relationship between the variables. It is also important to understand that the correlation is not affected by the scale of the variables or the units in which they are measured.
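To make the three claims above concrete, here is a minimal Python sketch, assuming the Pearson coefficient is the measure in question. The `pearson` helper and all the numbers are made up for illustration; none of them come from the dataset used in this post.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up numbers, for illustration only.
income = [30000, 45000, 60000, 80000, 100000]
tax = [0.2 * v for v in income]            # tax rises linearly with income
items_sold = [0, 5, 10, 15, 20]
inventory = [100 - s for s in items_sold]  # each sale reduces inventory by one

r_income_tax = pearson(income, tax)                # close to +1
r_sold_inventory = pearson(items_sold, inventory)  # close to -1
# Rescaling a variable (income in thousands) leaves the coefficient unchanged.
r_rescaled = pearson([v / 1000 for v in income], tax)

print(round(r_income_tax, 6), round(r_sold_inventory, 6), round(r_rescaled, 6))
```

The rescaled case is the scale-independence point: dividing income by 1000 changes the units but not the coefficient.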

About the Dataset:

Source: Thanks to Fuel Consumption Ratings from , Link:

The dataset I have used is a refined version of the data from the above source. It has the following attributes:

  • Make – Car Make
  • Class – Vehicle class (type of car)
  • Engine
  • Transmission
  • Fuel Type
  • City (fuel consumption in city driving, in miles per gallon)
  • Hwy (fuel consumption in highway driving, in miles per gallon)

Tool Usage:

In this post we will use RapidMiner to explore how the variables in the 2013 fuel consumption data for cars in Canada relate to each other.

Steps to evaluate the correlation matrix:

Step 1: Download and open RapidMiner.


Step 2: Import the data from the local drive. In my case I have kept it in Excel format, so you have to click “Import Excel Sheet…” under the Repository tab. You can also see a repository named “SivaRepository”, which I created previously.

Step 3: After you import the data and click Finish, you will see the imported dataset. You can also check the log to identify any errors. I have named the dataset CanadaCarsFuelConsumption2013.

Step 4: Now we will build the correlation matrix from this data. Select the Correlation Matrix operator from the Operators panel, under Modeling / Correlation and Dependency Computation.

Step 5: Drag the CanadaCarsFuelConsumption2013 dataset into the process area and connect its out port to the exa port of the Correlation Matrix operator. Then connect the mat output port to the process res port to produce the output.

Step 6: Now let’s run the process to see the results.
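For readers who want to see what such a process computes, the following is a rough Python sketch of a correlation matrix over numeric columns. It is not RapidMiner's implementation; it assumes Pearson correlation, and the `correlation_matrix` helper and the five rows of values are hypothetical, not taken from CanadaCarsFuelConsumption2013.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values for illustration only, NOT rows from the real dataset.
columns = {
    "Engine": [1.6, 2.0, 2.5, 3.5, 5.0],
    "City": [30, 27, 24, 19, 15],
    "Hwy": [40, 37, 33, 27, 22],
}

matrix = correlation_matrix(columns)

# Print the matrix with one row per attribute.
print(" " * 8 + "".join(f"{name:>8s}" for name in matrix))
for a, row in matrix.items():
    print(f"{a:8s}" + "".join(f"{r:8.3f}" for r in row.values()))
```

The printed table has the two properties described at the start of the post: it is symmetric and its diagonal is 1.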


Based on this outcome we can see that City, Hwy and the fuel-related variables are more closely correlated with each other than with the other parameters, as their relationship is strongly positive. We can also look at the pairwise table for a better understanding.
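A pairwise view is easy to reproduce outside the tool as well. The sketch below lists each attribute pair once, strongest absolute correlation first. The `ranked_pairs` helper is hypothetical, and the matrix values are placeholders chosen for illustration; they are not the results obtained in this post.

```python
from itertools import combinations

def ranked_pairs(matrix):
    """List each attribute pair once, strongest absolute correlation first."""
    pairs = [(a, b, matrix[a][b]) for a, b in combinations(list(matrix), 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Placeholder values for illustration only, NOT the results of this post.
matrix = {
    "Engine": {"Engine": 1.0, "City": -0.8, "Hwy": -0.7},
    "City": {"Engine": -0.8, "City": 1.0, "Hwy": 0.9},
    "Hwy": {"Engine": -0.7, "City": 0.9, "Hwy": 1.0},
}

pairs = ranked_pairs(matrix)
for a, b, r in pairs:
    print(f"{a:8s} {b:8s} {r:+.2f}")
```

Sorting by absolute value keeps strong negative correlations near the top, which matters when checking for collinearity before a regression.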


2 thoughts on “Step by Step Correlation Matrix using Rapid miner on the Fuel Consumption Data of cars in Canada”

    • Hi, it looks like the strong correlations are between Class and Hwy and City. The correlation is negative, so it could mean that, depending on the class (the type of car), a better class will consume less fuel. Regards
