It is always useful to understand how the variables in a dataset are correlated with each other. In this blog we will use the studentsdata dataset, which we discussed in my previous blog, to see how well the scores in individual subjects correlate with the total scores.
What is correlation?
In simple terms, correlation describes how strongly two variables are related to each other. For example, you might want to check whether your salary has increased with your age, how much your expenditure has grown as your salary has grown, or how much umbrella sales rise with the amount or frequency of rainfall.
Before we begin, if you want to know how to import data from Excel into the R environment, please read my blog.
First, let us have a look at the data we have. The studentsdata data frame has 8 columns, which were imported from an Excel sheet.
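As a rough sketch of how one might inspect a data frame like this, the snippet below builds a small stand-in for studentsdata. Only the column names Tamil, English and TotalScores come from this post; the values are made up for illustration, and the real data has 8 columns.

```r
# Toy stand-in for studentsdata: the values are invented for illustration only.
studentsdata <- data.frame(
  Tamil   = c(55, 72, 64, 80, 47, 90),
  English = c(60, 75, 58, 88, 50, 93)
)
# Assumption: TotalScores is the sum of the subject columns.
studentsdata$TotalScores <- studentsdata$Tamil + studentsdata$English

print(dim(studentsdata))    # number of rows and columns
print(names(studentsdata))  # column names
str(studentsdata)           # type of each column
print(head(studentsdata))   # first few rows
```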
We can use the cor(var1, var2) function to determine the correlation; by default it returns the Pearson correlation coefficient. We will first find the correlation between the Tamil scores and TotalScores. As the picture below shows, cor(studentsdata$Tamil, studentsdata$TotalScores) returns 0.4370992 (about 0.44 on a scale from -1 to 1), which indicates a fairly weak positive correlation. We have also plotted the two variables against each other using plot. If you want to learn how the correlation is calculated by hand, please refer to this link for a simple example.
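To make the calculation concrete, here is a minimal sketch that computes the Pearson coefficient both with cor and by hand. It runs on a toy data frame with made-up values, not the real studentsdata, so the number it prints will differ from the one quoted above. The assumption that TotalScores is the sum of the subject columns is mine.

```r
# Toy stand-in for studentsdata: the values are invented for illustration only.
studentsdata <- data.frame(
  Tamil   = c(55, 72, 64, 80, 47, 90),
  English = c(60, 75, 58, 88, 50, 93)
)
# Assumption: TotalScores is the sum of the subject columns.
studentsdata$TotalScores <- studentsdata$Tamil + studentsdata$English

x <- studentsdata$Tamil
y <- studentsdata$TotalScores

# Built-in function: method = "pearson" is the default.
r_builtin <- cor(x, y)

# By hand: sum of products of deviations from the means,
# divided by the square root of the product of the sums of squared deviations.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

print(r_builtin)
print(r_manual)

# Base R scatter plot of the two variables.
plot(x, y, xlab = "Tamil", ylab = "TotalScores",
     main = "Tamil vs TotalScores (toy data)")
```

Both approaches should print the same value, which is a quick way to convince yourself of what cor is doing.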
Interestingly, in this dataset the correlation coefficient between the English scores and TotalScores is 0.7475341 (about 0.75), which indicates a strong positive relationship between the English marks and the total scores. When one variable tends to increase as the other increases, the correlation is positive, and the closer the coefficient is to 1, the stronger that relationship. Here, students with higher English marks tend to have higher TotalScores as well.
The same scatter plot has also been drawn with the ggplot2 library using the qplot function. Please find the picture below:
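For readers who want to reproduce a plot like this, here is a hedged sketch on toy data. Note that it uses ggplot() with geom_point() rather than the qplot call from this post, because qplot is deprecated in recent ggplot2 releases; that substitution, and the sample values, are my own assumptions.

```r
library(ggplot2)

# Toy stand-in for studentsdata: the values are invented for illustration only.
studentsdata <- data.frame(
  Tamil   = c(55, 72, 64, 80, 47, 90),
  English = c(60, 75, 58, 88, 50, 93)
)
# Assumption: TotalScores is the sum of the subject columns.
studentsdata$TotalScores <- studentsdata$Tamil + studentsdata$English

# Scatter plot of English marks against total scores.
p <- ggplot(studentsdata, aes(x = English, y = TotalScores)) +
  geom_point() +
  labs(title = "English vs TotalScores (toy data)")

print(p)
```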
Have a look at how the correlation fares for the other subjects. R makes it so simple, doesn't it?
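One way to check every subject in a single step is sketched below. The Maths column and all the values are hypothetical, since only Tamil, English and TotalScores are named in this post; swap in the real subject column names from your own data.

```r
# Toy stand-in for studentsdata: the values are invented for illustration only.
# "Maths" is a hypothetical column name, not taken from the real dataset.
studentsdata <- data.frame(
  Tamil   = c(55, 72, 64, 80, 47, 90),
  English = c(60, 75, 58, 88, 50, 93),
  Maths   = c(70, 65, 82, 91, 40, 77)
)
subjects <- c("Tamil", "English", "Maths")

# Assumption: TotalScores is the sum of the subject columns.
studentsdata$TotalScores <- rowSums(studentsdata[, subjects])

# Pearson correlation of each subject column with TotalScores.
subject_cor <- sapply(studentsdata[subjects], function(scores) {
  cor(scores, studentsdata$TotalScores)
})

# Show the subjects from most to least correlated with the total.
print(round(sort(subject_cor, decreasing = TRUE), 3))
```

Using sapply keeps the result as a named vector, so each coefficient stays labelled with its subject.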