Plot – My exploration in data analytics

In the previous post we saw a quick introduction to what linear regression is. In this post we will see how to implement linear regression in R. We will use the data from our first post, the studentsdata table. If you would like to know more about how to load the data, please refer to my other blog post.

Step 1: Let’s have a look at the data for the various students by typing studentsdata at the R prompt. The table was loaded as described in my other blog post.
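If you do not have the table loaded, a minimal sketch like the one below lets you follow along. The five rows are made-up values for illustration only, not the real studentsdata; only the column names Tamil and TotalScores are taken from the commands in this post.

```r
# Hypothetical stand-in for studentsdata (invented values, not the real table).
studentsdata <- data.frame(
  Tamil       = c(45, 60, 72, 81, 90),
  TotalScores = c(290, 315, 330, 338, 352)
)

head(studentsdata)     # first few rows of the table
str(studentsdata)      # column names and types
summary(studentsdata)  # min, quartiles, mean and max of each column
```

Typing studentsdata on its own prints every row, which is fine for a small table; head() and summary() are handier once the table grows.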

Step 2: We have stored the data in the table studentsdata. Now we will plot the Tamil marks against the total scores and see how they relate, using the command plot(studentsdata$Tamil,studentsdata$TotalScores)
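The same plot can be given axis labels and a title so that it is easier to read. This is only a sketch: the data frame holds made-up values standing in for the real studentsdata, and the label text is my own choice.

```r
# Hypothetical stand-in for studentsdata (invented values, not the real table).
studentsdata <- data.frame(
  Tamil       = c(45, 60, 72, 81, 90),
  TotalScores = c(290, 315, 330, 338, 352)
)

# Scatter plot with Tamil marks on the x-axis and total scores on the y-axis.
plot(studentsdata$Tamil, studentsdata$TotalScores,
     xlab = "Tamil marks", ylab = "Total score",
     main = "Total score vs. Tamil marks")

# Correlation coefficient: a quick numeric check of how linear the relationship is.
r <- cor(studentsdata$Tamil, studentsdata$TotalScores)
print(r)
```

A correlation close to 1 or -1 suggests that a straight line is a reasonable summary of the points.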

Step 3: In practice the points do not fall exactly on a straight line, so no single line passes through all of them. Instead we will calculate and plot the line of best fit, also called the least squares regression line. We will use the lm command to compute the linear model.

> res=lm(studentsdata$TotalScores~studentsdata$Tamil)
> abline(res)
> res

Call:
lm(formula = studentsdata$TotalScores ~ studentsdata$Tamil)

Coefficients:
       (Intercept)  studentsdata$Tamil
           233.477               1.311
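An alternative way to write the same model is to pass the table through the data argument of lm, which keeps the coefficient names short. The sketch below uses made-up values in place of the real studentsdata, so its coefficients will not match 233.477 and 1.311.

```r
# Hypothetical stand-in for studentsdata (invented values, not the real table).
studentsdata <- data.frame(
  Tamil       = c(45, 60, 72, 81, 90),
  TotalScores = c(290, 315, 330, 338, 352)
)

# Same model as above, written with column names plus the data argument.
res <- lm(TotalScores ~ Tamil, data = studentsdata)

# Draw the scatter plot first, then add the fitted line on top of it.
plot(studentsdata$Tamil, studentsdata$TotalScores)
abline(res)

coef(res)               # named vector: "(Intercept)" and "Tamil"
summary(res)$r.squared  # share of the variation in TotalScores explained by Tamil
```

Note that abline(res) only adds a line to an existing plot, so the plot command has to come first.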

Step 4: The abline(res) command from Step 3 draws the line of best fit on top of the scatter plot from Step 2.

Step 5: Prediction of total score using linear regression. Now that we have the line of best fit, we can write it as an equation using the coefficients above:

TotalScores = 1.311 * Tamil + 233.477

If you wish to predict the total score of a student who scores 75 in Tamil, it would be as follows:

> 1.311*75+233.477

[1] 331.802

He/she would be predicted to score about 331.80 total marks.
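Instead of typing the coefficients by hand, R's predict function can do the same arithmetic from the fitted model. This is a sketch with made-up data in place of the real studentsdata, so the number it prints will differ from 331.802. It also assumes the model was fitted with the data argument; with the studentsdata$Tamil style of formula, predict cannot match the new column by name.

```r
# Hypothetical stand-in for studentsdata (invented values, not the real table).
studentsdata <- data.frame(
  Tamil       = c(45, 60, 72, 81, 90),
  TotalScores = c(290, 315, 330, 338, 352)
)

res <- lm(TotalScores ~ Tamil, data = studentsdata)

# New student with 75 marks in Tamil; the column name must match the predictor.
new_student <- data.frame(Tamil = 75)
predicted <- predict(res, newdata = new_student)

# The same prediction computed by hand from the coefficients.
by_hand <- coef(res)[["(Intercept)"]] + coef(res)[["Tamil"]] * 75

print(predicted)
print(by_hand)
```

Both values should agree, since predict simply plugs the new Tamil mark into the fitted line.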

In this blog we are going to see what linear regression is, and in my next blog we will see how to implement the same in R.

What is linear regression?

A very simple definition from the internet goes like this:

A technique in which a straight line is fitted to a set of data points to measure the effect of a single independent variable. The slope of the line is the measured impact of that variable. It is one of the most widely used statistical techniques. It’s the study of linear relationship (straight-line) between variables under an assumption of normally distributed errors.

Why?

To estimate the effect of one variable on another. Technically, linear regression estimates how much Y changes when X changes by one unit.
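The "one unit" idea can be checked directly in R: the predictions for two X values that are one unit apart differ by exactly the slope. The data below are made-up values for illustration, and the names x_marks and y_scores are hypothetical.

```r
# Hypothetical data: X is a subject mark, Y is a total score (invented values).
scores <- data.frame(
  x_marks  = c(45, 60, 72, 81, 90),
  y_scores = c(290, 315, 330, 338, 352)
)

fit <- lm(y_scores ~ x_marks, data = scores)
slope <- coef(fit)[["x_marks"]]

# Predict Y at X = 70 and X = 71, then take the difference.
p <- predict(fit, newdata = data.frame(x_marks = c(70, 71)))
one_unit_change <- unname(diff(p))

print(slope)
print(one_unit_change)  # same value as the slope
```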

Examples?

A change in fuel prices increases/decreases inflation
A change in raw material cost increases/decreases the product price
A change in class size increases/decreases the participation of students in events

How do I do it manually?

If you want to check out how it is done manually, please refer to this link, which provides a clear explanation of how linear regression works.
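As a rough sketch of the manual calculation, the least squares slope is the covariance of X and Y divided by the variance of X, and the intercept is chosen so that the line passes through the point of means. The data below are made-up values, and lm is used only to confirm that the hand calculation gives the same line.

```r
# Hypothetical data (invented values for illustration).
x <- c(45, 60, 72, 81, 90)
y <- c(290, 315, 330, 338, 352)

# Least squares estimates computed by hand.
slope     <- cov(x, y) / var(x)            # covariance of x and y over variance of x
intercept <- mean(y) - slope * mean(x)     # line passes through (mean(x), mean(y))

# The same estimates from R's built-in linear model function.
fit <- lm(y ~ x)

print(c(intercept = intercept, slope = slope))
print(coef(fit))
```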

In the next blog we will see how we can use R for linear regression, using the population of India from 1901 to 2011.
