# India voter turnout for 2014: Using a simple Excel linear regression model

I was wondering what the voter turnout would be in 2014, based on the history of past elections in India. In this post I fit a linear regression model, a straight line, to the data to estimate the likely voter turnout percentage for 2014.

Source of Data:

The International Institute for Democracy and Electoral Assistance (IDEA) publishes voter turnout data for past elections in India at http://www.idea.int/vt/countryview.cfm?CountryCode=IN. We will use it as the source for this analysis. Elections are not held at regular intervals, so this is only an attempt to see what a line of best fit through the given data suggests.

Step 1: Let’s put the data in a spreadsheet. For simplicity, and to space the elections evenly, we have added an Index column that numbers the elections 0, 1, 2 and so on.

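| Year | Index | Voter turnout percentage |
|------|-------|--------------------------|
| 1952 | 0 | 61.17% |
| 1957 | 1 | 62.23% |
| 1962 | 2 | 55.42% |
| 1967 | 3 | 61.04% |
| 1971 | 4 | 55.25% |
| 1977 | 5 | 60.49% |
| 1980 | 6 | 56.92% |
| 1984 | 7 | 63.56% |
| 1989 | 8 | 61.98% |
| 1991 | 9 | 56.73% |
| 1996 | 10 | 57.94% |
| 1998 | 11 | 61.97% |
| 1999 | 12 | 59.99% |
| 2004 | 13 | 58.07% |
| 2009 | 14 | 58.17% |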

Step 2: Now let’s draw a scatter plot of this data, as shown below. Select both the Index and Voter turnout percentage columns and choose the “Scatter with Only Markers” chart type (I’m using Office 2007). The highest voter turnout was in 1984, at 63.56%.
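As a cross-check outside Excel, the table from Step 1 can be entered as an R data frame and the highest turnout located with which.max(). This is a minimal sketch; the data frame and column names (turnout, Year, Turnout, Index) are my own choices, not part of the original spreadsheet.

```r
# Voter turnout in Indian general elections, from the IDEA table in Step 1.
# Stored as fractions (61.17% -> 0.6117), like Excel's percentage cells.
turnout <- data.frame(
  Year = c(1952, 1957, 1962, 1967, 1971, 1977, 1980, 1984,
           1989, 1991, 1996, 1998, 1999, 2004, 2009),
  Turnout = c(61.17, 62.23, 55.42, 61.04, 55.25, 60.49, 56.92, 63.56,
              61.98, 56.73, 57.94, 61.97, 59.99, 58.07, 58.17) / 100
)

# Number the elections 0, 1, 2, ... in order, like the Index column.
turnout$Index <- seq_len(nrow(turnout)) - 1

# Row with the highest turnout (which.max returns the first maximum).
highest <- turnout[which.max(turnout$Turnout), ]
print(highest)
```

Calling plot(turnout$Index, turnout$Turnout) on this data frame draws the equivalent of Excel's “Scatter with Only Markers” chart.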

Step 3: Now let’s add the linear regression equation to the chart, which should help us estimate the likely turnout in 2014. Select the chart, open the “Design” tab and choose “Layout 9”, which shows a trendline together with its equation (the fx label). It is shown in the picture below:

Step 4: After you select Layout 9, the chart shows the line of best fit and its equation. For clarity, the equation has been moved to the right.
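The same trendline can be reproduced in R with lm(), which fits an ordinary least-squares straight line, as Excel's linear trendline does. A minimal sketch, assuming the turnout values are entered as fractions so the coefficients are on the same scale as the Excel label; the names turnout, fit, slope, intercept and r2 are my own.

```r
# Index and turnout (as fractions) from the table in Step 1.
turnout <- data.frame(
  Index = 0:14,
  Turnout = c(61.17, 62.23, 55.42, 61.04, 55.25, 60.49, 56.92, 63.56,
              61.98, 56.73, 57.94, 61.97, 59.99, 58.07, 58.17) / 100
)

# Ordinary least-squares fit of Turnout on Index (a straight line).
fit <- lm(Turnout ~ Index, data = turnout)

slope <- coef(fit)[["Index"]]
intercept <- coef(fit)[["(Intercept)"]]
r2 <- summary(fit)$r.squared

# Print in the same form as the Excel chart label.
cat(sprintf("y = %.4fx + %.4f, R^2 = %.4f\n", slope, intercept, r2))
# prints: y = -0.0005x + 0.5974, R^2 = 0.0069
```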

Step 5: Substituting the index for 2014 (15) into the equation gives an expected overall turnout of about 59% for 2014. This is far from certain; I am also waiting for the results. See the screenshots with the updated data in the spreadsheet.

| Year to predict | Equivalent index | Equation | Predicted turnout |
|-----------------|------------------|----------|-------------------|
| 2014 | 15 | y = -0.0005x + 0.5974, R² = 0.0069 | 59% |
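The arithmetic behind the 59% figure, using the rounded coefficients from the Excel label (x is the index, y the turnout as a fraction):

$$
\hat{y} = -0.0005 \times 15 + 0.5974 = -0.0075 + 0.5974 = 0.5899 \approx 59\%
$$

With R² = 0.0069, the fitted line explains less than 1% of the variation in past turnout, so this estimate is essentially the historical average turnout (about 59.4% across the 15 elections listed) rather than evidence of a real trend.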

Let’s exercise our democratic rights and look forward to good governance. And let’s check the actual voter turnout for 2014. 🙂

# Linear Regression using R: Part II

In the previous post we saw a quick introduction to linear regression. In this post we will see how to implement linear regression in R. We will use the data from our first post, studentsdata. To learn more about how to load the data, please refer to my other blog post.

Step 1: Let’s look at the data by typing studentsdata at the R prompt; it was loaded as described in my other blog post.

Step 2: The data is stored in studentsdata. Now we will plot the Tamil marks against the total scores to see how they relate, using the command plot(studentsdata\$Tamil,studentsdata\$TotalScores)

Step 3: No straight line passes exactly through all of these points, so we will calculate and plot the line of best fit, also called the least-squares regression line. We will use the lm command to compute the linear model.

```
> res=lm(studentsdata$TotalScores~studentsdata$Tamil)
> abline(res)
> res

Call:
lm(formula = studentsdata$TotalScores ~ studentsdata$Tamil)

Coefficients:
       (Intercept)  studentsdata$Tamil
           233.477               1.311
```
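Instead of reading the numbers off the printed output, the coefficients can be extracted with coef(). The sketch below also passes data = to lm(), which keeps the coefficient names short (Tamil rather than studentsdata\$Tamil). The real studentsdata is loaded in another post and is not reproduced here, so the data frame below is a HYPOTHETICAL stand-in with made-up marks; its coefficients will not match 233.477 and 1.311.

```r
# HYPOTHETICAL stand-in for studentsdata: made-up marks for illustration only.
studentsdata <- data.frame(
  Tamil = c(55, 62, 70, 78, 85, 91),
  TotalScores = c(300, 318, 325, 340, 352, 361)
)

# Least-squares fit of TotalScores on Tamil, using column names via data=.
res <- lm(TotalScores ~ Tamil, data = studentsdata)

# Pull out the intercept and slope by name.
intercept <- coef(res)[["(Intercept)"]]
slope <- coef(res)[["Tamil"]]
cat(sprintf("TotalScores = %.3f * Tamil + %.3f\n", slope, intercept))
```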

Step 4: The line of best fit drawn by abline can be seen in the plot here.

Step 5: Predicting the total score using linear regression. Now that we have the line of best fit:

$$\text{TotalScores} = 1.311 \times \text{Tamil} + 233.477$$

To predict the total score of a student who scores 75 in Tamil, substitute 75 into the equation:

```
> 1.311*75+233.477
[1] 331.802
```

That student would be expected to score about 331.80 in total.
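Rather than typing the coefficients by hand, predict() evaluates the fitted model at new values. This is a minimal sketch, again using a HYPOTHETICAL made-up stand-in for studentsdata. Note that predict() needs the model fitted with data = and a plain column name in the formula: with the formula written as studentsdata\$TotalScores ~ studentsdata\$Tamil, as in Step 3, newdata cannot be matched to the predictor, and R warns and returns the fitted values for the original rows instead.

```r
# HYPOTHETICAL stand-in for studentsdata: made-up marks for illustration only.
studentsdata <- data.frame(
  Tamil = c(55, 62, 70, 78, 85, 91),
  TotalScores = c(300, 318, 325, 340, 352, 361)
)

# Fit with data= so predict() can look up the Tamil column in newdata.
res <- lm(TotalScores ~ Tamil, data = studentsdata)

# Predicted total score for a student scoring 75 in Tamil.
predicted <- predict(res, newdata = data.frame(Tamil = 75))
print(predicted)

# The same value computed by hand from the coefficients, as in Step 5.
manual <- coef(res)[["(Intercept)"]] + coef(res)[["Tamil"]] * 75
print(manual)
```

predict() also accepts several rows at once, for example data.frame(Tamil = c(60, 75, 90)).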

# Linear Regression using R: Part I

In this blog post we will look at what linear regression is, and in my next post we will see how to implement it in R.

What is linear regression?

A very simple definition from the internet goes like this:

A technique in which a straight line is fitted to a set of data points to measure the effect of a single independent variable. The slope of the line is the measured impact of that variable. It is one of the most widely used statistical techniques. It’s the study of the linear (straight-line) relationship between variables, under an assumption of normally distributed errors.

Why?

To determine the effect of one variable on the other. Technically, linear regression estimates how much Y changes when X changes one unit.
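In symbols, the simple linear regression model described above (one independent variable, normally distributed errors) is usually written as below, and the second line shows why the slope is exactly the change in Y for a one-unit change in X. For example, in the Part II post the fitted intercept 233.477 and slope 1.311 play the roles of the estimated intercept and slope, so each extra mark in Tamil adds about 1.311 to the predicted total.

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$

$$
\hat{y}(x + 1) - \hat{y}(x) = \big(\hat\beta_0 + \hat\beta_1 (x + 1)\big) - \big(\hat\beta_0 + \hat\beta_1 x\big) = \hat\beta_1
$$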

Examples?

1. A change in fuel prices increases or decreases inflation
2. A change in raw material costs increases or decreases the product price
3. A change in class size increases or decreases student participation in events

How do I do it manually?

To see how it is done by hand, please refer to this link, which gives a clear explanation of how linear regression works.
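The by-hand calculation can also be sketched in R. The least-squares slope is the sum of the cross-deviations of x and y divided by the sum of the squared deviations of x, and the intercept makes the line pass through the point of means. As an example dataset, this sketch reuses the India voter-turnout figures from the Excel post (Index 0 to 14, turnout as fractions) and checks the result against R's built-in lm().

```r
# Turnout data from the India voter-turnout post (Index 0-14, as fractions).
x <- 0:14
y <- c(61.17, 62.23, 55.42, 61.04, 55.25, 60.49, 56.92, 63.56,
       61.98, 56.73, 57.94, 61.97, 59.99, 58.07, 58.17) / 100

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# The fitted line always passes through the point (mean(x), mean(y)).
intercept <- mean(y) - slope * mean(x)

# Cross-check against R's built-in lm().
builtin <- coef(lm(y ~ x))
print(c(intercept = intercept, slope = slope))
print(builtin)
```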

In the next blog post we will see how to use R for linear regression with the population of India from 1901 to 2011.