Data Exploration

“Data exploration is a process of probing more deeply into the dataset, while being careful to stay organized and avoid errors.”

It is always important to bring focus to the data before analysis begins, especially when the data was not collected in a controlled manner. Data exploration refers to the steps involved in searching through and examining a large amount of data to understand what has been gathered before it is analyzed. There are two approaches to data exploration:

  1. Automatic
  2. Manual

Data exploration typically involves the following tasks:

  1. Checking the data for similar patterns
  2. Looking for the data structure and relationships
  3. Obtaining straightforward graphical representations of the data to understand items 1 and 2

Why do we need data exploration?

  1. Identifying data-related issues, errors, or outliers
  2. Patterns: Symmetric, Skewed, Bimodal, Clusters
  3. Relationships: Linear, Polynomial, Exponential
  4. Identifying a suitable data model

Good references for data exploration with R:

http://www.stat.auckland.ac.nz/~kxio001/workshop.pdf

http://www.who.int/tb/advisory_bodies/impact_measurement_taskforce/meetings/ie_apr09_p_exporing_data_r.pdf
