Data Exploration

“Data exploration is a process of probing more deeply into the dataset, while being careful to stay organized and avoid errors.”

It is always important to bring the data into focus before analysis begins, especially when the data was not collected in a controlled manner. Data exploration is the term for the steps involved in searching through and examining a large amount of data, so that the later analysis is based on what was actually gathered. There are two methods of data exploration:

  1. Automatic
  2. Manual

Data exploration refers to the following typical tasks:

  1. Checking the data for similar patterns
  2. Looking for the data structure and relationships
  3. Obtaining straightforward graphical representations of the data to understand aspects of items 1 and 2
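
The three tasks above can be sketched in a few lines. This is only a minimal illustration, written in Python with the standard library rather than R; the dataset (hours studied against exam score) and the helper names `pearson` and `text_histogram` are invented for the example and do not come from any particular tool.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy data: (hours_studied, exam_score) pairs, invented for illustration.
rows = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70), (6, 74), (7, 79), (8, 85)]
hours = [h for h, _ in rows]
scores = [s for _, s in rows]

# Tasks 1 and 2: summarize the structure of one variable.
summary = {
    "n": len(rows),
    "mean_score": statistics.mean(scores),
    "median_score": statistics.median(scores),
    "stdev_score": statistics.stdev(scores),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Task 2: measure how strongly the two variables move together.
r = pearson(hours, scores)

def text_histogram(values, width=10):
    """Task 3: a plain-text histogram, one line per bin of the given width."""
    bins = Counter((v // width) * width for v in values)
    return [
        f"{start:>3}-{start + width - 1:<3} {'#' * bins[start]}"
        for start in sorted(bins)
    ]

print(summary)
print(f"correlation(hours, scores) = {r:.3f}")
print("\n".join(text_histogram(scores)))
```

A text histogram is used here only to keep the sketch free of plotting libraries; in practice a proper chart would serve the same purpose.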

Why do we need data exploration?

  1. Identifying data-related issues, errors, or outliers
  2. Patterns: Symmetric, Skewed, Bimodal, Clusters
  3. Relationship: Linear, Polynomial, Exponential
  4. Identification of a suitable data model
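
As a rough illustration of the reasons listed above, the following sketch flags outliers, checks whether a variable is skewed, and guesses whether a relationship looks linear or exponential. It is written in Python with the standard library rather than R. The data, the function names, and the choice of techniques (Tukey's 1.5 × IQR rule, the moment coefficient of skewness, and a comparison of correlations on raw and log-transformed values) are assumptions made for the example, not a prescribed method.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def skewness(values):
    """Moment coefficient of skewness: > 0 right-skewed, < 0 left-skewed."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def likely_relationship(xs, ys):
    """Compare a straight-line fit on y with one on log(y); ys must be positive."""
    r_linear = abs(pearson(xs, ys))
    r_log = abs(pearson(xs, [math.log(y) for y in ys]))
    return "exponential" if r_log > r_linear else "linear"

# Hypothetical data, invented for illustration.
readings = [12, 14, 15, 15, 16, 17, 18, 19, 95]   # one suspicious value
xs = list(range(10))
growth = [2 * 1.5 ** x for x in xs]               # grows by 50% per step

print("outliers:", iqr_outliers(readings))
print("right-skewed:", skewness(readings) > 0)
print("relationship:", likely_relationship(xs, growth))
```

The relationship check only distinguishes between two candidate shapes; a polynomial relationship, or the choice of a full data model, would need a comparison of fitted models that is beyond this sketch.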

Good references for data exploration with R:

http://www.stat.auckland.ac.nz/~kxio001/workshop.pdf

http://www.who.int/tb/advisory_bodies/impact_measurement_taskforce/meetings/ie_apr09_p_exporing_data_r.pdf