Why move data from RDBMS or existing data warehouse to HDFS?

In continuation to my previous post, many would have questions like the following?

  1. When there are traditional ways doing Market basket analysis with Cubes with Enterprise Data Warehouse (EDW) why do we need to adopt this route of moving data to HDFS?
  2. What kind of benefits will someone get by moving this RDBMS data to HDFS?
  3. Will it provide any cost savings?

This post is a mere analysis of possible use cases which could complement your data warehousing and analytics strategy. With the big hype around big data and related technologies its important to understand what to use and when to use accordingly. Good reasons for using Hadoop to complement the datawarehouse.

  1. Usage of tools like Sqoop and moving the data to an HDFS infrastructure environment will provide you the following benefits:
    1. Storage of extremely high volume data with the help of Hadoop infrastructure
    2. Accelerating the data movements with nightly batches with the help of MR Tasks.
    3. Automatic and redundant backup with the help of HDFS’s natural fault –tolerant data nodes.
    4. Low cost commodity hardware for scalability.
  2. Movement of the structured data to the HDFS enable to analyze the data in relationship with the unstructured data or semi structured data such as tweets, blogs, etc.,
  3. Not necessary to model the data as it can be handled while it’s being read.
  4. This also provides you capabilities to do quick exploratory analytics before moving to the warehouse for final analysis.

You can look at the following papers for more information and detailed understanding of the same:

http://www.cloudera.com/content/dam/cloudera/Resources/PDF/Using_Cloudera_to_Improve_Data_Processing_Whitepaper.pdf

http://www.b-eye-network.com/blogs/eckerson/archives/2013/02/what_is_unique.php

http://www.teradata.com/white-papers/Hadoop-and-the-Data-Warehouse-When-to-Use-Which/?type=WP