Data profiling means analyzing the existing data in a data source and identifying the metadata about it. This post is a high-level introduction to data profiling and provides a few pointers for further reading.
Why do data profiling?
- To understand the metadata characteristics of the data under review.
- To get an enterprise-wide view of the data for Master Data Management and Data Governance.
- To identify the right candidates for source-to-target mapping.
- To ensure the data is fit for its intended purpose.
- To identify data issues and quantify them.
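To make "metadata characteristics" and "quantify the issues" a bit more concrete, here is a minimal sketch of column-level profiling in plain Python. It is only an illustration and not how any particular tool works: the helper names (`infer_type`, `profile_column`, `profile_table`), the sample rows, and the rule that blank strings count as missing are all assumptions made for this example.

```python
from collections import Counter


def infer_type(value):
    """Guess a simple type label for one raw value (hypothetical helper)."""
    for label, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return label
        except (TypeError, ValueError):
            continue
    return "str"


def profile_column(values):
    """Collect basic metadata for one column: counts, missing, distinct, types."""
    total = len(values)
    # Assumption: None and blank strings both count as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    missing = total - len(present)
    lengths = [len(str(v)) for v in present]
    return {
        "count": total,
        "missing": missing,
        "missing_pct": round(100.0 * missing / total, 1) if total else 0.0,
        "distinct": len(set(present)),
        "types": dict(Counter(infer_type(v) for v in present)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }


def profile_table(rows):
    """Profile every column found in a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Made-up sample data with a blank age, a non-numeric age,
# a repeated id, and a missing country.
rows = [
    {"id": "1", "age": "34", "country": "IN"},
    {"id": "2", "age": "", "country": "IN"},
    {"id": "3", "age": "abc", "country": "US"},
    {"id": "3", "age": "41", "country": None},
]

profile = profile_table(rows)
for column, stats in profile.items():
    print(column, stats)
```

Even a profile this small surfaces the kinds of issues listed above: a column that is partly missing, a supposedly numeric column containing text, and an identifier column with fewer distinct values than rows.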
Typical project types where it is put to use:
- Data warehousing/Business Intelligence Projects
- Research Engagements
- Data research projects
- Data Conversion/Migration Projects
- Source System Quality Initiatives
Some of the open source tools which can be used for Data Profiling:
Some links that cover the commercial tools available, along with their comparison and evaluation:
- Gartner Quadrant of Data Quality tools: http://www.citia.co.uk/content/files/50_161-377.pdf
- An Evaluation Framework For Data Quality Tools
In the next post we will evaluate certain aspects of data profiling using one of the tools mentioned in this post.