I’ve been experimenting with Power BI recently for a proof of concept: building a simple dashboard to display electricity consumption across different equipment. The dashboard was built on a data warehouse database. Following are some of the observations from my initial experimentation with it. Nevertheless, there could be a lot of promise in the latest releases; I leave it to readers to evaluate Power BI at their own discretion.
- Unable to provide custom labels for the axis
- Drill down capabilities are limited
- Close integration with Power View only
- Limited documentation on the deployment scenarios
- Inability to export to multiple formats
- No facility to apply color or formatting based on expressions (as of 30 April 2015 they have started working on it)
- Limitations on data sources exist
These are some of the limitations I found in my exploration.
Features of Google Fusion Tables:
- Viewers can see the table or charts from anywhere once shared
- API capability to program the table
- Ability to integrate tables from the internet
- Keeps the data online as a single version of truth across the board
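The API bullet above can be illustrated with a short sketch. The Fusion Tables REST API exposed a SQL-style query endpoint; the table ID and API key below are placeholder assumptions, and the request shape is only a rough sketch of the v2 API, not a definitive client.

```python
from urllib.parse import urlencode

# Query endpoint of the Fusion Tables v2 REST API (sketch only);
# TABLE_ID and API_KEY are placeholders you would obtain from
# Google's developer console.
FUSION_TABLES_ENDPOINT = "https://www.googleapis.com/fusiontables/v2/query"

def build_query_url(table_id: str, api_key: str) -> str:
    """Build a URL that selects all rows from a Fusion Table."""
    sql = f"SELECT * FROM {table_id}"
    return FUSION_TABLES_ENDPOINT + "?" + urlencode({"sql": sql, "key": api_key})

url = build_query_url("1abc-exampleTableId", "my-api-key")
print(url)
```

Issuing an HTTP GET against such a URL would return the table rows as JSON, which is what makes the table programmable rather than edit-only.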
In this blog post we will create a table for sharing the homework given in a school using Google Fusion Tables. This table also helps keep track of the various homework assignments with their due dates. It can be shared with the parents or students of a specific class so that everybody is on the same page, and it would also help students who were absent on a particular day.
Step 1: Getting Started
You can navigate to Fusion Tables creation using the following URL:
Step 2: Creating Fusion Table
Step 3: Change Columns: To add or modify the columns
You can add new columns by clicking the New button. Columns support the following data types: Text, Number, Date and Location.
One of the key features is that you can define a list of drop-down items using comma-separated values, or using key-value pairs based on JSON.
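A minimal sketch of the two ways of declaring drop-down items, and of validating a cell value against them. The JSON shape here is illustrative only, not Fusion Tables' actual schema, and the label/value pairs are made up.

```python
import json

# Drop-down items declared as a comma-separated list...
csv_items = "Activity, Subject, Research"
allowed = [item.strip() for item in csv_items.split(",")]

# ...or as JSON key-value pairs (display label -> stored value).
# This shape is an assumption for illustration.
json_items = json.loads('{"Activity": "ACT", "Subject": "SUB", "Research": "RES"}')

def is_valid(value: str) -> bool:
    """Check a proposed cell value against the drop-down list."""
    return value in allowed

print(is_valid("Subject"))   # True
print(is_valid("Homework"))  # False
```

Restricting a column to a fixed list like this is what keeps free-text typos out of the Category column later on.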
Step 4: Adding multiple values for dropdown
In this example I have added Activity, Subject and Research as three category items for the Category column.
Step 5: Adding rows
Click the Add row button from the Edit Menu to add a new row to the table.
Now you will be presented with a form to capture the inputs for the table.
This makes data entry easier: you can select a value from the drop-down control and pick a date from a date-time picker. Click Save, or Save and add another to add another row.
You can download Power BI from the URL http://www.powerbi.com/dashboards/downloads/ and follow the simple steps given below:
Step 1: Start Screen
Step 2: Agree to software license terms
Step 3: Installation path
Step 4: Create a desktop shortcut
Evaluation of the tool:
Step 1: Get Data (I downloaded the data from Event Viewer in CSV format and used it as the data source in this scenario)
Step 2: Select the fields from the Fields pane on the right-hand side
Step 3: I added a Table and a Pie Chart.
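The aggregation behind a pie chart over an Event Viewer export is just a count per Level. A small stdlib sketch, using a made-up four-row CSV as a stand-in for the real export:

```python
import csv
import io
from collections import Counter

# A tiny stand-in for an Event Viewer CSV export; the real export
# has more columns, but includes a Level column like this one.
sample = io.StringIO(
    "Level,Date and Time,Source\n"
    "Information,01/05/2015 10:00,Kernel-General\n"
    "Warning,01/05/2015 10:05,DNS Client Events\n"
    "Information,01/05/2015 10:10,Service Control Manager\n"
    "Error,01/05/2015 10:15,Application Error\n"
)

# Count events per Level -- the same aggregation a pie chart
# over the Level field would display.
counts = Counter(row["Level"] for row in csv.DictReader(sample))
print(counts)  # Counter({'Information': 2, 'Warning': 1, 'Error': 1})
```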
I faced some problems during data load with Power BI; it takes time to load the data from an Excel sheet. Otherwise, I still believe this is a promising tool. I simply exported my Event Viewer data and played around to get the following output.
The drill-down capability, where the data table is filtered according to the chart events, is very nice. I am still playing around with it to understand more.
Assume you’re a budding manager who would like to impress your boss with stunning visualizations, presenting data in an infographic format so that he feels good about the work you did. I came across this post, which helps you quickly understand various infographic tools, with good examples and reasoning too.
A periodic table of visualization also exists, which helps identify which techniques to use for Data, Strategy, Information, Metaphor, Concept and Compound visualization. It is a very useful way to organize data visualizations according to content, and it may take you back to the good old days when you saw the periodic table of elements in chemistry. Credit goes to Ralph Lengler & Martin J. Eppler, Institute of Corporate Communication, University of Lugano, Switzerland, for this wonderful research effort, which also details the methodologies.
When it comes to number crunching, it is essential that you represent data with good charts. But how do you go about choosing the right ones? Check out this infographic from labnol.org, which helps you choose the right set of charts for data analysis. It presents various chart options based on Time, Frequency, Variables, etc., segregated by Composition, Relationship, Comparison and Distribution.
Please provide more comments and feedback.
I am writing this post after quite a long time, on verifying the quality of data in a column using data profiling techniques. Let’s take a simple example of a table called Country, which has Code and Description columns.
| Code | Description              |
| ---- | ------------------------ |
| US   | United States of America |
| UK   | United Kingdom           |
What could be the possible problems here with respect to the data?
- The application might have been designed such that Code must be 3 characters in length, whereas during data migration some codes might be only 2 characters long
- Code might have been intended to be all caps, a rule which might have been violated
- There should not be any null values in the Description
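Before turning to Talend, the three rules above can be sketched directly in Python. The extra rows below (FRA, de, IN) are hypothetical examples invented to trigger each rule; `None` stands in for a NULL Description.

```python
# Sample Country rows; the last three are made up to violate the rules.
rows = [
    ("US", "United States of America"),
    ("UK", "United Kingdom"),
    ("FRA", "France"),
    ("de", "Germany"),
    ("IN", None),
]

# The three rules described above, expressed as simple predicates.
short_codes = [code for code, _ in rows if len(code) != 3]
not_upper   = [code for code, _ in rows if code != code.upper()]
null_desc   = [code for code, desc in rows if desc is None]

print(short_codes)  # codes violating the 3-character rule
print(not_upper)    # codes violating the all-caps rule
print(null_desc)    # codes whose Description is NULL
```

A profiling tool essentially automates checks like these and reports them as indicators, which is what the Talend steps below do.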
Now let us see how we can identify these problems in the table Country with the example.
Step 1: Connect to the Oracle database using the DQ Repository
Step 2: Now we will add a simple Column Analysis on the Code column of the Country table
Step 3: Select the indicators for each column to analyze; this is essential, since without indicators we cannot analyze the issues. For this example I have chosen some of the parameters for analysis, as given below.
Step 4: Run the Analysis
When you run the analysis you can get to the graphical chart as depicted in the snapshot provided below:
In the above picture we can see that the row count is 10, with one duplicate value and 9 distinct values.
In this picture you can also find the length-related metrics and text statistics, which even cover case-related issues.
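To make the indicators concrete, here is how the same figures fall out of a hypothetical Code column matching the numbers above (10 rows, 9 distinct, 1 duplicate); the values themselves are invented for illustration.

```python
# Hypothetical Code column values: 10 rows, 9 distinct, 1 duplicate.
codes = ["US", "UK", "IN", "FR", "DE", "JP", "CN", "BR", "AU", "US"]

row_count       = len(codes)
distinct_count  = len(set(codes))
duplicate_count = row_count - distinct_count

# Simple length and case statistics, similar in spirit to the
# length metrics and text statistics shown in the snapshot.
min_length = min(len(c) for c in codes)
max_length = max(len(c) for c in codes)
lower_case = sum(1 for c in codes if not c.isupper())

print(row_count, distinct_count, duplicate_count)  # 10 9 1
print(min_length, max_length, lower_case)          # 2 2 0
```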
This way we can easily identify issues in a specific column. I hope this post gives a simple example which might be useful in different contexts for ensuring data quality.
In continuation of my previous post, here we will look at some sample data and use Talend Open Studio for Data Quality for data profiling. You can also refer to this link as an alternative tutorial. In this blog post we will evaluate how to do Connection Analysis, which helps us determine the following key parameters of a database:
- How many tables exist?
- How many rows exist?
- How many views exist?
- How many rows exist in each table?
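The questions above can also be answered with a few catalog queries. The post profiles a MySQL database through Talend; the sketch below uses the stdlib `sqlite3` module and an in-memory database instead, purely so it is self-contained, with a made-up two-table schema.

```python
import sqlite3

# Toy schema standing in for the ClassicModels database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    CREATE VIEW order_count AS SELECT COUNT(*) AS n FROM orders;
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# sqlite_master is SQLite's catalog table (MySQL would use
# information_schema for the same purpose).
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
views = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'")]

print("tables:", len(tables), "views:", len(views))
for t in tables:
    n = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
    print(t, "rows:", n)
```

Talend's Database Structure Overview runs the equivalent catalog queries for you and presents the counts graphically.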
Why do we need to do Connection Analysis:
Connection Analysis helps you quickly get an overview of the database connection in context.
Step 1: Creating a DB connection
Step 2: Installing the MySQL drivers
Step 3: Checking the connectivity
Starting Connection Analysis:
Step 1: Create the Database Structure Overview as given in the steps.
Step 2: Select the DB Connections
Step 3: Select the tables you need to analyze.
Step 4: Now run the ClassicModel Connection Analysis
Step 5: Execution status after run
Step 6: The run results in a data analysis with statistical information, such as 3864 rows and 8 tables.
Data profiling is nothing but analyzing the existing data in a data source and identifying metadata about it. This post is a high-level introduction to data profiling and provides some pointers on the topic.
What is the use of doing data profiling?
- To understand the metadata characteristics of the data under purview.
- To have an enterprise view of the data for the purpose of Master Data Management and Data Governance
- Helps in identifying the right candidates for Source-Target mapping.
- Ensure data fits for the intended purpose
- It helps to identify the Data issues and quantify them.
Typical project types where it is put to use:
- Data warehousing/Business Intelligence Projects
- Research Engagements
- Data research projects
- Data Conversion/Migration Projects
- Source System Quality initiatives.
Some of the open source tools which can be used for Data Profiling:
Some links which point to the various commercial players, with their comparison and evaluation:
In the next post we will evaluate certain aspects of data profiling with any of the tools mentioned in this blog post.