Continuing my posts on Market Basket Analysis, the next step is to run analytics on the data in the FoodMart dataset, which you can download from https://sites.google.com/a/dlpage.phi-integration.com/pentaho/mondrian/mysql-foodmart-database. Before moving on, it is important to understand how to connect to MySQL from Sqoop, since we are focusing on big data and retail data is always big. Here are the steps:
- Download the MySQL JDBC driver (Connector/J), which Sqoop needs to interact with MySQL, from http://dev.mysql.com/downloads/connector/j/
- Make sure you have downloaded mysql-connector-java-5.1.25.tar.gz, either with wget or from your Windows machine if you are connected through VirtualBox or VMware.
- Extract the archive to get the mysql-connector-java-5.1.25-bin.jar file and place it under the sqoop/lib folder.
- Make sure you have the necessary MySQL server information, such as hostname, username and password, with the necessary access.
- Once you have that, make sure you have granted the necessary privileges for other hosts to access the MySQL server, using the following statement:
grant all privileges on *.* to 'username'@'%' identified by 'userpassword';
- Then you can list the tables in the MySQL database foodmart using the following command:
sqoop list-tables --connect jdbc:mysql://192.168.1.32:3306/foodmart --username root
Note: I did this experiment with Sqoop version 1.4.3, Ubuntu 12.04 LTS on VirtualBox and MySQL 5.5.24 with WAMP.
Caution: In my example I have used root as the username; please don't use the root username on a real system.
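If you want to drive the list-tables step from a script, here is a minimal Python sketch that only assembles the Sqoop command line. The function name `sqoop_list_tables_args` and the user `foodmart_reader` are my own placeholders, not part of Sqoop or the FoodMart setup, and I am assuming the account has a password, so I add Sqoop's `-P` flag to prompt for it instead of putting it on the command line.

```python
import shlex


def sqoop_list_tables_args(host, database, username, port=3306):
    """Build the argument list for `sqoop list-tables` (hypothetical helper)."""
    # JDBC URL in the same form as the command shown above.
    jdbc_url = f"jdbc:mysql://{host}:{port}/{database}"
    return [
        "sqoop", "list-tables",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # assumption: prompt for the password interactively
    ]


# "foodmart_reader" is a placeholder for a non-root MySQL user.
args = sqoop_list_tables_args("192.168.1.32", "foodmart", "foodmart_reader")
print(shlex.join(args))
```

The list can be passed as-is to `subprocess.run(args)` on a machine where Sqoop is installed; printing it first is a cheap way to check the flags before running anything.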
So you go to a shop and see that a specific brand of deodorant and a bathing bar have been bundled as one product and displayed with a specific discount, and you hand-pick it with immense happiness (??) and the satisfaction of a good deal. How do shopkeepers come to know about this? Intuition, analytics, case-based reasoning, pattern matching, and so on.
It could be Walmart, Target, Macy's, TESCO or even a small self-owned retail outlet; it is important that they understand customer behavior correctly to make a good profit at the end of the day. Let's not think that this is useful only in the retail industry; it is just as important for services-based organizations to understand consumer behavior.
For ease of understanding, and to move towards the practical aspects of such an implementation, we will try to understand some of the factors that could influence a recommendation.
- Demography (City, Locality, Country, etc.) (Transactions)
- Product mix based on past sales history (Transactions)
- Social recommendations (Twitter, Facebook, posts) (Social Analytics/NoSQL/Semi-Structured)
- Product reviews (Blog/Review/Semi-Structured Data)
- Post-Sales experience (Transactions)
The challenge would be to relate these data and make good recommendations through the system in a very short span of time, to influence customer buying decisions. In my next post we will try to evaluate some of the data sets available on the internet for further experiments.
My aim is to understand and implement a recommendation system, or at least to arrive at the right steps for building a recommendation system that is reliable and can handle the complexity involved in the data.
Keep waiting for the next post.
Market Basket Analysis (Association Analysis) is a mathematical modeling technique based upon the theory that if you buy a certain group of items, you are likely to buy another group of items. It is used to analyze consumer purchasing behavior, and it helps increase sales and maintain inventory by focusing on point-of-sale (POS) transaction data. The Apriori algorithm is commonly used to achieve this.
This algorithm is used to identify patterns in data. It works by counting how often items occur together across transactions and keeping the combinations that occur frequently.
If a person goes to a gift shop and purchases a birthday card and a gift, it is likely that he might also purchase a cake, candles or candy. Such combinations help the retail shop owner predict likely purchases, so that items can be clubbed or packaged as offers to make better margins. They also help in understanding consumer behavior.
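To make the gift shop example concrete, here is a minimal Python sketch of the frequent-itemset part of Apriori. The transactions and the 0.6 minimum support are invented purely for illustration, and the function name `frequent_itemsets` is my own; this is a teaching sketch, not a tuned implementation.

```python
from itertools import combinations

# Hypothetical gift-shop transactions, invented for illustration only.
TRANSACTIONS = [
    {"card", "gift", "cake"},
    {"card", "gift", "candles"},
    {"card", "gift", "cake", "candles"},
    {"gift", "candy"},
    {"card", "cake"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


result = frequent_itemsets(TRANSACTIONS, 0.6)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

On this toy data only card, gift and cake survive as single items, and the pair gift and cake is dropped because it shows up in just two of the five baskets. The prune step is what keeps Apriori manageable: a larger combination is only counted if all of its smaller parts were already frequent.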
When we look at the Apriori algorithm, it is essential to understand what association rules are too. That helps put the algorithm in the right perspective.
Association rule learning is a popular machine learning technique in data mining. It helps to understand relationships between variables in large databases. It is primarily applied to point-of-sale data in retail, where large numbers of transactions are recorded.
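An association rule such as "bread implies butter" is usually judged by three numbers: support, confidence and lift. Here is a small Python sketch of how they can be computed. The baskets are made up for illustration, and `rule_metrics` is a name I chose, not a library function.

```python
# Hypothetical point-of-sale baskets, invented for illustration only.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in the itemset."""
    return sum(1 for b in baskets if itemset <= b)


def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence and lift for the rule antecedent -> consequent.

    Assumes both sides of the rule appear in at least one basket.
    """
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(baskets)
    n_a, n_c, n_both = count(a, baskets), count(c, baskets), count(a | c, baskets)
    support = n_both / n          # how often the whole rule occurs
    confidence = n_both / n_a     # of baskets with the antecedent, share with the consequent
    lift = confidence / (n_c / n) # confidence relative to the consequent's base rate
    return {"support": support, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"bread"}, {"butter"}, BASKETS)
print(metrics)
```

For these baskets, bread and butter appear together in three of five baskets, and three of the four bread baskets also contain butter, so confidence is 0.75. The lift comes out a little below 1, which says bread buyers here are slightly less likely than average to buy butter, so this rule would not be worth a bundle offer.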
Reference links for beginners:
I like this URL, http://nikhilvithlani.blogspot.in/2012/03/apriori-algorithm-for-data-mining-made.html; it is very simple and easy to understand for novices and beginners.
This post is a precursor to using R and Big Data for Market Basket Analysis, to make recommendations in the retail point-of-sale domain or based on billions of e-commerce transactions. In the upcoming posts we will see how to leverage this algorithm and do appropriate analysis on point-of-sale data. Keep watching this space.