Market Basket Analysis Retail Foodmart Example: Step by step using R

This post is a short, step-by-step implementation of Market Basket Analysis with the Apriori algorithm in R, using a small dataset so that each step is easy to follow. It should also show how little R code such an analysis needs.

I built the transaction set from the foodmart dataset by using the combination of Time_Id and Customer_Id as a composite key. That key gives a unique transaction id, stored as Trans_ID, and I added the product name for readability; the resulting table is named POS_Transcations. I exported the data from this table as RetailFoodMartData.csv, which has 86829 records. To keep things simple and quick to follow, I copied a small subset of the transactions, limited to 105 rows, into RetailFoodMartDataTest.csv and re-ran the whole exercise on it. The final results shown are the output for RetailFoodMartDataTest.csv.

Before converting the data into transactions for the Apriori algorithm, we need to make sure that no duplicates exist in the vector or data.frame. Otherwise you will get an error like "cannot coerce list with transactions with duplicated items". So remove the duplicate rows from the CSV source file using Data->Remove Duplicates before you import the data into R.
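If you would rather remove the duplicates inside R than in the spreadsheet, something along these lines should work. This is only a sketch: the data frame below is an invented stand-in for the imported CSV, and only the column names Trans_Id and ProductName are taken from the statements used later in this post.

```r
# Toy stand-in for the imported CSV; the rows are invented for illustration.
RetailPosData <- data.frame(
  Trans_Id    = c(396, 396, 396, 397, 397),
  ProductName = c("Atomic Bubble Gum", "Atomic Bubble Gum",
                  "CDR Hot Chocolate", "Just Right Canned Yams",
                  "Atomic Bubble Gum"),
  stringsAsFactors = FALSE
)

# Keep only the first occurrence of each (Trans_Id, ProductName) row.
RetailPosData <- unique(RetailPosData)

# Number of rows left after removing the repeated row.
nrow(RetailPosData)
```

The same product may still appear in different transactions; only repeats within one transaction are dropped.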

Hope this would suffice for this exercise.

I’m using R version 3.0.1 for the analysis.

Data Preparation:

Step 1: Import Excel to the R environment. If you would like to know how to import, please refer to my blog post here.

Step 2: Below are the outcome of the import and the summary from R; you can view the first few records (six by default) using head.

Step 3: In the above screenshot you can see that the first 6 items belong to the same transaction. Our objective now is to group the items by transaction id. We can do that using AggPosData<-split(RetailPosData$ProductName,RetailPosData$Trans_Id), which returns one list of product names per transaction. In the example shown below, transaction id 396 has 3 products.
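To make the grouping step concrete, here is a minimal sketch on a made-up data frame. The rows are invented for illustration; only the column names and the split call follow the statement in Step 3.

```r
# Invented rows; in the real exercise this comes from the imported CSV.
RetailPosData <- data.frame(
  Trans_Id    = c(396, 396, 396, 397, 397),
  ProductName = c("Atomic Bubble Gum", "CDR Hot Chocolate",
                  "Just Right Canned Yams", "Atomic Bubble Gum",
                  "Just Right Large Canned Shrimp"),
  stringsAsFactors = FALSE
)

# split() returns a named list: one character vector of products per transaction id.
AggPosData <- split(RetailPosData$ProductName, RetailPosData$Trans_Id)

AggPosData[["396"]]   # the products recorded for transaction 396
length(AggPosData)    # number of distinct transactions
```

The list names are the transaction ids converted to character, which is why "396" is quoted when indexing.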

Implementation of Association Rules Algorithm:

"Mining frequent item sets and association rules is a popular and well researched method for discovering interesting relations between variables in large databases." As the next step we need to load the arules library in the R console.

Step 4: If the arules package is not installed, loading it will show an error as in the picture. In that case, install the package and then load the library.
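One way to avoid the error altogether is to check for the package before loading it. The helper name load_or_install and the CRAN mirror URL below are my own choices for this sketch, not something from the original steps, and the install branch needs an internet connection.

```r
# Hypothetical helper: install a package only when it is missing, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # The mirror is an assumption; any CRAN mirror should do.
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
}

load_or_install("arules")
```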

Step 5: Now we use the aggregate produced by split in Step 3.
To let the Apriori algorithm process the data, we coerce the list into transactions: Txns<-as(AggPosData,"transactions").
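Here is a small, self-contained sketch of the coercion, assuming the arules package is installed. The baskets are invented and simply stand in for the output of split; the duplicate check is an extra safety step I added.

```r
library(arules)

# Invented baskets standing in for the list produced by split().
AggPosData <- list(
  "396" = c("Atomic Bubble Gum", "CDR Hot Chocolate", "Just Right Canned Yams"),
  "397" = c("Atomic Bubble Gum", "Just Right Large Canned Shrimp"),
  "398" = c("CDR Hot Chocolate", "Atomic Bubble Gum")
)

# Each basket must not repeat an item, otherwise the coercion fails.
stopifnot(all(sapply(AggPosData, anyDuplicated) == 0))

Txns <- as(AggPosData, "transactions")

length(Txns)       # number of transactions
itemLabels(Txns)   # distinct items, one column each in the item matrix
summary(Txns)
```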

Now we will quickly review the Apriori algorithm with a picture that shows its process in a simplified manner:


Courtesy: http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput499/slides/Lect10/img054.jpg

Result of summary(Txns)


In this example, summary describes the transactions as an itemMatrix, which will be the input to the Apriori algorithm. For instance, Atomic Bubble Gum appears with 6 occurrences.

Step 6: Now we will run the algorithm using the following statement:

Rules<-apriori(Txns,parameter=list(supp=0.05,conf=0.4,target="rules",minlen=2))

The results obtained above suggest that if a customer buys Just Right Canned Yams, the confidence that they also buy Atomic Bubble Gum is 100% in this sample; similarly, a customer who purchases CDR Hot Chocolate may also buy either "Just Right Large Canned Shrimp" or "Atomic Bubble Gum". Confidence is the share of transactions containing the left-hand side of a rule that also contain its right-hand side, and support is the share of all transactions that contain every item in the rule.
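To see what these two numbers mean, they can be computed by hand in base R. This is a sketch on five invented baskets, not the foodmart data, and the helper functions support and confidence are written just for this illustration.

```r
# Invented baskets for illustration only.
baskets <- list(
  c("Just Right Canned Yams", "Atomic Bubble Gum"),
  c("Just Right Canned Yams", "Atomic Bubble Gum", "CDR Hot Chocolate"),
  c("CDR Hot Chocolate", "Just Right Large Canned Shrimp"),
  c("Atomic Bubble Gum"),
  c("CDR Hot Chocolate", "Atomic Bubble Gum")
)

# Fraction of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(sapply(baskets, function(b) all(items %in% b)))
}

# Confidence of lhs => rhs: support of both sides divided by support of lhs.
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

# Yams and gum appear together in 2 of the 5 baskets.
support(c("Just Right Canned Yams", "Atomic Bubble Gum"), baskets)

# Every basket with yams also has gum.
confidence("Just Right Canned Yams", "Atomic Bubble Gum", baskets)

# 2 of the 3 hot chocolate baskets also have gum.
confidence("CDR Hot Chocolate", "Atomic Bubble Gum", baskets)
```

A rule with confidence 1 on so few baskets is weak evidence, which is why support is read alongside confidence.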

Step 7: Now we will decrease the confidence level to 0.2 and look at the results given below; the number of rules generated has increased. You can inspect the rules using inspect(Rules), and look at a specific rule using inspect(Rules[1]):
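The effect of the confidence threshold can be tried on a toy example. This is a sketch assuming arules is installed; the baskets are invented, the wrapper function mine is hypothetical, and verbose = FALSE is only there to keep the console output short.

```r
library(arules)

# Invented baskets for illustration; not the foodmart data.
baskets <- list(
  c("Just Right Canned Yams", "Atomic Bubble Gum"),
  c("Just Right Canned Yams", "Atomic Bubble Gum", "CDR Hot Chocolate"),
  c("CDR Hot Chocolate", "Just Right Large Canned Shrimp"),
  c("Atomic Bubble Gum"),
  c("CDR Hot Chocolate", "Atomic Bubble Gum")
)
Txns <- as(baskets, "transactions")

# Hypothetical wrapper: same parameters as in the post, with a variable confidence.
mine <- function(conf) {
  apriori(Txns,
          parameter = list(supp = 0.05, conf = conf, target = "rules", minlen = 2),
          control = list(verbose = FALSE))
}

RulesHigh <- mine(0.4)
RulesLow  <- mine(0.2)

length(RulesHigh)   # rules that pass the stricter threshold
length(RulesLow)    # lowering the threshold can only keep or add rules

# Show the three rules with the highest confidence.
inspect(sort(RulesLow, by = "confidence")[1:3])
```

Note that R is case-sensitive, so the function is inspect, in lower case.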

Step 8: Now we will visualize the top 5 items by frequency using the following statement: itemFrequencyPlot(Txns, topN = 5)
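If you want the numbers behind the plot as well, itemFrequency returns them as a named vector. This is a sketch on invented baskets, assuming arules is installed; the two "Hypothetical Item" names are made up to give the toy data more than five items.

```r
library(arules)

# Invented baskets for illustration; not the foodmart data.
baskets <- list(
  c("Atomic Bubble Gum", "Just Right Canned Yams"),
  c("Atomic Bubble Gum", "Just Right Canned Yams", "CDR Hot Chocolate"),
  c("CDR Hot Chocolate", "Just Right Large Canned Shrimp"),
  c("Atomic Bubble Gum", "Hypothetical Item A"),
  c("CDR Hot Chocolate", "Atomic Bubble Gum", "Hypothetical Item B")
)
Txns <- as(baskets, "transactions")

# Absolute counts per item, most frequent first.
freq <- sort(itemFrequency(Txns, type = "absolute"), decreasing = TRUE)
head(freq, 5)

# Bar chart of the same five items.
itemFrequencyPlot(Txns, topN = 5, type = "absolute")
```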

 

Good references are available which have more steps in detail:

http://snowplowanalytics.com/analytics/catalog-analytics/market-basket-analysis-identifying-products-that-sell-well-together.html

http://prdeepakbabu.wordpress.com/2010/11/13/market-basket-analysisassociation-rule-mining-using-r-package-arules/

http://www.eecs.qmul.ac.uk/~christof/html/courses/ml4dm/week10-association-4pp.pdf

Though this is considered a "poor man's recommendation engine", it is a very useful one. In my next post we will continue by looking at how to do this kind of analysis on large volumes of data.


Recommendation in Retail

Suppose you go to a shop and see that a specific brand of deodorant and a bathing bar are bundled together and displayed with a discount, and you pick the bundle up with immense happiness (??) and the satisfaction of a good deal. How do shopkeepers come to know what to bundle? Intuition, analytics, case-based reasoning, pattern matching, and so on.

Whether it is Walmart, Target, Macys, TESCO or even a small self-owned retail outlet, it is important that they understand customer behavior correctly to make a good profit at the end of the day. Nor is this useful only in the retail industry; it is just as important for services-based organizations to understand consumer behavior.

For ease of understanding, and to move towards the practical aspects of such an implementation, we will try to understand some of the factors that could influence recommendation.

  • Demography (City, Locality, Country, etc.) (Transactions)
  • Culture (Transactions)
  • Product mix based on past sales history (Transactions)
  • Social recommendations (Twitter, Facebook, posts) (Social Analytics/NoSQL/Semi-Structured)
  • Product Reviews (Blog/Review/Semi-Structured Data)
  • Post-Sales experience (Transactions)

The challenge would be to relate these data and to make good recommendations through the system in a very short span of time, so as to influence customer buying decisions. In my next post we will try to evaluate some of the data sets available on the internet for further experiments on the same.

My aim is to understand and implement a recommendation system, or at least arrive at the right steps for building a recommendation system that is reliable and can handle the complexity involved in the data.

Keep waiting for the next post.

Introduction to Market Basket Analysis

Market Basket Analysis (Association Analysis) is a mathematical modeling technique based upon the theory that if you buy a certain group of items, you are likely to buy another group of items. It is used to analyze consumer purchasing behavior and helps to increase sales and maintain inventory by focusing on point of sale (POS) transaction data. The Apriori algorithm is one commonly used way to achieve this.

Apriori Algorithm

This algorithm is used to identify patterns in data. It is based on observing which items occur together within transactions.

Example:

If a person goes to a gift shop and purchases a birthday card and a gift, it's likely that they might also purchase a cake, candles or candy. Such combinations help the retail shop owner predict likely purchases and club or package them as offers to make better margins. They also help in understanding consumer behavior.

When we look at the Apriori algorithm, it's essential to understand what association rules are too. That helps to see the algorithm in the right perspective.

Association rule learning is a popular machine learning technique in data mining. It helps to understand relationships between variables in large databases. It is primarily applied to point of sale data in retail, where large numbers of transactions are recorded.
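For readers who like to see the idea in code, here is a rough base R sketch of the level-wise search behind Apriori, using the gift shop items from the example. It only finds frequent item sets (rule generation is left out), the baskets and the function name apriori_sketch are invented for illustration, and it is a teaching aid rather than how a production package implements the algorithm.

```r
# Level-wise search for frequent item sets; rule generation is omitted.
apriori_sketch <- function(baskets, min_support) {
  support <- function(items) mean(sapply(baskets, function(b) all(items %in% b)))

  # Level 1: frequent single items.
  items <- sort(unique(unlist(baskets)))
  frequent <- Filter(function(s) support(s) >= min_support, as.list(items))
  result <- frequent

  k <- 2
  while (length(frequent) > 0) {
    # Candidates of size k use only items that survived at size k - 1.
    pool <- sort(unique(unlist(frequent)))
    if (length(pool) < k) break
    candidates <- combn(pool, k, simplify = FALSE)

    # Prune: every (k - 1)-subset of a candidate must itself be frequent.
    keys <- sapply(frequent, paste, collapse = "|")
    candidates <- Filter(function(cand) {
      subsets <- combn(cand, k - 1, simplify = FALSE)
      all(sapply(subsets, paste, collapse = "|") %in% keys)
    }, candidates)

    # Count support only for the candidates that survived pruning.
    frequent <- Filter(function(s) support(s) >= min_support, candidates)
    result <- c(result, frequent)
    k <- k + 1
  }
  result
}

# Invented gift shop baskets.
baskets <- list(
  c("Birthday Card", "Gift", "Cake"),
  c("Birthday Card", "Gift", "Candles"),
  c("Birthday Card", "Gift", "Cake", "Candles"),
  c("Gift", "Candy"),
  c("Birthday Card", "Cake")
)

found <- apriori_sketch(baskets, min_support = 0.5)
found
```

The pruning step is the heart of the method: an item set can only be frequent if all of its smaller subsets are frequent, so most combinations never need to be counted.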

Reference links for Beginners:

http://en.wikipedia.org/wiki/Apriori_algorithm

http://en.wikipedia.org/wiki/Association_rule_learning

http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=all&_moc.semityn.www&_r=0

http://cran.r-project.org/web/packages/arules/vignettes/arules.pdf

I like this URL, http://nikhilvithlani.blogspot.in/2012/03/apriori-algorithm-for-data-mining-made.html, which is very simple and easy to understand for beginners.

Reference links for Researchers and algorithm lovers:

http://learninglover.com/blog/?p=245

http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf

http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap6_basic_association_analysis.pdf

This post is a precursor to using R and Big Data to apply Market Basket Analysis for recommendation in the retail point of sale domain, or on billions of e-commerce transactions. In the upcoming posts we will see how to leverage this algorithm and do appropriate analysis on point of sale data. Keep watching this space.