Recognizing Supermarket Purchase Patterns Using FP Tree Case Study

1.0 Introduction

Modern technological advances give businesses the opportunity to obtain complete lists of product sales, ranked by the number of units sold. This helps them integrate business operations and prioritize products. Supermarket companies in particular often want to know which products sell the most, along with other transaction data, in order to develop their product and marketing strategy. This creates the need for data mining algorithms that prioritize products based on their sales or transactions.

This study therefore discusses the Frequent Pattern (FP) growth algorithm, which is widely used for mining frequent items, and builds an FP tree from an example that identifies the purchase patterns of supermarket customers. The study covers the FP tree concept and related data mining algorithms, shows the steps for developing an FP tree, and discusses its advantages and limitations.

2.0 Rationale

A set of items that is frequently purchased together is called a frequent item set. For example, supermarket customers often buy products that go with other products, such as bread and butter, onion with chicken, or a laptop and antivirus software. Mining frequent item sets is a data mining task that helps to identify the relationships between items that are often sold together (Guidotti et al. 2017). Frequent pattern mining techniques have various applications, such as data analysis, cross-marketing, sales campaigns and software bug detection. In supermarket transaction pattern analysis, association rules can be applied to examine customer behavior through the products purchased. Such a rule describes how frequently particular products are purchased together. Therefore, this study follows the association rule approach in mining frequent patterns.

3.0 Discussion

3.1 FP Growth Algorithm

Frequent pattern mining is a method for generating item sets and finding the "most frequent item set". The Apriori algorithm reduces the number of item sets that have to be examined, but it has some shortcomings (Guidotti et al. 2017). These are as follows:

It requires generating candidate item sets, and for a large database the number of candidate item sets can be huge.
To check the support of each item set, the Apriori algorithm performs multiple scans of the supermarket database, which can be costly.

Because of these disadvantages, the Frequent Pattern Growth algorithm is used instead. It is an improvement on the Apriori algorithm that does not require candidate generation. It represents the supermarket database in the form of a tree known as the FP tree.
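
To give a feel for why candidate generation becomes costly, the following minimal Python sketch counts every non-empty item set that could be formed from a handful of items. It is only an upper bound written for illustration: Apriori prunes many of these candidates in practice, and the function name `all_candidate_itemsets` and the five item labels are assumptions of this sketch.

```python
from itertools import combinations


def all_candidate_itemsets(items):
    """Yield every non-empty subset of items (worst-case candidate space)."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        # combinations() yields each subset of the given size exactly once
        yield from combinations(items, size)


# Illustrative item labels (assumed for this sketch)
items = ["A", "B", "C", "O", "T"]
candidates = list(all_candidate_itemsets(items))

# For n items the candidate space has 2**n - 1 members
print(len(candidates), 2 ** len(items) - 1)
```

With only five items the bound is already 31 item sets, and it doubles with every additional item, which is why avoiding candidate generation matters for a large supermarket catalogue.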

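However, in order to understand the FP growth algorithm, association rules, which uncover the relationships between different attributes, also need to be understood (Baderiya and Chawan, 2018). For an association rule X → Y, the support and confidence are given by:

Support(X → Y) = (number of transactions containing both X and Y) / (total number of transactions)
Confidence(X → Y) = (number of transactions containing both X and Y) / (number of transactions containing X)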

For example, a supermarket observes that it has 400 customers on a Saturday. Out of these 400 customers, 200 bought meat, and among those 200 customers, 100 also bought onions.

Therefore, as per the association rule, customers in that particular supermarket buy onion with meat on Saturday with a support of 25% (100 of 400 customers) and a confidence of 50% (100 of the 200 meat buyers).
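
The arithmetic of this example can be checked with a short Python sketch. The function and variable names are assumptions made for illustration; only the three customer counts come from the example above.

```python
def support(count_both: int, total: int) -> float:
    """Share of all transactions that contain both items of the rule."""
    return count_both / total


def confidence(count_both: int, count_antecedent: int) -> float:
    """Share of transactions with the first item that also contain the second."""
    return count_both / count_antecedent


total_customers = 400        # customers on Saturday
meat_buyers = 200            # customers who bought meat
meat_and_onion_buyers = 100  # meat buyers who also bought onions

rule_support = support(meat_and_onion_buyers, total_customers)
rule_confidence = confidence(meat_and_onion_buyers, meat_buyers)

# prints: support = 25%, confidence = 50%
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```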

3.2 FP Tree concept

The "Frequent Pattern Tree" is a tree structure (a prefix tree rather than a binary tree, since a node can have any number of children) built from the initial item sets present in the database. The primary purpose of the FP tree is to show the most frequent items or purchase patterns, and each item of the supermarket database or item set is represented by a node in the FP tree (Ugwu and Udanor, 2021).

In the FP tree, the root node is always null and the lower nodes represent the items of the item sets. The most frequent items are linked directly to the root node, and the other items in an item set fall under lower nodes according to their purchase frequency.

3.3 Algorithm development Steps

This part of the study shows the steps for mining the frequent patterns of products purchased from the supermarket. The steps are as follows:

The first step is to scan the database in order to count the occurrences (the support count) of each item.
The second step is to start constructing the frequent pattern tree. The tree begins with the root node, which is represented by null.
The next step is to scan the database again and examine each transaction. The items of each transaction are ordered by descending support count, so the item with the highest purchase frequency is placed nearest the root, the item with the next-highest count below it, and so on.
In the fourth step, each ordered transaction is inserted into the tree as a branch that starts at the root. If the leading items of a transaction are already present on an existing branch, the "transaction branch" shares that common prefix with the existing branch instead of creating duplicate nodes (Gadari, 2019).
In the fifth step, the item counts are incremented as the transactions are inserted. The count of every common node that a transaction passes through is increased by 1, and every newly created node also receives a count of 1.
The sixth step is to mine the constructed FP tree. The node with the lowest frequency is examined first, together with its links; this lowest node represents a frequent pattern of length 1. The paths that lead from the root to that node are then traversed and collected (Thu, 2019). These paths are known as the conditional paths, or conditional pattern base, of the item.
In the seventh step, a conditional FP tree is constructed from the item counts along those paths, keeping only the items that meet the minimum support.
In the eighth step, the frequent patterns are generated from the conditional FP tree.
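
The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the names `Node`, `build_tree` and `fp_growth` are assumptions of this sketch, ties in support count are broken alphabetically (also an assumption), and the input is the six-transaction example used in Section 3.4 with a minimum support count of 2.

```python
from collections import Counter, defaultdict


class Node:
    """One FP tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Steps 1 to 5: count items, order each transaction, insert it as a branch."""
    counts = Counter()
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: n for i, n in counts.items() if n >= min_support}
    root = Node(None, None)      # the null root
    links = defaultdict(list)    # item -> every tree node that carries it
    for items, weight in weighted_transactions:
        # Descending support count; alphabetical tie-break is an assumption
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:   # no shared prefix: new node
                node.children[item] = Node(item, node)
                links[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return frequent, links


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Steps 6 to 8: mine the tree recursively through conditional FP trees."""
    frequent, links = build_tree(weighted_transactions, min_support)
    patterns = {}
    # Examine the least frequent item first
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        patterns[tuple(sorted((item,) + suffix))] = frequent[item]
        # Conditional pattern base: the path above each occurrence of the item
        base = []
        for node in links[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # The recursive call builds and mines the conditional FP tree
        patterns.update(fp_growth(base, min_support, (item,) + suffix))
    return patterns


transactions = [
    ["B", "A", "O", "C"], ["A", "T", "O"], ["B", "D", "O"],
    ["T", "C", "B", "O"], ["A", "C", "B"], ["O", "C"],
]
patterns = fp_growth([(t, 1) for t in transactions], min_support=2)
for pattern, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(pattern, count)
```

Each conditional pattern base is passed back into `fp_growth` as a list of weighted paths, so the same tree-building code serves both the full database and every conditional FP tree.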

3.4 Building FP tree with Example

This part of the study discusses the process of developing an FP tree using a small example of supermarket purchase (transaction) data.

The transactions discussed here involve six different items. The items are as follows:

Meat (A), Corn (C), Beans (B), Onion (O), Tomatoes (T), Soft-Drink (D)

The transactions and the items they contain are shown below, where the minimum support count is 2.

Transaction ID	List of Items
T1	B, A, O, C
T2	A, T, O
T3	B, D, O
T4	T, C, B, O
T5	A, C, B
T6	O, C

Table 1: List of transactions and items

(Source: Self-created)

Transaction T1 consists of four items: Beans (B), Meat (A), Onion (O) and Corn (C). Similarly, the other transactions also consist of various items. First, let us take the minimum support count for these transactions to be 2.

The support count of each item, that is, the number of transactions in which it appears, is shown in Table 2. The item Soft-Drink (D) has been omitted from this table because the minimum support count is 2 and Soft-Drink (D) has a support count of only 1.

Items	Support Count
Meat (A)	3
Corn (C)	4
Beans (B)	4
Onion (O)	5
Tomatoes (T)	2

Table 2: List of Items and their support counts

(Source: Self-created)

Now the table needs to be restructured in descending order of support count. Onion (O) has the highest support count. Beans (B) and Corn (C) have the same support count, so either of them can be placed first. The sorted list of items in descending order is shown in Table 3.

Items	Support Count
Onion (O)	5
Beans (B)	4
Corn (C)	4
Meat (A)	3
Tomatoes (T)	2

Table 3: Sorted table based on the support count

(Source: Self-created)
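
Tables 2 and 3 can be reproduced with a short Python sketch. The variable names are assumptions made for illustration, and the tie between Beans (B) and Corn (C) is broken alphabetically, which happens to match the order chosen in Table 3.

```python
from collections import Counter

MIN_SUPPORT = 2

# Transactions from Table 1
transactions = {
    "T1": ["B", "A", "O", "C"],
    "T2": ["A", "T", "O"],
    "T3": ["B", "D", "O"],
    "T4": ["T", "C", "B", "O"],
    "T5": ["A", "C", "B"],
    "T6": ["O", "C"],
}

# Support count: the number of transactions in which each item occurs
support_counts = Counter(
    item for items in transactions.values() for item in set(items)
)

# Table 2: drop items below the minimum support count (this removes D)
frequent = {item: n for item, n in support_counts.items() if n >= MIN_SUPPORT}

# Table 3: descending support count, alphabetical tie-break (assumption)
ranked = sorted(frequent.items(), key=lambda pair: (-pair[1], pair[0]))

# Reorder every transaction by that ranking, ready for insertion into the tree
order = {item: position for position, (item, _) in enumerate(ranked)}
ordered_transactions = {
    tid: sorted((i for i in items if i in order), key=order.get)
    for tid, items in transactions.items()
}

for item, count in ranked:
    print(item, count)
for tid, items in ordered_transactions.items():
    print(tid, items)
```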

Now the frequent pattern tree can be constructed from Table 3. The root node of the FP tree is represented by Null, and within each transaction the item with the highest support count is connected to the root node.

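Tree for Transaction 1 (T1): ordered by the support counts in Table 3, the items of T1 are O, B, C, A, which gives the single branch Null → O:1 → B:1 → C:1 → A:1.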
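
The construction can be followed through all six transactions with the minimal Python sketch below. The names `FPNode`, `build_fp_tree` and `render` are assumptions of this sketch, and ties in support count are broken alphabetically so that Beans (B) precedes Corn (C), as in Table 3.

```python
from collections import Counter


class FPNode:
    """A node of the FP tree; the root carries no item (null)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First scan: support count of every item
    counts = Counter(item for t in transactions for item in set(t))
    # Sort key per frequent item: descending count, then alphabetical
    rank = {item: (-n, item) for item, n in counts.items() if n >= min_support}
    root = FPNode(None)
    # Second scan: insert each ordered transaction as a branch from the root
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:            # no shared prefix yet: new node
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1             # shared and new nodes both gain 1
            node = child
    return root


def render(node, depth=0):
    """Return the tree as indented 'item:count' lines, one per node."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


transactions = [
    ["B", "A", "O", "C"],  # T1
    ["A", "T", "O"],       # T2
    ["B", "D", "O"],       # T3
    ["T", "C", "B", "O"],  # T4
    ["A", "C", "B"],       # T5
    ["O", "C"],            # T6
]
root = build_fp_tree(transactions, min_support=2)
print("\n".join(render(root)))
```

In the printed tree, T1 forms the branch O, B, C, A. T3 and T4 reuse its O, B prefix, while T5 starts a second branch at B because it does not contain Onion (O).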

3.5 Advantages and limitations of FP Growth Algorithm

Advantages -

This algorithm needs to scan the database only twice, whereas the Apriori algorithm scans the transactions once for each iteration.
This algorithm does not require the pairing of items, which makes it faster (Mashud, 2019).
This algorithm is scalable and efficient for mining both short and long frequent patterns.
The algorithm stores a compact version of the database in memory.

Disadvantages -

Building the FP tree is more difficult and cumbersome than running Apriori.
Building the FP tree can also be expensive.
For a larger database, the FP tree may not fit in shared memory (Algassim, 2019).

3.6 Comparison with Apriori Algorithm

FP Tree	Apriori algorithm
The FP tree algorithm runs much faster than the Apriori algorithm.	The Apriori algorithm runs much more slowly than the FP tree algorithm.
It makes little use of arrays; it is a tree-based algorithm.	It is an array-based algorithm.
Only two scans of the database are needed to perform the whole process.	Multiple scans of the database are needed.
It uses a depth-first search (Bharathi and Devender, 2018).	It uses a breadth-first (level-wise) search.
In the FP tree algorithm, the data are interdependent.	In the Apriori algorithm, the data are not interdependent.
The FP tree algorithm requires less memory space.	The Apriori algorithm requires a large amount of memory space.
Run time grows comparatively slowly as the number of item sets increases.	Run time can grow exponentially as the number of item sets increases, because of candidate generation.
Frequent patterns are produced step by step, by growing them from the tree.	Frequent patterns are produced by generating and testing candidate item sets held in arrays (Wang and Chang, 2019).
Patterns are derived quickly, without candidate generation.	Performance depends on the candidate selection carried out in each pass of the Apriori algorithm.

4.0 Conclusion

In summary, this study has shown how the mining of supermarket purchase patterns can rest on the FP tree method. The FP tree organizes the transaction data into a compact tree, so that frequent item sets can be found through a tree-based analysis instead of repeated candidate generation. The study described the steps of the FP growth algorithm, worked through a small transaction example, and compared the approach with the Apriori algorithm, noting the advantages and limitations of each.

Reference List

Journal

Abo Algassim, A.G.A., 2019. Using Frequent Pattern Growth Algorithm in Analyzing Sudanese Shopping Behavior (Doctoral dissertation, Sudan University of Science & Technology).

Baderiya, M.S.H. and Chawan, P.M., 2018. Customer buying Prediction Using Machine-Learning Techniques: A Survey.

Bharathi, K. and Devender, D.B., 2018. Analytical Study on Frequent Pattern Mining Algorithms Apriori, FP-Growth and Eclat.

Gadari Santhosh Kumar, D.R., 2019. Customer Product Purchase Prediction Analysis.

Guidotti, R., Rossetti, G., Pappalardo, L., Giannotti, F. and Pedreschi, D., 2017, November. Market basket prediction using user-centric temporal annotated recurring sequences. In 2017 IEEE International Conference on Data Mining (ICDM) (pp. 895-900). IEEE.

Mashud, M., 2019. Designing an Application for Analyzing Consumer Spending Patterns Using the Frequent Pattern Growth Algorithm. Jurnal Penelitian Pos dan Informatika, 9(2), pp.151-159.

Thu, H.N.N.M., 2019. Discovering Generalized Association Rule in Web Usage Mining by Frequent Pattern Tree (FP-TREE) (Doctoral dissertation, University of Computer Studies, Yangon).

Ugwu, N.V. and Udanor, C.N., 2021. Achieving Effective Customer Relationship using Frequent Pattern-Growth Algorithm Association Rule Learning Technique. Nigerian Journal of Technology, 40(2), pp.329-339.

Wang, C.S. and Chang, J.Y., 2019. MISFP-growth: Hadoop-based frequent pattern mining with multiple item support. Applied Sciences, 9(10), p.2075.