Ensemble Learning for Credit Card Fraud Detection in Malls

Table of Contents

Credit Card Fraud Detection In Shopping Malls Using Ensemble Learning
Chapter 1
1.1 Introduction - Credit Card Fraud Detection In Shopping Malls Using Ensemble Learning
1.2 Background Study
1.3 Problem statement
1.4 Research aim and objective
1.5 Research questions
1.6 Scope of the research
1.7 Dissertation structure
1.8 Summary
Chapter 2: Critical Review of Literature
2.1 Introduction
2.2 Previous Study of Literature
2.3 Literature gap
2.4 Summary
Chapter 3: Research Methodology
3.1 Introduction
3.2 Research strategy
3.3 Research approach
3.4 Research design
3.5 Data collection techniques
3.6 Data analysis plan
3.7 Ethical consideration
3.8 Summary
Chapter 4: Analysis of Findings, Evaluation and Outcomes
4.1 Findings
4.2 Analysis
4.3 Evaluation
4.4 Outcome
4.5 Limitations
4.6 Summary
Chapter 5: Conclusion and Recommendations
5.1. Linking with objectives
5.2. Future prospect of the study
5.3. Recommendation
5.4. Conclusion


Credit Card Fraud Detection In Shopping Malls Using Ensemble Learning


Chapter 1

1.1 Introduction - Credit Card Fraud Detection In Shopping Malls Using Ensemble Learning

Over the last few decades, the use of credit cards for shopping in malls has increased drastically. With advancing technology, the risk of credit card fraud has also increased over time, and fraudulent credit card transactions have become more common because of the low-risk nature of such transactions. Credit card fraud in shopping malls accounts for less than 0.2% of all transactions worldwide, but its impact on the finance sector is large because individual transactions can be of large amounts. Losses from such fraud in shopping malls amount to more than a billion dollars worldwide, and shopping malls bear heavy losses every year, so it is necessary to take security measures that prevent it. Existing preventive measures include sending a one-time password (OTP) to the user's mobile phone during the transaction, securing the payment gateway, and creating security questions for online banking. Machine learning can also be used to detect fraud, and ensemble learning in particular can be useful for detecting credit card fraud in shopping malls.

1.2 Background Study

Credit card fraud in shopping malls has recently emerged as a global problem, and big companies suffer huge financial losses from it every year. Such fraud accounts for less than 0.2% of all transactions worldwide, but its impact on the finance sector is large because individual transactions can be of large amounts (Krishna Rao et al., 2021). Losses from such fraud in shopping malls amount to more than a billion dollars worldwide. Detecting such fraud has therefore become essential so that the overall financial loss can be minimized globally. Various methods can be used to prevent credit card fraud in shopping malls (Prusti and Rath, 2019). Sending a one-time password to the user's mobile phone helps ensure that no one else can use the user's credit card details during a transaction. Other preventive measures include securing the payment gateway and creating security questions during transactions. However, these methods are not foolproof, and they are inconvenient for some users, so a balance between convenience and security is desirable. Machine learning techniques are useful for detecting credit card fraud in shopping malls. Detection is difficult, and it is sometimes unclear whether a fraudulent transaction attempt has passed the prevention mechanisms. The main task of a fraud detection system is to examine every credit card transaction in shopping malls (Safa and Ganga, 2019): it filters every transaction and identifies the fraudulent ones as soon as possible.
The rise in credit card fraud has led financial institutions to look for new ways of detecting it. Despite their prevention mechanisms, companies still suffer huge losses from fraudulent transactions because fraudsters constantly change their strategies to avoid detection. As a result, traditional rule-based fraud detection systems have become obsolete, and the importance of machine learning techniques in credit card fraud detection has increased drastically worldwide. This research discusses the implementation of ensemble learning for detecting credit card fraud in shopping malls. Credit card fraud detection faces several problems: unavailability of datasets, dynamic fraudulent behaviour, skewed datasets, and the choice of accurate evaluation parameters (Yousefi, Alaghband, and Garibay, 2019). Transaction datasets contain vital information about customers, so companies cannot release them into the public domain, and this lack of suitable datasets makes it hard to collect information related to credit cards and to build detection systems. Scammers also change their behaviour and patterns to beat the detection system. Finally, transaction datasets are highly skewed. The effectiveness of a classifier is usually judged by its accuracy, but accuracy is not a suitable measure for credit card fraud detection because fraudulent transactions are such a small share of the data: a model can report high accuracy on a skewed dataset and still misclassify most of the fraudulent transactions.
Therefore, such models must be evaluated with suitable measures. These include recall (the share of fraudulent transactions that are correctly identified) and precision (the share of transactions flagged as fraudulent that really are fraudulent), together with the correct classification of normal transactions. The research work is carried out in Python using Jupyter Notebook. Several methods are applied to obtain the required output of the project: linear regression, lasso regression, elastic net regression, XGB regression, and gradient boosting. After the errors of these models are obtained, stacking is applied to minimize the errors in the output of the system. The result is a method that can be used to prevent credit card fraud in shopping malls, and ensemble learning plays a major role in obtaining it.
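The point about accuracy on a skewed dataset can be illustrated with a small sketch. The labels below are hypothetical and are not taken from the dissertation's dataset; the function names are chosen here for illustration. Fraud is encoded as 1 and normal as 0, and the fraud share is set to 0.2% to match the figure quoted in the introduction.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), with fraud encoded as 1 and normal as 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall for the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical data: 1,000 transactions, 2 of them fraudulent (0.2%).
y_true = [1, 1] + [0] * 998

# Model A never flags anything as fraud.
always_normal = [0] * 1000
# Model B catches one of the two frauds and raises three false alarms.
some_detector = [1, 0] + [1, 1, 1] + [0] * 995

report_a = evaluate(y_true, always_normal)
report_b = evaluate(y_true, some_detector)
print("A:", report_a)
print("B:", report_b)
```

Model A reaches 99.8% accuracy while detecting no fraud at all (recall 0), whereas model B has slightly lower accuracy but a recall of 0.5. This is why recall and precision are reported alongside, or instead of, accuracy.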

1.3 Problem statement

Several problems are associated with a credit card fraud detection system based on ensemble learning. Machine learning algorithms can be used to detect credit card fraud, but the unavailability of datasets creates problems in the fraud detection process. Ensemble models are hard to interpret, and their output is hard to predict. The operation of the system is also sensitive: wrong input can lead to lower predictive accuracy, which creates serious difficulties in detecting credit card fraud (Rashid et al., 2020). Ensemble learning is costly as well. The resulting predictive model is expensive and difficult to understand, and it costs more to create, train, and deploy. These problems need to be overcome so that better outcomes can be obtained.

1.4 Research aim and objective

Aims:

This research aims to detect credit card fraud in shopping malls using ensemble learning.

Objectives: The objectives of this research are

To identify the ensemble learning concept with the combination of various predictive machine learning techniques
To obtain accuracy in detecting credit card fraud using ensemble learning
To implement the system in the Python programming language
To develop a secure method by which fraud detection can be done precisely

1.5 Research questions

There are some research questions associated with the credit card fraud detection system, such as

How can the Python programming language be used in detecting credit card fraud?
How can a secure method be developed using ensemble learning to detect credit card fraud?
Why is the Python programming language used to detect such fraud?
What are the major challenges of a credit card fraud detection system?

1.6 Scope of the research

The scope of this research covers the credit card fraud detection system. The dataset comprises important transaction data, on which training and testing are carried out to obtain the output needed for detecting credit card fraud. The dataset and the predictive model prepared with ensemble learning can be reused in future work to minimize credit card fraud.

1.7 Dissertation structure

1.8 Summary

Ensemble learning plays a major role in detecting credit card fraud in shopping malls. The importance of such machine learning techniques has increased drastically in recent times because of the rise in fraudulent activity by scammers globally. The introduction presented the basic idea of credit card fraud in shopping malls. The background study gave a brief account of ensemble learning and its relation to credit card fraud, together with the research approach. The problem statement discussed the problems associated with the credit card fraud detection system. The aim and objectives of the research, which help in understanding the research work, and its scope have also been discussed. Thus this chapter has given a basic idea of the research work.

Chapter 2: Critical Review of Literature

2.1 Introduction

In modern times, credit card fraud in shopping malls has increased with the advancement of technology. Fraudsters use various techniques to scam users and obtain critical information from customers in shopping malls. Fraudulent credit card transactions have increased because of the low-risk nature of such transactions. Although the incidence of credit card fraud in shopping malls is limited, its impact on the finance sector is large because individual transactions can be of large amounts. It is therefore very important to prevent such fraud; otherwise it can cause huge losses to the e-commerce industry. Ensemble learning, together with machine learning algorithms, can be used in this domain. Several security measures can also help prevent credit card fraud. Sending a one-time password to the user's mobile phone helps ensure that no one else can use the user's credit card details during a transaction. Other measures include securing the payment gateway and creating security questions during online transactions. These methods are not foolproof, which is why evaluation of the detection model becomes essential. Credit card fraud detection faces several problems: unavailability of datasets, skewed datasets, the choice of accurate evaluation parameters, and dynamic fraudulent behaviour. Transaction datasets contain vital customer information that cannot be made public, and the resulting unavailability of datasets creates several problems for the fraud detection mechanism. The changing behaviour of fraudsters also undermines the ability of the system to detect fraudulent transactions.
Highly skewed transaction datasets create further challenges for the fraud detection system. The effectiveness of a classifier is usually judged by its accuracy, but accuracy is not a suitable measure for credit card fraud detection because the transaction dataset is skewed. These difficulties make the whole process of credit card fraud detection hard. Ensemble learning plays a major role here: it yields a predictive model that helps detect credit card fraud in shopping malls.

2.2 Previous Study of Literature

2.2.1 Overview of Ensemble Learning

Ensemble learning plays a major role in detecting credit card fraud in malls. It is a machine learning process that creates a predictive model by combining several prediction models (Goyal and Manjhvar, 2020). Ensemble learning methods fall into three main classes: bagging, stacking, and boosting. Bagging fits many decision trees on different samples of the same dataset and averages their predictions (Shekhar, Kedia, and Guha, 2020). Stacking fits different models on the same data and uses another model to learn how best to combine their predictions. Boosting adds ensemble members sequentially, each correcting the predictions of the models before it.

Bagging: This ensemble learning algorithm obtains a diverse group of ensemble members by varying the training data. It trains each model, typically a decision tree, on a different sample of the same dataset using a single machine learning algorithm. The members' predictions are then combined using simple statistics such as voting or averaging to give the final prediction (Alharbi et al., 2022). The key elements of a bagging ensemble are bootstrap samples of the training dataset, unpruned decision trees fitted on each sample, and simple statistics such as voting or averaging of the predictions.
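The three elements of bagging can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the dissertation's implementation: the data are hypothetical (transaction amount, label) pairs, and a one-threshold rule (a decision stump) stands in for the unpruned decision trees described above to keep the example short.

```python
import random


def fit_stump(samples):
    """Fit a one-feature rule: predict 1 (fraud) when x > threshold.

    The threshold with the fewest training errors is chosen; ties go to
    the smallest threshold.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in samples}):
        errors = sum(1 for x, y in samples if (1 if x > threshold else 0) != y)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(data, n_models, rng):
    """Fit one stump per bootstrap sample (sampling with replacement)."""
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]
        models.append(fit_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Combine the members by majority vote."""
    votes = sum(1 for threshold in models if x > threshold)
    return 1 if votes * 2 > len(models) else 0


# Hypothetical (amount, label) pairs; label 1 means fraud.
data = [(20, 0), (35, 0), (50, 0), (80, 0), (120, 0), (150, 0),
        (600, 1), (750, 1), (900, 1)]
models = bagging_fit(data, n_models=25, rng=random.Random(0))
print(bagging_predict(models, 800), bagging_predict(models, 40))
```

Each member sees a slightly different bootstrap sample and therefore learns a slightly different threshold; the vote smooths out these differences.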

2.2.2 Challenges in detecting credit card fraud

There is a long list of obstacles that a developer can face while developing a fraud detection model. Very few research studies have analysed real-world credit card datasets because of confidentiality issues. However, the authors (Randhawa et al., 2018) obtained real-world credit card datasets from a financial institution and analysed them. The main challenges involved in credit card fraud detection are:

The model built for detecting credit card fraud must be efficient enough to respond to a scam in time, even though an enormous volume of data is processed every day.
Misclassified data is another major issue, since not all fraudulent transactions are caught and reported.
Another challenge is the adaptive techniques that scammers use against the fraud detection model.
Imbalanced data is also a major challenge. The number of fraudulent transactions is very small, almost negligible, compared with the number of legitimate transactions. This imbalance between fraudulent and honest transactions makes it very difficult to detect fraud among such a large number of honest transactions.

According to the researcher (Sadineni, 2020), the model they developed does not guarantee the same outcome in every scenario. The model was tested on a small dataset, and it is not certain that it will run successfully on a huge dataset. The model gave accurate results, but difficulties arose during the training process, and training the system proved too expensive. Challenges also arose when applying deep learning and machine learning algorithms. The challenges and issues discussed by the authors of the paper (Singh and Jain, 2020) are as follows:

Lack of standard credit card datasets: in earlier research, most researchers used their own data to evaluate their proposed fraud detection methods.
There are no standard evaluation criteria for assessing and comparing the results of fraud detection models. Accuracy metrics are not suitable because the data is imbalanced.
Existing algorithms are insufficient for detecting new types of fraudulent patterns.
Fraudsters are intelligent enough to change their behaviour or fraud styles from time to time in order to obtain card details and bypass the fraud detection model. Keeping up with them is a great challenge for developers of the fraud detection system.
Changes in a cardholder's behaviour from time to time due to certain circumstances may not be recognized by the system as normal. This may result in the wrong detection of fraud, where the system fails to distinguish between fraudulent and honest transactions.
Developing algorithms that recognize the patterns of fraudsters and customers is also a challenging task for the developers of the system.

All the above-mentioned issues and challenges must be kept in mind while developing the system or model of fraud detection. Much effort is required to overcome these challenges. It might not be possible to overcome all the challenges. However to obtain accurate results and a successful fraud detection model, at least some of these issues must be overcome.

2.2.3 Importance of secured credit cards

A secured credit card helps build a person's credit score. Since the cardholder must make periodic payments to clear the balance, these payments count as credit repayments. A secured credit card is a card that is backed by a deposit of a certain amount from the cardholder. The deposit acts as collateral for the account, giving the card issuer security in case the cardholder is unable to make a payment. The amount deposited becomes the credit limit of the card. As a result, the person does not face obligations or difficulties during the payment process. Securing credit cards is also very important because fraudulent transactions result in huge losses to the business or company, and the cardholder who has suffered fraudulent transactions also faces a large loss of money. If simple security risks are overlooked, customers' information may be stolen and the merchant's credit card acceptance privileges may be revoked. A lack of security in credit card transactions thus leads to data breaches and lost revenue, and may cost the merchant company its customers. When a secured credit card is used, the cardholder's account data is sent to the credit bureaus, which is essential for building a person's creditworthiness. Secured credit cards are easier for applicants to get approved for, and they offer the potential to earn rewards. Another advantage is that the amount deposited earlier as collateral is refunded to the cardholder's account.

People who do not have a credit history can use this card to build one in order to get loan approval in the future.
Using this type of card helps in increasing the credit limit and earning good interest on fixed deposits.
It also helps people obtain low-interest loans: the holder of a secured credit card can get a loan at a lower interest rate than other customers.

Certain researchers have studied the importance of the relationship between banks and firms in obtaining higher credit limits. They developed a model and confirmed from their findings that relationships are the most important factor in securing higher credit limits. According to the author (Gencoglu, 2019), with the advancement of electronic data exchange and digital communication, most people share their private information in cyberspace, knowingly or unknowingly, and so leave digital footprints there. This information is often unprotected and easily available for cybercriminals to access and manipulate. Credit card security aims to reduce unauthorized access from malicious activity by providing a gateway authentication system for the user account. For this reason, the security of credit cards is very important. It is essential to encrypt private information using suitable encryption and decryption techniques and to secure it in cyberspace to avoid data breaches.

2.2.4 Advantages and disadvantages of ensemble learning

Ensemble learning refers to the process in which multiple models like experts or classifiers are generated strategically and are combined for solving a definite problem related to computational intelligence. The paper (Ganaie, 2021) throws light on the different models of deep ensemble learning. This learning process has several advantages & disadvantages or pros & cons which are discussed in the following bullets.

Advantages:

This type of learning is used to improve the overall model performance. It improves the processes of the model like prediction, classification, function approximation, and so on.
It assigns confidence to the decision made by the model and helps it to select optimal features and correct errors.
Besides, it helps in incremental learning and data fusion.
Unlike individual models, ensemble methods provide users with higher predictive accuracy.
These methods are especially useful when the data set consists of both linear as well as non-linear types of data.
The authors of the paper (Gao et al., 2019) have made use of the methods of ensemble learning for improving the effects of detection. They proved through analysis and findings obtained by their developed model that using this ensemble method provides effective detection accuracy as compared to other models.
With the use of ensemble learning, project managers can deal with variance or bias and reduce it. Variance refers to scattered results that are difficult to converge. Bias, on the other hand, refers to the error or mis-calibration that occurs in obtaining the desired result.
This type of learning basically helps in bringing a consensus-based decentralized approach to ML that further helps in refining results and ensuring precision.
Most of the time, a model that uses this method will be neither underfitted nor overfitted.
An ensemble of models is often less noisy and more stable.
Ensemble learning is often used to obtain better predictions, such as higher classification accuracy or smaller regression error.
According to the authors (Chen, Dong, and Wu, 2022), ensemble learning methods combine the strengths of multiple learners and provide robustness, higher model accuracy, and better overall induction ability. They therefore conclude that this method can be an effective technique for crown profile modelling and prediction.
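The variance item in the list above can be made concrete with a standard textbook identity, which is not taken from the cited papers. Assume an ensemble that averages the outputs of M models f_1, ..., f_M, where each model has the same variance σ² and every pair of models has correlation ρ (both symbols are assumptions introduced here). The variance of the averaged prediction is then:

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M} f_m(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{M}\,\sigma^{2}
```

When the members are uncorrelated (ρ = 0) the variance falls to σ²/M, and when they are identical (ρ = 1) averaging gives no reduction. Simple averaging does not change the bias if all members share the same bias, which is why ensembles benefit from diverse members.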

When it is difficult to rely on a single model, an ensemble model can come to the rescue. This is one reason why winners of machine learning (ML) competitions often choose ensemble models. ML algorithms can be applied to a user system through classification and regression techniques. Classification assigns the data to categories, whereas regression identifies the relationships between the variables in the data. Classification algorithms include random forest, XGBoost, naive Bayes, and SVM, whereas regression algorithms include random forest regression, lasso regression, elastic net regression, gradient boosting regression, and linear regression. These can be deployed in the security system to identify patterns in the data.

Disadvantages:

Besides its advantages, ensembling also has certain disadvantages; as the saying goes, it is “a necessary evil”. The disadvantages are as follows:

The process of ensembling is a bit difficult to learn. However, it can be learned by experience.
If the wrong method is selected, the ensemble may achieve lower predictive accuracy than an individual model.
The use of ensemble models is quite expensive in terms of both space and time.
Ensembles can sometimes be difficult to interpret.
Using ensemble learning helps in improving endurance.

Other authors (Huang et al., 2019) have developed prediction models based on ensemble learning methods, combining extreme ML, multiple linear regression, extreme gradient boosting, and support vector regression.

2.2.5 Advantages of implementing machine learning in securing the credit card

With the use of machine learning, the entire population can be segmented more effectively. Models for credit line management search the existing data for people with similar behaviour and, on that basis, determine the worthiness of credit card transactions. People with good credit scores receive higher credit limits.

The authors (Nguyen et al., 2020) provide a thorough study of deep learning methods for detecting fraud in credit card transactions and compare them with other ML algorithms. Their experimental results show that the proposed algorithm is more effective than traditional machine learning models, and that the proposed model can readily be used in real-world credit card fraud detection systems. The advantage of the ML approach lies in supervised learning, which can distinguish malicious activity from normal transactions. An ML classification algorithm can be deployed to classify transactions accordingly, and a classifier with a high classification score reduces the risk associated with the transaction data. While the model can provide a valid understanding of the program, the output of the algorithms needs to be interpreted in the context of the overall program. If alterations need to be made to the datasets, they should always be made in code as part of the machine learning process.

Applying a classification algorithm to transaction data has the following advantages:

Reduction in the number of malicious activities
Users can securely use their credit cards for online transactions
It adds multiple layers of security
A huge set of financial data can be analysed through this classification method
An additional layer of security reduces multiple transactions from a single account

A “Naive Bayes” or “support vector machine” (SVM) classification algorithm can be deployed on the categorized data to classify it accordingly. Customer financial and transactional data can be secured by implementing an ML algorithm. Fraudulent user accounts can be mitigated by classifying the data into normal and fraudulent classes, and proper implementation of ML techniques reduces unauthorized access to user accounts. Another author (Vaithyasubramania, 2020) has illustrated a new scheme for authenticating credit card transactions that uses a primary PIN along with multifactor authentication, and has used ML to develop this model. In this system, the model alerts the customer or cardholder to possible fraud whenever a mismatch occurs. The proposed methodology aims to maintain the integrity, security, and privacy of the customer's information that has already been entered into the system. The author also notes that biometric authentication can be used to overcome credit card-related threats, since it is a more secure way of establishing authenticity.

Various machine learning techniques can be used to detect fraudulent credit card transactions, such as decision trees, support vector machines, artificial neural networks (ANN), random forest, and logistic regression. Regression algorithms can also be applied to the dataset, including linear regression, lasso regression, elastic net regression, XGB regression, and gradient boosting regression. This provides definitive ideas regarding the concept of ensemble learning. Overall, this process is highly useful for ensuring that information can be extracted from the database.

The author (Sadineni, 2020) has mentioned all the machine learning techniques in their paper. They have performed the analysis of all the above-mentioned techniques by using precision, accuracy, and false rate metrics. The dataset that they have used to carry out their experiment is taken from a repository of Kaggle data. Thus, it can be concluded from all the reviews of previous studies how ML is advantageous to be implemented in securing the transactions done on credit cards and how well it can secure the private data of the credit cardholders.

A bagging ensemble is a general approach that can be easily extended. For example, the way the training dataset is varied can be changed, the algorithm fitted on the training data can be replaced, and the mechanism used to combine the predictions can be modified.

Stacking: The stacking ensemble learning method fits different models on the same data and uses another model to learn how best to combine their predictions. Stacking has its own nomenclature: the ensemble members are referred to as level-0 models, and the model that combines their predictions is referred to as the level-1 model. A two-level hierarchy is the most common approach, but more layers can be used. For example, instead of a single level-1 model, three or more level-1 models can be used together with a single level-2 model that combines their predictions (Zhang, Gardner, and Vukotic, 2019). There are three main elements in a stacking ensemble: an unchanged training dataset, a different machine learning algorithm for every ensemble member, and a machine learning model that learns how to combine the predictions (Nur-E-Arefin, 2020).
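The level-0 and level-1 terminology can be illustrated with a small standard-library sketch. Everything in it is an assumption made for illustration: the two features (amount and number of purchases in the last hour), the data, and the function names. For brevity the level-1 model is trained on the level-0 predictions for a separate holdout set, a simplification often called blending; full stacking usually trains the level-1 model on out-of-fold predictions instead.

```python
def fit_stump(rows, labels, feature):
    """Level-0 model: predict 1 (fraud) when row[feature] > threshold."""
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted({row[feature] for row in rows}):
        errors = sum(1 for row, y in zip(rows, labels)
                     if (1 if row[feature] > threshold else 0) != y)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return lambda row: 1 if row[feature] > best_threshold else 0


def fit_meta(patterns, labels):
    """Level-1 model: majority label for each pattern of level-0 outputs."""
    counts = {}
    for pattern, y in zip(patterns, labels):
        counts.setdefault(pattern, [0, 0])[y] += 1
    table = {p: (1 if c[1] > c[0] else 0) for p, c in counts.items()}
    default = 1 if sum(labels) * 2 > len(labels) else 0
    return lambda pattern: table.get(pattern, default)


def stack_predict(level0, meta, row):
    """Feed the level-0 predictions into the level-1 model."""
    return meta(tuple(model(row) for model in level0))


# Hypothetical rows: (amount, purchases in the last hour); label 1 = fraud.
train_rows = [(20, 1), (30, 2), (800, 1), (40, 6),
              (600, 7), (700, 8), (25, 1), (650, 6)]
train_labels = [0, 0, 0, 0, 1, 1, 0, 1]
holdout_rows = [(35, 1), (900, 2), (30, 5), (620, 9), (710, 7), (15, 1)]
holdout_labels = [0, 0, 0, 1, 1, 0]

# Level-0: one stump per feature, fitted on the training rows.
level0 = [fit_stump(train_rows, train_labels, feature) for feature in (0, 1)]
# Level-1: learns how to combine the level-0 outputs on the holdout rows.
holdout_patterns = [tuple(m(row) for m in level0) for row in holdout_rows]
meta = fit_meta(holdout_patterns, holdout_labels)

print(stack_predict(level0, meta, (850, 1)))  # large amount, single purchase
print(stack_predict(level0, meta, (680, 8)))  # large amount, many purchases
```

In this toy data each level-0 model is wrong on some legitimate transactions (a large single purchase, or many small purchases), and the level-1 model learns to flag fraud only when both level-0 models agree.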

Various ensemble algorithms are associated with this approach, such as blending, stacked models, and super ensembles (Sinayobye, Kiwanuka, and Kyanda, 2018).
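The level-0 and level-1 terminology can be made concrete with a small sketch. It assumes two deliberately simple level-0 models (a mean predictor and a least-squares line) and a level-1 model that learns a single blending weight on a hold-out split; all names and data values are hypothetical. Using one hold-out split rather than cross-validated predictions is a simplification chosen for brevity.

```python
from statistics import mean

def fit_mean(train):
    """Level-0 model A: always predicts the training mean of y."""
    m = mean(y for _, y in train)
    return lambda x: m

def fit_line(train):
    """Level-0 model B: least-squares line through the training points."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sxx
    return lambda x: y_bar + b * (x - x_bar)

def fit_blend(model_a, model_b, holdout):
    """Level-1 model: find the weight w that minimises the squared error of
    w*A(x) + (1 - w)*B(x) on held-out data (closed-form solution)."""
    num = den = 0.0
    for x, y in holdout:
        d = model_a(x) - model_b(x)
        num += d * (y - model_b(x))
        den += d * d
    w = num / den if den else 0.5  # equal weights if the models agree everywhere
    return (lambda x: w * model_a(x) + (1 - w) * model_b(x)), w

# Toy data on the line y = 3x - 2 (illustrative values only)
data = [(x, 3 * x - 2) for x in range(12)]
train, holdout = data[0::2], data[1::2]

model_a, model_b = fit_mean(train), fit_line(train)
stacked, w = fit_blend(model_a, model_b, holdout)
```

Because the toy data are exactly linear, the level-1 model learns to place essentially all of the weight on the line model. The level-0 models see only `train`, and the combiner sees only their predictions on `holdout`, which is what keeps the combiner from rewarding a model for memorising its own training data.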

Boosting: The boosting ensemble learning method adds ensemble members sequentially so that each one corrects the prediction errors of those before it. The second model attempts to correct the predictions of the first model, the third model corrects the predictions of the second, and so on (Hu, Zhang, and Lovrich, 2021). Boosting ensembles typically use very simple decision trees that make a single decision or only a few decisions. Such trees are referred to as weak learners. The predictions of the weak learners are combined using simple statistics such as voting or averaging. The key elements of boosting are training data biased toward the examples that are hard to predict, the sequential addition of ensemble members that correct the predictions of prior models, and the combination of predictions through a weighted average of the models.

Several ensemble learning algorithms are based on this approach, such as gradient boosting machines, stochastic gradient boosting, and AdaBoost, which is often considered the canonical boosting algorithm.
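The sequential correction performed by boosting can be sketched as follows. This is a minimal gradient-boosting-style illustration, assuming one numeric feature, single-split decision stumps as the weak learners, and a fixed learning rate; the names `fit_stump` and `boost` and the toy data are hypothetical and do not reproduce the Gradient Boosting or XGB implementations used later in the study.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Weak learner: the single split on x that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        l_val, r_val = mean(left), mean(right)
        err = (sum((r - l_val) ** 2 for r in left)
               + sum((r - r_val) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, l_val, r_val)
    if best is None:  # only one distinct x value: predict a constant
        const = mean(residuals)
        return lambda x: const
    _, t, l_val, r_val = best
    return lambda x: l_val if x <= t else r_val

def boost(xs, ys, n_rounds=50, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = mean(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

# Toy step-shaped target (illustrative values only)
xs = list(range(10))
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = boost(xs, ys)
```

Each round fits a new stump to whatever error the current ensemble still makes, so the remaining error on this toy target shrinks with every round. The learning rate scales each stump's contribution, which corresponds to the weighted combination of members described above.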

There are several advantages and disadvantages associated with ensemble learning that are mentioned below.

Advantages: Ensemble learning offers several advantages for credit card fraud detection. Ensemble methods can achieve higher predictive accuracy than individual predictive models (Jhangiani, Bein, and Verma, 2019). When a dataset contains both linear and non-linear relationships, it is difficult to prepare a single predictive model; in this scenario ensemble learning can combine different models to handle such cases. Ensemble learning can also reduce bias or variance, which helps to overcome the problems of underfitting and overfitting (Panigrahi, Saitejaswi, and Devarapalli, 2019). The resulting model is more stable and less noisy, and the method is easy to use. These advantages make the system preferable and more reliable. In this study, ensemble learning is used together with the Python programming language to detect credit card fraud in shopping malls.

Disadvantages: Ensemble learning also has several problems that can make it difficult to obtain the required output of the research. Ensemble models are hard to interpret, and their output is hard to predict (Ryman-Tubb, Krause, and Garn, 2018). The system is sensitive to its inputs, and any wrong input can lower predictive accuracy, which is a severe problem when detecting credit card fraud. Ensemble learning is also costly, which creates further difficulties in detecting credit card fraud. These difficulties need to be minimized so that a better outcome can be obtained.

2.3 Literature gap:

The use of ensemble learning has expanded across many sectors in recent years, and the technology can be applied in critical areas such as business and corporate operations. As the use of credit cards for payment in shopping malls has increased, so has the fraud associated with it. As discussed throughout this report, ensemble learning can help to address this problem. Ensemble learning, and machine learning more broadly, has come a long way as a remedy for fraud and as a means of improving payment methods based on credit or debit cards. However, there are also some gaps in the literature on this technology.

A literature gap is an area that previous researchers have not discussed or that earlier reports have left unexplored. Scientists and developers generally try to create a framework or structure that resolves issues using existing methods; they either develop new technology or update the remedies proposed in earlier research. Literature gaps are the main inspiration behind the invention or updating of technologies and behind “research expansion” (Abdelrahman and Keikhosrokiani, 2020). The gaps can be explored further to rework the existing structure and for a “better understanding of the discussion”.

It has been observed that new ensemble learning technologies have performed poorly in predicting the number of users because of the “complex association features” in user access during payment. Many customers use these machine learning technologies when paying for purchased items, and various machines and mobile applications serve as the medium of payment (Yousefi et al. 2019). Even for credit cards there are two or three methods: machines through which the card is swiped, machines that read the card's chip and complete the payment without the card being touched, and mobile applications that complete the payment once a secret PIN has been entered. Because of this variety of payment methods, ensemble learning technologies face difficulties arising from the complexity of the machines involved.

The variety of data supplied to machine learning technologies can expose the lack of a properly maintained model or structure. The gaps can clearly be seen at any venue where this kind of payment often happens, such as a shopping mall, where customers face many obstructions while making payments, most often by credit card. Because of this complexity, many frauds exploit these drawbacks and the harm to customers continues to increase.

Improving the literature gaps:

The analysis of the advantages and disadvantages of using ensemble technologies to detect and guard against credit card fraud in shopping mall payments also brought the literature gaps, and the harassment related to them, to the fore (Goyal and Manjhvar, 2020). Alongside the gaps and drawbacks of ensemble learning, some remedies have been identified that can fill these gaps.

To fill these gaps, the credit card should first be kept where it is not easily accessible; if fewer people have access to the card, fraud will decrease and one type of gap can be closed (Yontar et al. 2020). The main gap in using ensemble learning, which concerns the data supplied to it, can be addressed in a few steps. The shopping mall authority should adopt a more up-to-date database system to categorize all the customer data added to its server each day. If the database is well structured and the data are categorized, the complexity for the data server will be reduced (Keswani et al. 2020). The data should be held in separate groups of tables, for example for customers who use the swipe facility, those who use the magnetic touch facility, and those who use the online credit card service. Separated and well-categorized data will reduce complexity and help at the time of data analysis.

2.4 Summary:

This summary of the “critical review of literature” gives a brief account of the discussion and compiles the theory elaborated in detail above. First, an overview was given of the fundamentals of using ensemble learning technologies to detect credit card fraud, together with the data analysis tools and types of algorithm used to apply the methods. The difficulties the technology faces in detecting credit card fraud and the steps for resolving them were then discussed step by step. The structured models that can be used against future obstructions and potential threats were also included.

The secured credit card has many uses in the modern era, from obtaining a loan to paying for a product (Shivanna et al. 2020). With daily card use and the busy schedules of users, minor security threats are often overlooked, which happens because of flaws in this technology. All the gaps discussed here have to be identified at an early stage of the project, and the actions taken should be related to those issues. Credit cards offer many advantages as a payment method, but they also carry risks. This critical literature review has set out both the beneficial and the risky aspects, together with the possible corrective methods that could be used as security measures in the future.

Chapter 3: Research Methodology

3.1 Introduction

The implementation of ensemble learning involves applying a number of selected machine learning models to a dataset. Based on these initial models, the Python program seeks an even better output by combining the models and identifying the most useful one. This also indicates which alterations can be made to the code to find the most relevant machine learning algorithm. In this case, the processes of Bagging, Boosting, and Stacking have been carried out using the following algorithms:

Linear Regression
Lasso Regression
Elastic Net Regression
Gradient boosting regressor
XGB regressor

Implementing these algorithms ensures that the database can be used to identify solutions that meet the requirements of the program. Any alteration made to the input data would produce a markedly different output, because the output depends entirely on the initial table. It is vital to note that the objective here is to identify the most relevant algorithm used in the program, not its level of efficiency. Thus, even if the accuracy (or error) scores are not especially good, what matters in this case is identifying the most useful algorithm.

Before the program is implemented, the database must first be prepared. A vital aspect kept in mind while developing the program was the size of the dataset. If a small dataset had been used, the program would have taken far less time to run, but the algorithms would not have been fitted reliably and the output would not have been entirely relevant. Thus, a large dataset was selected that represents fraud prediction in major institutions.

After the dataset was selected and imported, it had to be checked for redundant values. Removing such values is vital so that the algorithms do not run into problems caused by null and other redundant values. The dataset was first checked for null values. Duplicate values were not checked, since duplicates occur quite naturally in a continuous dataset.

3.2 Research strategy

The process followed in this program depended entirely on the dataset. A large dataset was used to identify the issues associated with developing the program. Initially, the dataset was downloaded from an online data library (Abdelrahman and Keikhosrokiani 2020). This gave the algorithms the data needed to produce outputs. The program was developed so that its output represents the overall dataset, and any alterations needed to find solutions were derived from that initial dataset.

The concept implemented in the program is based on comparing and combining outputs. The program begins with the linear regression algorithm, followed by four more algorithms: Lasso Regression, Elastic Net Regression, Gradient boosting regressor, and finally XGB regressor. The outputs of these algorithms are then used to stack the initially implemented machine learning algorithms into a completely new model (Alharbi et al. 2022). After these algorithms had been implemented, the one most useful for producing outputs from the dataset could be identified.

However, before the algorithms could be implemented, the dataset had to be prepared. The dataset was split into a target column and the remainder of the dataset. The target column was the "isFraud" column, stored in the variable "y", while all the remaining columns were stored in the variable "X". This allowed the target column to be used to test the output of the program. Following this phase, the dataset was split into train and test segments. The train segment represented 75% of the dataset while the test segment constituted the remaining 25%. Had a smaller dataset been used in the program, the output could have been flawed (Chen et al. 2022). Thus, a massive dataset has been used in the program.
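The preparation steps described here can be sketched in plain Python. The column name "isFraud" and the 75/25 proportion come from the text; the helper names, the example rows, and the random seed are hypothetical. In practice a library routine such as scikit-learn's `train_test_split` would normally be used; the study does not state which routine was used, so that is an assumption.

```python
import random

def split_features_target(rows, target="isFraud"):
    """Separate each row into its feature columns (X) and target value (y)."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y

def split_train_test(X, y, test_size=0.25, seed=42):
    """Shuffle row indices, then hold out `test_size` of the rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Illustrative rows only; these are not records from the real dataset
rows = [
    {"type": "TRANSFER", "amount": 181.0, "isFraud": 1},
    {"type": "CASH_OUT", "amount": 181.0, "isFraud": 1},
    {"type": "CASH_OUT", "amount": 229.5, "isFraud": 0},
    {"type": "TRANSFER", "amount": 215.3, "isFraud": 0},
    {"type": "CASH_OUT", "amount": 110.4, "isFraud": 0},
    {"type": "TRANSFER", "amount": 311.7, "isFraud": 0},
    {"type": "CASH_OUT", "amount": 56.9, "isFraud": 0},
    {"type": "TRANSFER", "amount": 92.1, "isFraud": 0},
]

X, y = split_features_target(rows)
X_train, X_test, y_train, y_test = split_train_test(X, y)
```

Shuffling before splitting avoids a test segment made up only of the last rows in the file, and fixing the seed makes the split repeatable between runs.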

3.3 Research approach

Various types of fraudulent activity have been detected in financial transactions. The main aim of this research is to predict these types of fraud in payment services. The prediction is made after several analysis steps have been executed. Data analysis was carried out on a collected dataset that contains information related to financial fraud. This section describes the approach in detail so that the initial steps of the research process can be understood. All of the work was done in Python using the Jupyter notebook environment, and the program code was run on this platform (Ganaie and Hu 2021). Data mining is a widely used approach at present, when so much activity takes place on online platforms.

Collecting a suitable dataset is the first step of this research approach. The dataset was collected from the online platform “kaggle” (Kaggle.com, 2022), a website on which many kinds of datasets are available. A fraud-related dataset was collected from this website to develop the prediction model based on this historical information, which reflects current and past conditions of financial transactions. At the start of the data analysis, all required library functions are imported into the Python IDE. These libraries help to import data into the platform and provide built-in functions for use in Python code. For example, seaborn is a Python library that has been used here to develop all of the visualization plots. After that, the collected fraud dataset was imported into the software IDE for further analysis. The dataset was then split so that a learning algorithm could be applied to it.

3.4 Research design

The method used to develop outputs in the program relied on a dataset downloaded from an online data library. This ensures that the dataset is available for any observer to download and review (Kostas 2018). The dataset was then imported into the program and an initial set of data cleaning code was run to inspect it and to identify any alterations that had to be made. A check for null values in the dataset found none, which further confirmed the quality of the dataset. In this case, there were no issues with the dataset; however, had alterations been required, the missing values in a column could have been replaced by the mean value of that column.
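The null check and the fallback of mean replacement can be sketched as follows. The helper names and the three example rows are hypothetical, and missing values are represented by `None`. With pandas, which the study uses to load the data, the usual equivalents are `isnull().sum()` and `fillna()`, although the exact calls used in the program are not shown in the text.

```python
from statistics import mean

def count_missing(rows):
    """Count missing (None) values in each column."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (val is None)
    return counts

def impute_mean(rows, column):
    """Return new rows with missing values in `column` replaced by its mean."""
    col_mean = mean(r[column] for r in rows if r[column] is not None)
    return [
        {**r, column: col_mean if r[column] is None else r[column]}
        for r in rows
    ]

# Illustrative rows with one deliberately missing amount
sample = [{"amount": 100.0}, {"amount": None}, {"amount": 300.0}]

missing = count_missing(sample)
cleaned = impute_mean(sample, "amount")
```

The mean is computed from the observed values only, so the replacement does not shift the column average. Since the dataset in this study contained no null values, this step would have left it unchanged.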

Following this phase, the dataset was divided into X and y sections. The “y” variable contained the target column, while the remainder of the table was kept under the variable “X”. Hence, the output of the program was based entirely on the machine learning algorithms applied to these two sections (Nanduri et al. 2020). The dataset was then split into train and test segments: the train section contained three quarters of the dataset and the remaining quarter was kept as the test section. The next step was to fit the machine learning models to the training data. Once fitted, the models could produce outputs that represent the program.

After the algorithms had been implemented, the output of the program could be extracted and used in the processes of Bagging, Boosting, and Stacking. The outputs of the initially implemented machine learning algorithms were then taken together and stacked into a single machine learning model, so that a new model could be built by stacking.

It is vital to note that the visualization of the data was to be carried out only after the main process of ensemble learning (Singh and Jain 2020). This is important because the outputs were not guaranteed in the program. Hence, the outputs had to be extracted first, and only after the algorithms had been implemented were the visualisations carried out. The visualisations were finalised once the database had been prepared, so that the program could represent the initial database visually. The graphical representation ensures that the overall dataset is presented in a meaningful fashion.

3.5 Data collection techniques

Data collection is an important part of the data analysis process, because the entire prediction is based on the dataset. This research required a suitable quantitative dataset. The chosen dataset contains some non-numerical values; these non-numerical values and any null values are removed by cleaning the dataset using Python commands. The dataset was chosen from the online platform kaggle and then stored under the variable name “credfraud” (Kaggle.com, 2022). It contains fraud-related details based on the current scenario. The dataset has multiple columns, such as type of payment, name, old balance, present balance, amount, and other details, and the entire analysis is based on these details. The analysis was executed in Python to obtain the error ratio after applying the ML algorithms to the selected dataset. The dataset records credit card fraud details, which help to develop a prediction model for this fraud activity (Zhang et al. 2019). Various activities were executed on the dataset to understand its hidden patterns and to obtain the best error score from the regression methods. The entire analysis was executed after importing the dataset into the software platform.

3.6 Data analysis plan

In this plan, different machine learning techniques are used to develop the predictive features of the system. Linear regression is used to find the line that best fits the available data points. It also supports plotting the fitted outputs of the system and helps to predict values from the input of the system.

Lasso Regression is also used in this model. It applies a penalty to the regression that helps to select the features used by the system (Nur-E-Arefin 2020). It works on subsets of the features, reduces the error of the regression models, and increases the interpretability of the model according to the requirements of the study.

Elastic Net Regression is also used in the system to obtain a model with better error scores. It is another regression model from statistical learning, and its behaviour is controlled by hyperparameters chosen according to the requirements of the system. It is also used within the processes of bagging, boosting, and stacking. Tuning these parameters affects the system and can increase its efficiency. In particular, the value of alpha is defined according to the penalties applied to the model (Sadineni, 2020).
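For reference, the penalties mentioned for Lasso and Elastic Net can be written out explicitly. The forms below follow the parameterisation used by scikit-learn, which is an assumption, since the text does not state which library or parameter values were used. Here n is the number of samples, X the feature matrix, y the target, β the coefficient vector, α ≥ 0 the overall penalty strength, and ρ between 0 and 1 the balance between the two penalty terms (called l1_ratio in scikit-learn).

```latex
\begin{align*}
\text{Lasso:} \quad
  & \min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2
    + \alpha \lVert \beta \rVert_1 \\
\text{Elastic Net:} \quad
  & \min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2
    + \alpha \rho \lVert \beta \rVert_1
    + \frac{\alpha (1 - \rho)}{2} \lVert \beta \rVert_2^2
\end{align*}
```

Setting ρ = 1 reduces the Elastic Net objective to the Lasso objective, and setting α = 0 removes both penalties and recovers ordinary linear regression. The absolute-value penalty is what can drive some coefficients exactly to zero, which is why Lasso is associated with feature selection.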

Gradient boosting regressor is used in the system to predict the target value by repeatedly fitting the residual values left by earlier predictions. XGB regressor is used to develop the predictive models, which can be cross-validated for predicting new data. It also helps in creating the final model that is used to generate predictions for new data.

3.7 Ethical consideration

Creating a program such as this requires care to ensure that its development does not break any laws. The program has therefore been developed in line with the data privacy laws of the UK government. To comply with the law, it is vital that no names or other personal information are used in the creation of the program. Since such information could release users' private data into the world, secrecy must be maintained (Legislation.gov.uk, 2018). This ensures that the development satisfies the overall requirements while the program is still able to carry out the process of ensemble learning (Vaithyasubramanian, 2020). In addition, all the literature studied for the creation of this program is listed in the references section at the end of the research, which ensures that the original creators are credited for their contributions.

3.8 Summary

This chapter explains the techniques that were applied in the program to produce its outputs. Initially, a dataset was imported into the Jupyter notebook application, which works within a Python environment. It was confirmed that none of the data used in the creation of the program could be used to trace the original owners of the transactions (Prusti and Rath 2019). Following this, the program could be created. The main requirements of the Bagging, Boosting, and Stacking processes were carried out first, since creating and depicting the dataset was an extremely demanding process. Several alterations to the original idea had to be made.

For example, it was initially understood that classification algorithms would be implemented on the dataset, because its categorical output column could be used as the target column. However, this could not be done, and the machine learning algorithms had to be altered to achieve outcomes. Thus, instead of random forest, decision trees, and similar classification algorithms, this program implemented regression algorithms such as XGB regressor and Lasso regression. The alteration had to be made because of thermal throttling in the PC. While the outputs of the classification algorithms could be extracted fairly easily, fitting them into the final stacking step could not be carried out in a similar manner (Jhangiani et al. 2019). Hence, these initial concepts had to be altered.

After the outputs had been extracted, the final part of the program was developed, which stacks the previously implemented algorithms. This provided outputs that show the differences among the error rates observed for those algorithms. The extracted outputs were used to find solutions based on the initial data analysis.

Chapter 4: Analysis of Findings, Evaluation and Outcomes

4.1 Findings

The program is implemented after the dataset has been prepared so that the program can be used to find solutions. The above libraries were imported into the program so that information could be extracted from the database. They provide the primary requirements for performing ensemble learning in the Python environment. Besides these, another library was also used: the "pandas" library, which allows the program to import the dataset into the Jupyter notebook.

After the dataset was imported, it was viewed to understand its overall contents. The dataset contained three primary columns that were categorical in nature. Hence, after the visualisation had been carried out, these three columns could be converted into their categorical equivalents and the actual process of ensemble learning could begin. A check for null values showed that none were present, so the dataset was of very high quality.

The figure depicts the column "type", which was plotted to find out which form of payment was used the most in the dataset. From this output, it can be concluded that the most common form of payment in the dataset was cash withdrawal, denoted by "CASH_OUT" in the column. This was done before converting the column into its categorical codes so that the visualisation showed the names of the categories instead of their numerical equivalents.

Calculating the outliers in the dataset allows the data that contains them to be identified quickly. This process showed that the largest number of outliers in the dataset falls under the "TRANSFER" type. This finding suggests that the largest number of fraudulent activities occurs in fund transfers.
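One common way to count outliers per payment type is sketched below. The text does not say which outlier rule was applied, so the 1.5 × IQR box-plot rule used here is an assumption, and the amounts are illustrative values rather than figures from the dataset. Only the labels "TRANSFER" and "CASH_OUT" are taken from the text.

```python
from statistics import quantiles

def count_outliers(values, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)

# Illustrative transaction amounts grouped by payment type
amounts_by_type = {
    "TRANSFER": [10, 12, 11, 13, 12, 11, 500],
    "CASH_OUT": [20, 22, 21, 23, 22, 21, 24],
}

outlier_counts = {
    payment_type: count_outliers(amounts)
    for payment_type, amounts in amounts_by_type.items()
}
```

Counting per type, rather than over the whole column, is what allows the types to be compared in the way the paragraph describes. An outlier in the amount is only a signal worth investigating and is not by itself evidence of fraud.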

The scatter plot represents the columns “oldbalanceOrg” and “newbalanceOrig” in the dataset. This output could be used to identify what the two columns have in common. It indicates that no major difference could be identified between these two columns.

As mentioned before, after the visualisation of the dataset had been completed, the columns in the dataset had to be converted into their categorical equivalents. This alteration was necessary so that the program could run all of the code. The conversion was all the more necessary because it makes the program consume less memory. This was realised later, when it was becoming impossible to run the program due to high CPU usage. Once the categorical values had been applied to the dataset, the output of the program could be produced.
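The conversion of a text column into categorical codes can be sketched as follows. The helper name `encode_column` and the short example column are hypothetical. With pandas the same effect is usually obtained through the category dtype (`astype("category")` and `.cat.codes`); whether the program used that route is not stated in the text.

```python
def encode_column(values):
    """Map each distinct label to an integer code.

    Labels are sorted first so that the mapping is deterministic.
    """
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

# Illustrative "type" column using labels that appear in the text
payment_types = ["CASH_OUT", "TRANSFER", "CASH_OUT"]

codes, mapping = encode_column(payment_types)
```

Storing one small integer per row instead of a repeated string is the reason the conversion reduces memory use, and keeping `mapping` makes it possible to translate the codes back into readable labels.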

After the conversion of the dataset into its categorical components, the dataset looked as represented in figure 4.7. The figure shows the prepared dataset, which could then be run in the program without difficulty.

The credfraud dataset was then divided into two subsets for the rest of the program: the target column and the remaining columns. The "isFraud" column was selected as the target column. This column contains the categorical output for each row of data. The dataset itself was left untouched, as no manual alterations were made to it. Where alterations were necessary, they were made using code in the program rather than stand-alone commands.

4.2 Analysis

The above line of code breaks the initial dataset down into train and test segments. The split is vital because a model must be evaluated on data that represents the overall dataset if its outputs are to be reliable. The error rates of the machine learning algorithms are pivotal to the study, since its aim is to determine a final step that further diminishes the rate of error. Each model therefore has to predict the target column of the dataset so that its predictions can be checked against the actual values in that column.
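The split can be sketched with scikit-learn's `train_test_split`. The source does not give the split ratio or the exact call, so the 80/20 ratio, the `random_state`, the stratification and the synthetic data below are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 4 features, a rare positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)

# Hold out 20% of the rows for testing (assumed ratio).
# stratify=y keeps the share of rare positive rows similar in both segments.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Stratifying is a common choice when the positive class is rare, as fraud is, because a purely random split can leave very few positive rows in the test segment.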

An initial model had to be implemented to establish the overall process followed in the program. This process extracts the root mean square error of the implemented machine learning algorithm.
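For reference, the root mean square error is the square root of the mean of the squared differences between actual and predicted values. A minimal sketch follows. The function name `rmse` is hypothetical and is not taken from the program.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Perfect predictions give an error of zero.
print(rmse([0, 0, 1, 1], [0, 0, 1, 1]))
```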

Implementing the Linear Regression model on the dataset gave an error rate of 3.2%. This output had to be combined with those of the following two algorithms (Lasso and Elastic Net) to complete the process of bagging in the program.

Implementing the Lasso Regression model on the dataset gave an error rate of 3.3%.

Implementing the Elastic Net Regression model on the dataset gave an error rate of 3.3%. The outputs of these three models were initially added to an array to depict the overall output.
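The three linear models and the array of scores can be sketched as follows. The text groups these models under its bagging stage, whereas this sketch only fits each one independently and records its RMSE. The synthetic data, the hyperparameters (`alpha`, `l1_ratio`) and the tuple layout of `modscor` are assumptions, so the printed values will not match the error rates reported above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.5).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=0.01),
    "Elastic Net Regression": ElasticNet(alpha=0.01, l1_ratio=0.5),
}

# Array of (model name, RMSE) pairs, filled one model at a time.
modscor = []
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    score = float(np.sqrt(mean_squared_error(y_test, predictions)))
    modscor.append((name, score))

for name, score in modscor:
    print(f"{name}: RMSE = {score:.4f}")
```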

Upon implementing the Gradient Boosting Regressor on the altered dataset, an error rate of 3.6% was observed.

After the XGBoost regression was implemented, it achieved the lowest error rate observed before the stacking process. Its error rate of 2.4% was the lowest that could be achieved in the program at that stage.
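The boosting stage can be sketched with scikit-learn's `GradientBoostingRegressor`. The data and hyperparameters are assumptions. XGBoost is distributed as a separate library and is not used here, although its `XGBRegressor` follows the same fit and predict pattern. A constant prediction of the training mean is included only as a reference point.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.5).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Trees are added one after another, each fitted to the remaining error.
gbr = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_train, y_train)
gbr_rmse = float(np.sqrt(mean_squared_error(y_test, gbr.predict(X_test))))

# Reference point: always predict the mean of the training target.
baseline = np.full_like(y_test, y_train.mean())
baseline_rmse = float(np.sqrt(mean_squared_error(y_test, baseline)))

print(f"gradient boosting: {gbr_rmse:.4f}, constant baseline: {baseline_rmse:.4f}")
```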

Stacking combines the outputs of all the models implemented so far, so every one of them contributes to the final outcome. Because stacking is practically the final step of the program, any alterations to the program had to be made before this step. Stacking is also expected to yield a final model with the lowest rate of error in the program. In this case that expectation was met, hence the output of the program was considered valid and reliable.
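A minimal stacking sketch using scikit-learn's `StackingRegressor` is shown below. The source does not say which base models fed the stack or which final estimator was used, so the choice of estimators, the linear final estimator, the 5-fold setting and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.5).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Base models produce out-of-fold predictions, and the final estimator
# learns how to combine those predictions into one output.
stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("lasso", Lasso(alpha=0.01)),
        ("elastic_net", ElasticNet(alpha=0.01, l1_ratio=0.5)),
        ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

stack_pred = stack.predict(X_test)
stack_rmse = float(np.sqrt(mean_squared_error(y_test, stack_pred)))
print(f"stacking RMSE: {stack_rmse:.4f}")
```

All base estimators here are regressors, which is consistent with the limitation noted in section 4.5 that the planned classification algorithms could not be fitted into the stacking phase.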

4.3 Evaluation

At the beginning of the program, before any error rates were produced, an array had to be created to store the results. The program calculates a series of error rates using machine learning algorithms, and these outputs were stored in the created array. The output of the program could therefore be represented by printing the array. As the program carried on, each output was appended to the array step by step, building up an overall output that could be used to compare the algorithms.

The above figure depicts the output of the initially implemented machine learning algorithms. The error rates of Linear Regression, Elastic Net Regression and Lasso Regression are shown using the array, in descending order. It can thus be noted that Linear Regression was the machine learning algorithm with the lowest rate of error.

After completing the bagging stage, the root mean square error of the output had to be calculated. This was done to identify which algorithm was most successful in predicting the outcome with the lowest rate of error. The shape of the prediction could also be identified. Finally, the outputs observed in this section were added to the overall array created at the start of the program. Hence, the outcomes of the application could be used to identify the rates of error of the algorithms in a definitive manner.

4.4 Outcome

The figure depicts all the rates of error that have been stored in the "modscor" array. This output summarises the overall process implemented in the program to identify which algorithm predicts the outcomes with the least amount of error. It can be noted that the stacking process produced the best outcome. This output confirms that the program was successful and that stacking is highly useful in providing the best output among all the implemented algorithms in the ensemble.
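Ranking the stored scores can be sketched with plain Python, using only the error rates quoted in section 4.2. The stacking score is not quoted numerically in the text, so it is left out here. The tuple layout of `modscor` is an assumption about how the array was organised.

```python
# Error rates (in percent) as reported in section 4.2.
modscor = [
    ("Linear Regression", 3.2),
    ("Lasso Regression", 3.3),
    ("Elastic Net Regression", 3.3),
    ("Gradient Boosting", 3.6),
    ("XGBoost", 2.4),
]

# Descending order of error, matching how the array is displayed.
ranked = sorted(modscor, key=lambda item: item[1], reverse=True)

# The best algorithm is the one with the smallest error rate.
best_name, best_error = min(modscor, key=lambda item: item[1])

for name, error in ranked:
    print(f"{name}: {error}%")
print(f"lowest error before stacking: {best_name} ({best_error}%)")
```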

4.5 Limitations

Every program has its own set of requirements and its own problems, which are encountered during the program creation phase. During development, alterations naturally need to be made to the program to solve these issues. In this case, there were issues with the process used to carry out the tests and, initially, with producing the output of the program, which had to follow from the initial intentions of the program. The program had initially been planned to use classification algorithms such as random forest and decision tree during the bagging phase. However, their outputs could not be fitted into the stacking phase, hence other facets had to be altered.

Another major issue encountered was thermal throttling while the program was running, especially during the implementation of the classification algorithms. To diminish the load on the PC, alternative methods were implemented, after which the output of the program could be observed. However, it is vital to note that extracting the output took far longer than initially planned. If the program is to be rerun and its outputs observed, extra time needs to be set aside, since the extraction of outputs is bound to take more time than originally expected.

4.6 Summary

This chapter of the study displays the outputs of all the algorithms that have been implemented in the program. Alterations had to be made to the system before the outcome could be achieved. A major change concerned the machine learning algorithms that were initially planned for the program: hardware issues leading to high CPU usage massively diminished the chances of those algorithms being used, so regression algorithms were used instead. Before these algorithms were implemented, changes were made to the dataset to ensure that effective outputs could be obtained. These included setting a target column in the dataset, so that the outputs of the algorithms could be compared against it to find an error rate, followed by splitting the dataset into test and train segments.

After the processes of bagging and boosting, the final step of stacking the algorithms had to be carried out. Upon finishing the program, its output had to be represented in a meaningful manner. This was done by appending all the relevant outputs of the program to a single array named "modscor", an abbreviation of "model scores". The array was created to store all the scores of the algorithms in the program so that the best algorithm could be identified. Finally, the process of stacking was carried out. It provided definitive output regarding the alterations that needed to be carried out in the program and provided proof that, in this program, stacking gave the best output.

Chapter 5: Conclusion and Recommendations

The “conclusion and recommendation” part of this report summarises the findings regarding the detection of credit card fraud using ensemble learning technologies. It can be stated that ensemble learning can be implemented to identify and minimize credit card fraud.

5.1. Linking with objectives

Objective 1:

The first objective, set in the early part of the report, was to identify the concepts of this project with the help of various kinds of machine learning technologies. It can be stated that all the potential combinations and methods necessary to counter the problem of credit card fraud have been discussed. All the methods discussed to achieve this objective are machine learning oriented.

Objective 2:

Another objective was to understand the efficiency of machine learning technologies regarding “credit card fraud” and how accurate the data they produce is (Shekhar et al. 2020). All the aspects of fraud that can occur when paying with a credit card have been discussed, from how the frauds can happen and what causes them to the potential remedial measures that can be taken to solve the issue and to fill the voids discussed throughout the literature review.

Objective 3:

The next objective was to give an overview of the implementation of the “python programming language” in the machine learning system and of how the language can prove helpful in countering the issues. Python is platform independent and works across multiple platforms, and as modern technologies are compatible with it, developers do not need to rewrite any code. Thanks to its frameworks and variety of structured libraries, the database connected to the main server of the ensemble technologies can easily handle activities such as credit card payments and the remaining work (Khan et al. 2022). As the Python programming language is widely regarded as well suited to “Artificial Intelligence” technologies and offers great “consistency and simplicity”, it can be said that the objective of involving the Python language has been achieved.

Objective 4:

The final objective was to devise a security measure that can detect fraud smoothly. To justify the fulfilment of this objective, several measures and methods are discussed in the report.

Many segments of this report, such as “linear regression”, “lasso regression”, “elastic net regression” and “ensemble modelling”, describe security measures and the written steps of these measures. All the code and commands are shown and discussed in this project, showing how the security measures for fraud detection performed structurally to mitigate the risks and threats that occur during the use of credit cards (Krishna et al. 2021). After analysing these commands in the “Python programming language”, it can be said that the objectives and initiatives are fully achieved.

5.2. Future prospect of the study

“Ensemble learning” is a sub-part of “Artificial Intelligence”, and many automated systems can be used to help mitigate the problems that have arisen with the use of credit cards in shopping malls in recent days. As the use of credit cards for payment has increased, the chances of related fraud have also grown, and it has become a very common problem nowadays. The report discusses the study performed to find possible security measures against this problem of fraud. There are several directions in which this study will be useful in the future.

Automated systems:

As this report is on the detection of credit card fraud, and credit card payment methods involve machine learning technologies, the precautionary measures suggested here can be used in all types of “AI” technologies applied to similar payment methods, such as debit card payment or online payment (Safa and Ganga, 2019). All card-based payment methods are driven by programming languages and the code written in them. This report shows the flaws that can occur during the implementation of a programming language and the potential ways to mitigate them. Future technology can use these methods and add the learnings to perform better on its own projects, since the potential outcomes are already given here.

Improved customer experience:

According to a survey performed for academic purposes, almost 86% of the organizations surveyed said that “AI” technologies have helped them to improve their customer experience and to attract more customers (Mauritsius et al. 2020). If the customers of shopping malls continue to face issues while paying with a credit card, and if they lose money to fraud, then the number of customers will decrease, which will be harmful for the shopping malls. The precautionary measures provided by modern technologies like ensemble learning and “Artificial Intelligence” will play a critical role in the future of these shopping malls and other organizations.

Boost productivity:

Shopping malls and other businesses will benefit from customer satisfaction as well as from increased productivity. If credit card fraud is eliminated with the help of machine learning, a huge obstruction between the customer and the business can be removed, and the “demand and supply” chain will continue as usual, which will boost the productivity of the business (Prusti and Rath, 2019). When payment methods run quickly, both the customers and the shopping mall owners benefit. This will leave an impact on the social economy, and the use of these kinds of technology and programming languages will continue to grow in the corporate sector.

Mass growth for data units:

It has become very common to engage with modern programming attributes such as “coding, systematic activities, engineering by technology, and information units”, and using such technologies can further improve every aspect of “AI” systems and machine learning. The size and volume of databases are increasing day by day, and analysing the data is becoming a critical problem for computing systems (Nanduri et al. 2020). That is one of the main reasons why voids are created in an ensemble learning system, which lead to credit card fraud. The suggested “AI” technologies can be used for storing and analysing the data. If all the data are organized systematically and all the security measures are regularly updated, credit card fraud will happen less often, and when it does happen it will be detected quickly and fast action will be taken.

5.3. Recommendation

The technologies discussed and suggested here are very useful for machine learning oriented issues, and there are several sectors in which they can be recommended.

Implementing ensemble learning to detect credit card fraud:

The issue of fraud during credit card payment in shopping malls has been growing rapidly in recent days. Users suffer from many frauds involving their credit cards and bank accounts, and it has been seen that the frauds happen at the time of payment for their goods (Nur-E-Arefin, 2020). Identifying the root cause of this problem should be the first job, and it is recommended to use the suggested technologies to counter these issues. As machine learning technologies are programming language oriented, the “Python programming language” is considered very efficient for analysing data, whether the volume is small or huge, and it is effective at supporting decision making and suggesting possible outcomes and remedial methods. That is why identifying these problems will be easier with this software, which can help detect the flaws from which the frauds occur (Rashid et al. 2020). Payment by credit card is a critical process for any computing system because it involves several servers and many kinds of data, such as data from banks and information from the credit card agency. A fraudster can access the whole account of a user and steal all the money in it, which is why it is very important to go step by step while mitigating the problem of credit card fraud.

Implementing the machine learning system to avoid the potential threat of fraud:

Ensemble learning technologies can predict the potential risks and threats that can affect the system and create the holes and drawbacks through which frauds can happen. Once the possible threats are known, it becomes much easier to identify potential remedial methods and avoid the frauds that are likely to happen. By analysing the information and records given to it, a machine can state the potential threats and the ways to avoid them. By analysing the risks, any “AI” system can identify precautionary measures, from which it will be easy to pick one of the best possible methods to avoid the frauds.

5.4. Conclusion

“Ensemble learning technologies” have become unavoidable in recent years, as they are directly involved in almost every sector where “Artificial Intelligence” is used, thanks to developers and scientists all over the world. In pursuit of the goals of “fast paced development” and “innovation in the automation sector”, machine learning technologies have come far from their starting point.

To conclude the report, several areas are to be summarised. There is a graph in this report named “ratio of payment types”, which shows the ratio between type and count for several entities such as “payment”, “transfer”, “cash out”, “debit” and “cash in”. From this graph an overall view can be obtained of the “categorical output” and “categorical withdrawals” in the dataset. From the “scatter plot” between “new balance” and “old balance”, details can be obtained about the “alteration” made to detect the primary issues at the start of the project, and it can be used to identify the “commonality of the outputs”. “The calculation of outliers”, made in the “Python Jupyter notebook”, was created to identify the outliers instantly. As can be seen in the given image, the entity with the highest value is “transfer”, which means that most of the fraud happened at the time of money transfer.

The “outputs of error rates” can be obtained from the “bagging algorithm” discussed in this report, and the results of “Elastic Net Regression and Lasso Regression” are recorded using this array. After the “RMSE” output of the bagging stage, the final process is the “Final Stacking Output”. This output gives a view of the overall process of comparing the algorithms and of which algorithm predicts with the highest and the least amount of error. The conclusion cannot be completed without discussing the limitations of the technology. The program was initially designed for algorithms like “random forest and decision tree”, which could not be matched with the required dataset, and thus the projected values are altered.

The graphs and programming languages discussed address “the fundamentals and advanced topics” related to “Ensemble learning” and its use for credit card fraud detection during payment. The study of several techniques shows how machine learning technologies have evolved and how they can be beneficial in many respects, such as fraud detection.

References

Abdelrahman, O. and Keikhosrokiani, P., 2020. Assembly line anomaly detection and root cause analysis using machine learning. IEEE Access, 8, pp.189661-189672.

Alharbi, A., Alshammari, M., Okon, O.D., Alabrah, A., Rauf, H.T., Alyami, H. and Meraj, T., 2022. A Novel text2IMG Mechanism of Credit Card Fraud Detection: A Deep Learning Approach. Electronics, 11(5), p.756.

Chen, Y., Dong, C. and Wu, B., 2022. Crown Profile Modeling and Prediction Based on Ensemble Learning. Forests, 13(3), p.410.

Ganaie, M.A. and Hu, M., 2021. Ensemble deep learning: A review. arXiv preprint arXiv:2104.02395.

Gao, X., Shan, C., Hu, C., Niu, Z. and Liu, Z., 2019. An adaptive ensemble machine learning model for intrusion detection. IEEE Access, 7, pp.82512-82521.

Gençoğlu, M.T., 2019. Importance of Cryptography in Information Security. IOSR J. Comput. Eng, 21(1), pp.65-68.

Goyal, R. and Manjhvar, A.K., 2020. Review on Credit Card Fraud Detection using Data Mining Classification Techniques & Machine Learning Algorithms. IJRAR-International Journal of Research and Analytical Reviews (IJRAR), E-ISSN, pp.2348-1269.

Hu, X., Zhang, X. and Lovrich, N.P., 2021. Forecasting identity theft victims: Analyzing characteristics and preventive actions through machine learning approaches. Victims & Offenders, 16(4), pp.465-494.

Huang, Y., Yuan, Y., Chen, H., Wang, J., Guo, Y. and Ahmad, T., 2019. A novel energy demand prediction strategy for residential buildings based on ensemble learning. Energy Procedia, 158, pp.3411-3416.

Jhangiani, R., Bein, D. and Verma, A., 2019, October. Machine learning pipeline for fraud detection and prevention in e-commerce transactions. In 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) (pp. 0135-0140). IEEE.

Kaggle.com, 2022. Synthetic Financial Datasets For Fraud Detection. https://www.kaggle.com/code/arjunjoshua/predicting-fraud-in-financial-payment-services/data

Keswani, B., Vijay, P., Nayak, N., Keswani, P., Dash, S., Sahoo, L., Mishra, T.C. and Mohapatra, A.G., 2020. Adapting machine learning techniques for credit card fraud detection. In International Conference on Innovative Computing and Communications (pp. 443-455). Springer, Singapore.

Khan, A.T., Cao, X., Li, S., Katsikis, V.N., Brajevic, I. and Stanimirovic, P.S., 2022. Fraud detection in publicly traded US firms using Beetle Antennae Search: A machine learning approach. Expert Systems with Applications, 191, p.116148.

Kostas, K., 2018. Anomaly detection in networks using machine learning. Research Proposal, 23, p.343.

Krishna Rao, N.V., Harika Devi, Y., Shalini, N., Harika, A., Divyavani, V. and Mangathayaru, N., 2021. Credit Card Fraud Detection Using Spark and Machine Learning Techniques. In Machine Learning Technologies and Applications (pp. 163-172). Springer, Singapore.

Legislation.gov.uk, 2018. Data Protection Act 2018. https://www.legislation.gov.uk/ukpga/2018/12/contents/enacted

Mauritsius, T., Alatas, S., Binsar, F., Jayadi, R. and Legowo, N., 2020, December. Promo abuse modelling in e-commerce using machine learning approach. In 2020 8th International Conference on Orange Technology (ICOT) (pp. 1-6). IEEE.

Nanduri, J., Jia, Y., Oka, A., Beaver, J. and Liu, Y.W., 2020. Microsoft uses machine learning and optimization to reduce e-commerce fraud. INFORMS Journal on Applied Analytics, 50(1), pp.64-79.

Nguyen, T.T., Tahir, H., Abdelrazek, M. and Babar, A., 2020. Deep learning methods for credit card fraud detection. arXiv preprint arXiv:2012.03754.

Nur-E-Arefin, M., 2020. A Comparative Study of Machine Learning Classifiers for Credit Card Fraud Detection. International Journal of Innovative Technology and Interdisciplinary Sciences, 3(1), pp.395-406.

Panigrahi, S., Saitejaswi, K. and Devarapalli, D., 2019, February. Teju: fraud detection and improving classification performance for bankruptcy datasets using machine learning techniques. In Proceedings of International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur-India.

Prusti, D. and Rath, S.K., 2019, October. Web service based credit card fraud detection by applying machine learning techniques. In TENCON 2019-2019 IEEE Region 10 Conference (TENCON) (pp. 492-497). IEEE.

Randhawa, K., Loo, C.K., Seera, M., Lim, C.P. and Nandi, A.K., 2018. Credit card fraud detection using AdaBoost and majority voting. IEEE Access, 6, pp.14277-14284.

Rashid, M.M., Kamruzzaman, J., Hassan, M.M., Imam, T. and Gordon, S., 2020. Cyberattacks detection in IoT-based smart city applications using machine learning techniques. International Journal of Environmental Research and Public Health, 17(24), p.9347.

Ryman-Tubb, N.F., Krause, P. and Garn, W., 2018. How Artificial Intelligence and machine learning research impacts payment card fraud detection: A survey and industry benchmark. Engineering Applications of Artificial Intelligence, 76, pp.130-157.

Sadineni, P.K., 2020, October. Detection of fraudulent transactions in credit card using machine learning algorithms. In 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC) (pp. 659-660). IEEE.

Safa, M.U. and Ganga, R.M., 2019. Credit Card Fraud Detection Using Machine Learning. International Journal of Research in Engineering, Science and Management, 2(11), pp.372-374.

Shekhar, H., Seal, S., Kedia, S. and Guha, A., 2020. Survey on applications of machine learning in the field of computer vision. In Emerging technology in modelling and graphics (pp. 667-678). Springer, Singapore.

Shivanna, A., Ray, S., Alshouiliy, K. and Agrawal, D.P., 2020, October. Detection of Fraudulence in Credit Card Transactions using Machine Learning on Azure ML. In 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) (pp. 0268-0273). IEEE.

Sinayobye, J.O., Kiwanuka, F. and Kyanda, S.K., 2018, May. A state-of-the-art review of machine learning techniques for fraud detection research. In 2018 IEEE/ACM symposium on software engineering in africa (SEiA) (pp. 11-19). IEEE.

Singh, A. and Jain, A., 2020. An empirical study of aml approach for credit card fraud detection–financial transactions. International Journal of Computers Communications & Control, 14(6), pp.670-690.

Vaithyasubramanian, S., 2020. Authentication using robust primary PIN (Personal Identification Number), multifactor authentication for credit card swipe and online transactions security. International Journal of Advanced Computer Science and Applications, 11(4), pp.541-546.

Yontar, M., Namli, Ö.H. and Yanik, S., 2020. Using machine learning techniques to develop prediction models for detecting unpaid credit card customers. Journal of Intelligent & Fuzzy Systems, 39(5), pp.6073-6087.

Yousefi, N., Alaghband, M. and Garibay, I., 2019. A comprehensive survey on machine learning techniques and user authentication approaches for credit card fraud detection. arXiv preprint arXiv:1912.02629.

Zhang, J., Gardner, R. and Vukotic, I., 2019. Anomaly detection in wide area network meshes using two machine learning algorithms. Future Generation Computer Systems, 93, pp.418-426.

Warghade, S., Desai, S. and Patil, V., 2020. Credit card fraud detection from imbalanced dataset using machine learning algorithm. International Journal of Computer Trends and Technology, 68(3), pp.22-28.

Tiwari, P., Mehta, S., Sakhuja, N., Kumar, J. and Singh, A.K., 2021. Credit Card Fraud Detection using Machine Learning: A Study. arXiv preprint arXiv:2108.10005.

Mittal, S. and Tyagi, S., 2020. Computational techniques for real-time credit card fraud detection. In Handbook of Computer Networks and Cyber Security (pp. 653-681). Springer, Cham.

Zhou, X., Cheng, S., Zhu, M., Guo, C., Zhou, S., Xu, P., Xue, Z. and Zhang, W., 2018. A state of the art survey of data mining-based fraud detection and credit scoring. In MATEC Web of Conferences (Vol. 189, p. 03002). EDP Sciences.

Konasani, V.R. and Kadre, S., 2021. Machine learning and deep learning using python and tensorflow. McGraw-Hill Education.

Deng, X., Cao, S. and Horn, A.L., 2021. Emerging applications of machine learning in food safety. Annual Review of Food Science and Technology, 12(1), pp.513-538.

Munir, M., Chattha, M.A., Dengel, A. and Ahmed, S., 2019, December. A comparative analysis of traditional and deep learning-based anomaly detection methods for streaming data. In 2019 18th IEEE international conference on machine learning and applications (ICMLA) (pp. 561-566). IEEE.

Aziz, S. and Dowling, M., 2019. Machine learning and AI for risk management. In Disrupting finance (pp. 33-50). Palgrave Pivot, Cham.

Razmi, P., Buygi, M.O. and Esmalifalak, M., 2020. A machine learning approach for collusion detection in electricity markets based on nash equilibrium theory. Journal of Modern Power Systems and Clean Energy, 9(1), pp.170-180.

Doshi-Velez, F. and Perlis, R.H., 2019. Evaluating machine learning articles. JAMA, 322(18), pp.1777-1779.

Zhao, X., Lovreglio, R. and Nilsson, D., 2020. Modelling and interpreting pre-evacuation decision-making using machine learning. Automation in Construction, 113, p.103140.

Carta, S., Fenu, G., Recupero, D.R. and Saia, R., 2019. Fraud detection for E-commerce transactions by employing a prudential Multiple Consensus model. Journal of Information Security and Applications, 46, pp.13-22.

Ata, O. and Hazim, L., 2020. Comparative analysis of different distributions dataset by using data mining techniques on credit card fraud detection. Tehnički vjesnik, 27(2), pp.618-626.

Lee, S. and Chung, J.Y., 2019. The machine learning-based dropout early warning system for improving the performance of dropout prediction. Applied Sciences, 9(15), p.3093.

Yang, T., Wang, L., Shen, Y., Shahzad, M., Huang, Q., Jiang, X., Tan, K. and Li, X., 2018, August. Empowering sketches with machine learning for network measurements. In Proceedings of the 2018 Workshop on Network Meets AI & ML (pp. 15-20).

Kanjanawattana, S., 2019. A novel outlier detection applied to an adaptive k-means. International Journal of Machine Learning and Computing, 9(5), pp.569-574.

Guo, C., Wang, H., Dai, H.N., Cheng, S. and Wang, T., 2018, August. Fraud risk monitoring system for e-banking transactions. In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big