Develop Machine Learning Models To Predict E-Commerce Visitors

Introduction - Develop Machine Learning Models To Predict E-Commerce Visitors

i) Data exploration

Data exploration is the initial investigation of a dataset, used to discover patterns and anomalies and to test hypotheses and assumptions. The analysis identifies null values in each column and uses statistical graphs to describe the relationships between attributes (Sahoo et al. 2019). E-commerce businesses depend on their visitors and on the purchases those visitors make. Visitors are important in e-marketing because they generate both revenue and company data. Exploration is therefore a critical data processing step for identifying the areas in which a business can improve its marketing and customer relationships.

The above image shows the dataset import command and a preview of the data. The dataset includes administrative duration, informational value, product-related duration, and the revenue generated by the e-commerce site.

The above image shows the command used to check the dataset for null values, since null or missing values reduce the performance and efficiency of the model. Removing null values before applying a machine learning algorithm is a key data wrangling step.
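As a rough illustration of such a null value check, the sketch below counts missing values per column with pandas. The tiny DataFrame and its column names are made up for this example and are not the assignment's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature dataset; the column names are illustrative only.
df = pd.DataFrame({
    "Administrative_Duration": [0.0, 64.0, np.nan, 2.5],
    "ExitRates": [0.20, 0.10, 0.14, np.nan],
    "Revenue": [False, True, False, False],
})

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Total number of missing cells across the whole frame.
total_missing = int(null_counts.sum())
print(total_missing)
```

A column with a non-zero count is a candidate for cleaning before any model is trained.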

The above pair plot shows the patterns, anomalies, and relationships among the different variables in the dataset. It displays the distribution of each single variable and the relationship between each pair of variables.

The above figure shows the groups of data found within the main dataset. The scatter plot shows the relationship between “revenue” and “exit rates” for the dataset values. Clusters form in areas with “6 to 8” revenue and “-2 to -1”, with similar formations in other comparable regions. These clusters mark regions where similar values are concentrated, while areas with no clusters indicate an absence of data points.

The above figure represents the heat map for the df1 dataframe, considering the values of “exit rates” and “revenue”.
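One common way to obtain the numbers behind a heat map of this kind is a correlation matrix. The sketch below assumes that approach, which the original does not confirm, and fills a stand-in df1 with invented values.

```python
import pandas as pd

# Stand-in for df1 with invented values; only the two columns of interest.
df1 = pd.DataFrame({
    "ExitRates": [0.20, 0.10, 0.05, 0.15, 0.02],
    "Revenue": [0, 0, 1, 0, 1],
})

# Pearson correlation between every pair of columns.
# A plotting library could render this matrix as a heat map.
corr = df1.corr()
print(corr.round(2))
```

In this invented sample, sessions with lower exit rates are the ones that generate revenue, so the off-diagonal entry is negative.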

The above figure represents the histogram plot comparing “Exit rates” and “Revenue”.

The provided image represents the box plots for the related variables present in the dataset.

ii) Data pre-processing

Data preprocessing improves the quality of the data in the dataset. It is carried out through:

Data cleaning
Data reduction
Data transformation

Data cleaning covers checking the dataset for null and missing values and handling them, which increases performance and efficiency (Pearson, 2018). Data reduction shortens the computation time by removing data that are not needed from the dataset. Data transformation converts data from alphabetical (string) form to numerical form, which prepares each attribute for analysis.

The above image shows the dataset after the null and missing values have been removed from the original, which increases the performance and efficiency of the overall model. No missing values remain, and all other values in the new dataset are unchanged.

The above dataset is the result of the data reduction process, in which the missing data are removed. The reduced data allow a shorter running time with high performance and accuracy. The result is a condensed representation of the larger dataset that yields efficient and similar output despite the smaller data volume.
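The cleaning and reduction steps described above can be sketched in pandas as follows. The data are invented, and treating "Browser" as an unnecessary column is purely an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "Browser" is assumed to be a column the model does not need.
df = pd.DataFrame({
    "ProductRelated_Duration": [120.5, np.nan, 300.0, 45.0],
    "ExitRates": [0.20, 0.10, np.nan, 0.05],
    "Browser": [1, 2, 2, 1],
    "Revenue": [0, 1, 0, 1],
})

# Drop the unneeded column, then drop every row that still has a missing value.
reduced = df.drop(columns=["Browser"]).dropna().reset_index(drop=True)

print(reduced)
print(reduced.isnull().sum().sum())  # number of missing cells left
```

Dropping rows is the simplest option. Filling missing values with a column mean or median is an alternative when too many rows would otherwise be lost.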

The above image shows the conversion of data from string values to numeric values. The weekend data has been converted in this way to reduce its complexity at output time (Batch and Elmqvist, 2017).

The above image shows the transformed dataset. The weekend attribute, which originally contained only string values, has been converted into numerical values for better performance and accuracy.
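A minimal sketch of this kind of string-to-number conversion is shown below. It assumes the weekend column holds the strings "TRUE" and "FALSE"; the actual labels in the assignment's data are not shown in the text.

```python
import pandas as pd

# Assumed string labels for the weekend attribute.
df = pd.DataFrame({"Weekend": ["TRUE", "FALSE", "FALSE", "TRUE"]})

# Map each string label to a number so that a model can use the column.
df["Weekend"] = df["Weekend"].map({"TRUE": 1, "FALSE": 0})

print(df["Weekend"].tolist())
```

Any label missing from the mapping would become NaN, so it is worth re-running the null check after a conversion like this.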

All of the steps above were implemented in a Jupyter notebook, which produces the predicted results for the relationships between the attributes that drive an e-commerce business.

iii) Model implementation

The machine learning methods that have been implemented in this assignment are as follows:

K means clustering

The dataset used in this application consists of many rows of data that cannot be assessed by simple inspection. The aim here is to find unlabeled groups in the data, which requires an algorithm that can group the data based on patterns. For this purpose, the “K means clustering” algorithm was used to identify those groups.
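The idea can be sketched with scikit-learn's KMeans as below. The six points and the choice of two clusters are assumptions made for this example; the text does not state how many clusters the assignment used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented (exit rate, revenue-like value) pairs forming two obvious groups.
X = np.array([
    [0.02, 6.0], [0.03, 6.5], [0.01, 7.0],
    [0.18, 0.0], [0.20, 0.5], [0.19, 0.2],
])

# n_clusters=2 is an assumption for this toy data, not the assignment's setting.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index assigned to each row
print(kmeans.cluster_centers_)  # one centre per cluster
```

Because K means relies on distances, features on very different scales are usually standardised first. That step is left out here to keep the sketch short.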

Linear Regression

Linear regression can be used to fit a line through the selected values, and it can also be used to predict the values from which such a line is plotted. This program used it in the second way. No graph was created; instead, the predicted values that the plotted line would follow were output, as depicted in the figure below.

The predicted values shown above determine the path that the graph would follow.
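To illustrate predicting values rather than drawing the line, the sketch below fits scikit-learn's LinearRegression to invented points that lie exactly on y = 2x + 1 and then predicts two new values. None of these numbers come from the assignment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training points that lie exactly on the line y = 2x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X, y)

# Predict the values the fitted line would pass through at x = 5 and x = 6.
predicted = model.predict(np.array([[5.0], [6.0]]))
print(predicted)
```

Plotting these predictions against their x values would trace the fitted line.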

Logistic regression

Using the logistic regression algorithm, an accuracy score was achieved, as depicted in the figure below:

This accuracy score could have been far higher if the dataset had been trimmed into a smaller section. Logistic regression is a statistical method for modelling a binary outcome (geeksforgeeks.org, 2021).
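A hedged sketch of how such an accuracy score is typically obtained with scikit-learn follows. The features and labels are synthetic, and the split ratio is an assumption, so the resulting score has no bearing on the figure reported in this assignment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-feature data with a binary label (1 when the features sum above 0).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a quarter of the rows for testing (an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression().fit(X_train, y_train)

# Accuracy is the fraction of held-out rows classified correctly.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(accuracy)
```

Scoring on held-out rows rather than on the training rows gives a fairer estimate of how the model behaves on unseen visitors.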

iv) Performance evaluation

The accuracy scores presented in the program evaluate the performance of the algorithms. For example, the logistic regression algorithm, as depicted in Figure 3, shows the level of accuracy that was achieved. This level of accuracy could only be achieved after the initial step of dropping unnecessary data, so it might be enhanced further if the algorithm used a smaller dataset. The heat map created earlier in the program could also be used to evaluate the dataset as a whole. It depicts the regions that revenue and exit rates occupy most often.
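For reference, an accuracy score is simply the share of predictions that match the true labels. The small function below is written for this illustration only, and the label lists are invented.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented labels: 1 = visitor generated revenue, 0 = did not.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

score = accuracy(y_true, y_pred)
print(score)  # 6 of 8 predictions match, so 0.75
```

On a dataset where one class dominates, accuracy alone can look flattering, so it is often read alongside a confusion matrix.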

v) Result analysis and discussion

The logistic regression algorithm used in this program achieved an accuracy score of nearly 85%. This level of accuracy is considered high given the large volume of data used in the program. It could not have been achieved if the initial dataset had not been trimmed. To trim the dataset, redundant data were removed: the unnecessary columns were dropped, and the null values present in the dataset were dropped as well. These factors enabled the algorithm to achieve a high accuracy score. In data exploration, graphs were plotted as a scatter plot, a heat map, a histogram, and a box plot. These plots compare the different variables in the dataset. Logistic regression was carried out, with its accuracy score reported and represented by a heat map plot. Linear regression was also carried out, with the values for its line plot reported accordingly.

References

Batch, A. and Elmqvist, N., 2017. The interactive visualization gap in initial exploratory data analysis. IEEE Transactions on Visualization and Computer Graphics, 24(1), pp.278-287.

Geeksforgeeks.org, 2021. K means Clustering – Introduction. https://www.geeksforgeeks.org/k-means-clustering-introduction/

Geeksforgeeks.org, 2021. Linear Regression (Python Implementation). https://www.geeksforgeeks.org/linear-regression-python-implementation/

Geeksforgeeks.org, 2021. Understanding Logistic Regression. https://www.geeksforgeeks.org/understanding-logistic-regression/

Pearson, R.K., 2018. Exploratory data analysis using R. CRC Press.

Sahoo, K., Samal, A.K., Pramanik, J. and Pani, S.K., 2019. Exploratory data analysis using Python. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 8(12), p.2019.