Machine Learning

Introduction-Machine Learning

i) Data exploration

Data exploration is an initial investigation of the data that uncovers patterns and anomalies, suggests hypotheses, and checks assumptions. In this analysis it identifies the null values in each column and uses statistical graphs to describe the relationships between the attributes (Sahoo et al. 2019). E-commerce businesses depend on their visitors and on the purchases those visitors make. Visitors are important in e-marketing because their sessions generate both revenue and company data. Exploration is therefore a critical data processing step for identifying areas where the business can improve its marketing and customer relationships.

Figure 1: E-commerce data import

The above image shows the dataset import command and a preview of the data. The dataset includes administrative duration, informational value, product-related duration, and the revenue generated by the e-commerce site.

Figure 2: Null value checking

The above image shows the null value check run on the dataset. Null or missing values reduce the performance and efficiency of the model, so handling them before applying a machine learning algorithm is a key data wrangling step.
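The null value check described above can be sketched without any third-party library. The following is a minimal illustration, not the notebook's actual command: the column names and sample rows are assumptions made for the example. In a pandas notebook the same check is commonly written as `df.isnull().sum()`.

```python
import math


def is_missing(value):
    """Treat None, blank strings, and float NaN as missing."""
    if value is None:
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def count_missing(rows):
    """Return {column name: number of missing entries} for a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + int(is_missing(value))
    return counts


# Hypothetical sample rows; the column names are assumptions, not taken from the notebook.
sample_rows = [
    {"Administrative_Duration": 12.0, "ExitRates": 0.05, "Revenue": "FALSE"},
    {"Administrative_Duration": None, "ExitRates": 0.20, "Revenue": "TRUE"},
    {"Administrative_Duration": 3.5, "ExitRates": float("nan"), "Revenue": ""},
]

missing_per_column = count_missing(sample_rows)
print(missing_per_column)
```

A column whose count is zero needs no cleaning; any other column has to be handled before modelling.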

Figure 3: Pair plots for checking data

The above pair plot shows the patterns, anomalies, and relationships among the variables in the dataset. It displays the distribution of each single variable and the pairwise relationships between different variables.

Figure 4: Scatter plot representation

The above scatter plot shows the groups found in the dataset by plotting “revenue” against “exit rates”. Clusters form in the area with “6 to 8” revenue and “-2 to -1”, with similar formations in other regions. The clusters show where similar observations are concentrated, and the empty areas show where no observations fall.

Figure 5: Heat map representation

The above figure represents the heat map for the df1 dataframe, considering the values of “exit rates” and “revenue”.
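Heat maps of this kind usually colour each cell by the correlation between two columns. Assuming that is what the figure shows (the source does not state it), the underlying numbers can be sketched as a Pearson correlation matrix. The sample values and column names below are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / math.sqrt(spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values: sessions with high exit rates that did not generate revenue.
columns = {
    "ExitRates": [0.20, 0.10, 0.05, 0.02],
    "Revenue": [0, 0, 1, 1],
}
matrix = correlation_matrix(columns)
print(matrix["ExitRates"]["Revenue"])
```

Each entry of the matrix corresponds to one cell of the heat map; values near -1 or 1 indicate a strong linear relationship and values near 0 a weak one.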

ii) Data pre-processing

Data preprocessing improves the quality of the dataset so that it can be analysed. It is carried out through:

  • Data cleaning
  • Data reduction
  • Data transformation

Data cleaning covers checking the dataset for null and missing values and handling them, which improves performance and efficiency (Pearson, 2018). Data reduction shortens the running time, and values that fall outside a chosen threshold can be removed from the dataset. Data transformation converts data from alphabetical (string) to numerical form, which prepares each attribute for analysis.
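The three steps can be sketched as small functions applied in sequence. This is a minimal illustration under stated assumptions: rows are plain dictionaries, the column names (`ExitRates`, `Weekend`, `Month`, `Revenue`) and the "TRUE"/"FALSE" string encoding are hypothetical, and reduction is shown as dropping columns that are not needed.

```python
def clean(rows):
    """Data cleaning: keep only rows that contain no missing (None or empty) values."""
    return [
        row for row in rows
        if all(value is not None and value != "" for value in row.values())
    ]


def reduce_columns(rows, keep):
    """Data reduction: keep only the listed columns in every row."""
    return [{column: row[column] for column in keep} for row in rows]


def encode_boolean(rows, column):
    """Data transformation: map the strings "TRUE"/"FALSE" to 1/0 in one column."""
    mapping = {"TRUE": 1, "FALSE": 0}
    return [{**row, column: mapping[str(row[column]).upper()]} for row in rows]


# Hypothetical raw rows; the second one has a missing exit rate.
raw_rows = [
    {"ExitRates": 0.05, "Weekend": "FALSE", "Month": "Feb", "Revenue": "FALSE"},
    {"ExitRates": None, "Weekend": "TRUE", "Month": "Mar", "Revenue": "TRUE"},
    {"ExitRates": 0.02, "Weekend": "TRUE", "Month": "May", "Revenue": "TRUE"},
]

processed = clean(raw_rows)
processed = reduce_columns(processed, ["ExitRates", "Weekend", "Revenue"])
processed = encode_boolean(processed, "Weekend")
processed = encode_boolean(processed, "Revenue")
print(processed)
```

Cleaning runs first so that the later steps never see a missing value. Each function returns new rows rather than modifying its input, which keeps the original data available for comparison.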

The above image shows the dataset after the null and missing values were removed from the original dataset, which increases the performance and efficiency of the overall model. No missing values remain, and the other values in the new dataset are unchanged.

The above dataset is the result of the data reduction process, in which the missing data are removed from the dataset. The reduced data allows a shorter running time with high performance and accuracy. It is a condensed representation of the larger dataset that gives efficient and similar output after the data volume has been reduced.

The above image shows the conversion of data from string values to numeric values, where the weekend data has been changed accordingly to reduce its complexity at output time (Batch and Elmqvist, 2017).

The above image shows the dataset after this transformation: the weekend attribute, which contained only string values, has been converted to numerical values for better performance and accuracy.

All of these steps were implemented in a Jupyter notebook, which produces the predicted results for the relationships between the attributes that drive an e-commerce business.

iii) Model implementation

The representative methods that have been implemented in this assignment are as follows:

K means clustering

The dataset used in this application consists of many rows of data that cannot be assessed by simple inspection. The aim here is to find groups in unlabeled data, which requires an algorithm that groups observations based on patterns in the data. Hence the “K means clustering” algorithm was used to group the data and to identify possible changes to implement in the application.
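The grouping step can be illustrated with a small K means implementation. This is a sketch under stated assumptions rather than the notebook's code: the points are invented two-dimensional values, and the initial centroids are simply the first k points, a deterministic simplification (libraries normally use a randomised initialisation).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplified, deterministic initialisation: use the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centroids = k_means(points, k=2)
print(labels)
```

The two steps alternate until no point changes cluster. Because no labels are supplied, the cluster numbers themselves carry no meaning; only the grouping does.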

Linear Regression

Linear regression can be used to fit a line to the selected values and plot it. It can also be used to predict the values from which such a graph would be drawn. This program used it in the second way: no graph was created; instead, the predicted values that the graph would follow were output, as depicted in the figure below.

The predicted values, represented above, determine the path that would be followed by the graph.
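Predicting values without plotting them can be sketched with ordinary least squares for a single input variable. The data below is invented for illustration, and the notebook may well have used a library model with several features; this only shows the idea of fitting a line and reading predictions off it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("cannot fit a line when every x is identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, xs):
    """Predicted y for each x on the fitted line."""
    return [slope * x + intercept for x in xs]


# Hypothetical training data lying exactly on the line y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(train_x, train_y)
predicted = predict(slope, intercept, [5.0, 6.0])
print(slope, intercept, predicted)
```

The list returned by `predict` holds the points that a plotted regression line would pass through.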

Logistic regression

Following the logistic regression algorithm, an accuracy score was achieved, as depicted in the figure below:

This accuracy score might have been higher if the dataset had been trimmed to a smaller section. Logistic regression is a statistical method for modelling a binary outcome (geeksforgeeks.org, 2021).
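To make the method concrete, the following sketch fits a one-feature logistic regression by gradient descent on the log loss. It is an illustration under stated assumptions, not the notebook's model: the data is invented, the learning rate and epoch count are arbitrary choices, and the feature is centred on its mean so that training behaves well.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight and bias by batch gradient descent; returns (w, b, centre)."""
    centre = sum(xs) / len(xs)  # the feature is centred before training
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * (x - centre) + b) - y  # prediction minus label
            grad_w += error * (x - centre)
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b, centre


def predict_class(model, x, threshold=0.5):
    """Return 1 when the predicted probability reaches the threshold, else 0."""
    w, b, centre = model
    return 1 if sigmoid(w * (x - centre) + b) >= threshold else 0


# Hypothetical data: the outcome switches from 0 to 1 as the feature grows.
feature = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_logistic(feature, outcome)
predictions = [predict_class(model, x) for x in feature]
print(predictions)
```

The model outputs a probability between 0 and 1, and the threshold turns that probability into a class label such as revenue or no revenue.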

iv) Performance evaluation

The accuracy scores presented in the program evaluate the performance of the algorithms. For example, the logistic regression result depicted in Figure 3 shows the level of accuracy that was achieved. This level of accuracy could only be achieved after the initial step of dropping unnecessary data, and it might be improved further if the algorithm used a smaller dataset. The heat map created earlier in the program can also be used to evaluate the dataset as a whole, since it shows where the values of revenue and exit rates are most concentrated.
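An accuracy score is the share of predictions that match the true labels. The sketch below computes it, together with the counts behind a confusion matrix, for invented labels; the function names are illustrative and do not come from the notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels coded as 0 and 1."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


# Hypothetical labels: one purchasing session is wrongly predicted as non-purchasing.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

score = accuracy(y_true, y_pred)
counts = confusion_counts(y_true, y_pred)
print(score, counts)
```

Accuracy alone can be misleading when one class is much more common than the other, which is why the four counts are worth reporting alongside it.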

v) Result analysis and discussion

The logistic regression algorithm used in this program achieved an accuracy score of nearly 85%. This level of accuracy is considered very high given the large amount of data used in the program. It could not have been achieved if the initial dataset had not been trimmed. To trim the dataset, redundant data was removed: the unnecessary columns were dropped, and the null values present in the dataset were dropped as well. The algorithm was able to achieve a high accuracy score due to these factors. During data exploration, a scatter plot, heat map, histogram, and box plot were produced. These plots compare the different variables in the dataset. Logistic regression was evaluated by its accuracy score and represented by a heat map. Linear regression was also carried out, with its results represented as a line plot alongside the accuracy score.


References

Batch, A. and Elmqvist, N., 2017. The interactive visualization gap in initial exploratory data analysis. IEEE Transactions on Visualization and Computer Graphics, 24(1), pp.278-287.

Geeksforgeeks.org, 2021. K means Clustering – Introduction. https://www.geeksforgeeks.org/k-means-clustering-introduction/

Geeksforgeeks.org, 2021. Linear Regression (Python Implementation). https://www.geeksforgeeks.org/linear-regression-python-implementation/

Geeksforgeeks.org, 2021. Understanding Logistic Regression. https://www.geeksforgeeks.org/understanding-logistic-regression/

Pearson, R.K., 2018. Exploratory data analysis using R. CRC Press.

Sahoo, K., Samal, A.K., Pramanik, J. and Pani, S.K., 2019. Exploratory data analysis using Python. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 8(12), p.2019.