Name: A Review of Dynamic Pricing Methods Assignment Sample

Table of Contents

Introduction: A Review of Dynamic Pricing Methods
A. Research Aim and Objectives
II. Literature Review
A. Technological Improvements of Online Business
B. Dynamic Pricing
C. Existing Models for Dynamic Pricing
D. Data Mining in Business and Machine Learning Methods
E. Machine Learning Methods Used in Dynamic Pricing
III. Methodology
A. Data Collection
B. Data Pre-processing
C. Feature Selection and Model Application
a) Feature Selection
b) Model Selection
D. Experiment Environment
E. Ethics


Introduction: A Review of Dynamic Pricing Methods

In recent decades, electronics and networking have been transformed, and every sign points to continued technological change and growing use of information technology. Falling communication costs, a product of both technical advances and increasing demand, have accompanied and encouraged rapid growth in the capacity and use of modern digital technology. In line with Moore's law, the computing capacity of microchips doubles roughly every 18 months. Such technologies offer significant opportunities but also pose significant challenges. Innovations in IT now have wide-ranging implications for other aspects of society, and politicians struggle with concerns such as economic growth, intellectual property rights, privacy, security, transparency and access to knowledge. The choices made will have long-term implications, so their social and economic ramifications must be taken into consideration. One of the most significant outcomes of IT advancement is undoubtedly e-commerce via the Internet, a modern way of doing business. Although it is only a few years old, it is likely to alter economic activity and the social climate drastically. It already affects diverse industries such as communication, finance and retail, and may extend to fields such as health and education. It requires the smooth implementation of ICT across the entire supply chain of an electronically managed enterprise.

Any commuter is acquainted with the phenomenon: those driving to work sometimes find that a litre of petrol costs something different from what they expected to pay earlier the same day; by the lunch break it might be somewhat more or less. The cost of an airline fare or hotel room also changes dynamically: today, the passengers in one row of seats on a plane pay the same price for their trip only if they all booked together. It is not just the mobility industry, however, that varies prices flexibly by time of day, region and so on; this strategy is increasingly used by online retailers in particular.

In the field of e-Business, most companies struggle with the task of setting the best prices for the products and services they offer their customers. Using the right pricing scheme can determine the future of a business or product. Advances in e-Commerce and internet connectivity have made it possible for online companies to gather information about their competitors and customers quickly and to set flexible prices based on the value customers place on their products and services (Narahari, Raju, Ravikumar, & Shah, 2005), on supply and demand, and on changes over time (Reinartz, 2002). The act of changing the prices of goods and services based on real-time market changes is referred to as dynamic pricing. Several online and offline businesses have turned to this pricing strategy to retain their existing customers and attract new ones (Gupta & Pathak, 2014).

Dynamic pricing is a common phenomenon that has had a significant impact on several industries, including retail, hospitality, electricity, transportation and mobile communication (Gupta & Pathak, 2014). Most online businesses employ machine learning methods such as regression models to predict continuous or numeric outcomes based on features of historical data. This, in turn, informs their decisions when optimising aspects of their businesses to suit customer needs (Coker, 2014). Major companies known for applying machine learning methods to dynamic pricing include Uber, Amazon, airline companies, Airbnb and eBay, among many others.

When determining the optimal pricing model, it is essential to establish its specific effects on the utilisation of available resources as well as on the maximisation of sales. Although higher fares result in higher profit per ride, the number of booked trips decreases at the same time. Conversely, lower prices attract different classes of consumers, resulting in higher capacity utilisation. The demand function, and more specifically the price elasticity of demand, must be estimated to find the right balance between these effects. Predicting the demand function is therefore the key to dynamic pricing structures. Measuring the demand function is a causal problem. Randomised experiments or structural modelling can address it, but traditional prediction approaches, such as predictive or market analytics, cannot; they eventually yield wrong results and contribute to incorrect decisions. At present, technology and internet firms mostly use randomised trials, for which digital systems are well suited. This was not feasible during our research, which had to rely on previously gathered data. Those data were obtained only under the "present market scheme", and this is the underlying problem of causal inference. The question is how customers would respond to a new (so-called counterfactual) pricing scheme, under which no purchases have yet been observed. To answer it, further assumptions need to be made and a so-called structural model built.

To measure the price elasticity of demand, a model of customers' willingness to pay was developed first. A customer chooses to book a ride when the fare shown in the system is below their maximum willingness to pay. This willingness depends on many variables, including the temperature, the locations of the origin and destination, and the time of day. Estimating it required data beyond the prices and quantities of completed purchases. First, details were used of booking requests in which the price shown was higher than the customer's willingness to pay (so-called bubblers), i.e. requests after which the customer did not order a taxi. This information is very helpful for assessing actual willingness to pay. Secondly, many internal and external variables, about 350 in all, were used to measure willingness to pay. Thirdly, advanced deep learning techniques made a precise estimate possible from these many variables. The strategic behaviour of consumers was a further problem in deciding the optimal pricing model. If the price shown in the app is above their willingness to pay, a customer may wait and book later if the price falls. Extensive simulation experiments and comparisons between various pricing models were performed to account for these effects. The pricing structure developed is a non-linear pricing mechanism in which a variety of variables determine demand. It led to a rise in productivity of around 5 to 15 per cent per region and a substantial improvement in earnings. Yet the introduction of dynamic pricing did not simply result in higher prices; it also brought a corresponding rise in utilisation. Nearly half of the fares were lowered by the pricing model, thus raising the degree of occupancy.
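
The booking rule described above (a customer books when the fare does not exceed their willingness to pay) can be illustrated with a minimal sketch. The willingness-to-pay values and candidate fares below are invented for illustration and are not taken from the study; the function names are hypothetical.

```python
def expected_revenue(price, willingness_to_pay):
    """Average revenue per request: a customer books only if price <= their WTP."""
    bookings = sum(1 for wtp in willingness_to_pay if price <= wtp)
    return price * bookings / len(willingness_to_pay)


def best_price(candidate_prices, willingness_to_pay):
    """Pick the candidate fare with the highest expected revenue per request."""
    return max(candidate_prices,
               key=lambda p: expected_revenue(p, willingness_to_pay))


# Hypothetical willingness-to-pay estimates for five ride requests.
wtp_sample = [6.0, 10.0, 12.0, 15.0, 20.0]
candidates = sorted(set(wtp_sample))

optimal = best_price(candidates, wtp_sample)
print(optimal, expected_revenue(optimal, wtp_sample))
```

In this toy sample a higher fare earns more per booking but loses customers, so the revenue-maximising fare lies in between the extremes; estimating the willingness-to-pay values themselves is the hard, causal part discussed in the text.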

In the study, the value of a ride and, as a result, the weight of the individual factors are essential. Willingness to pay depends on the origin and destination of the trip, which were therefore used in the pricing model. Willingness to pay for trips to and from the banking and business districts, as well as for trips to the suburbs, is relatively stable. The spatial variation is distinct: demand appears to be less price-elastic in economically strong areas than in less developed ones. Time, distance and temperature must also be accounted for and are likewise used in the price calculation. The case study demonstrates that, when the methods are applied carefully, dynamic pricing can boost not only revenue and profit but also productivity and utilisation. However, using artificial intelligence (deep learning) and other machine learning (ML) methods is not sufficient on its own. Good pricing decisions are best reached by also applying economic theory. In order to minimise and, as far as possible, eliminate potential adverse results, multiple simulation tests and plausibility checks were carried out during the study. In the future, the application of artificial intelligence to pricing strategy is likely to have tremendous potential.

Machine learning methods automatically identify hidden patterns in a dataset, which they in turn use to predict future occurrences or unseen data (Murphy, 2012). This is an effective and efficient basis for making informed business decisions rather than relying on ‘gut feelings’. Based on current research, the most common methods used for creating dynamic pricing models are supervised learning methods such as random forest, support vector machines, neural networks, gradient boosting machines (GBM), multilayer perceptron (MLP), Naïve Bayes, k-nearest neighbours and multiple linear regression (Bose & Mahapatra, 2001; Gupta & Pathak, 2014; Ye et al., 2018).

Airbnb is an online platform for sharing accommodation; that is, individuals in need of accommodation use the platform to find hosts who are willing to share their spare rooms (Ye et al., 2018). The platform has become one of the most recognised businesses in the “sharing economy” (Gibbs, Guttentag, Gretzel, Yao, & Morton, 2018, p. 2). This online customer-to-customer business has also become one of the best-known companies to employ a dynamic pricing strategy (Magno, Cassia, & Ugolini, 2018a). However, this peer-to-peer platform provider does not impose prices on listings; instead, it has developed a pricing model which suggests prices to hosts based on supply and demand (Ye et al., 2018). Building on this background, this study aims to use an Airbnb dataset to analyse the influence of big data on dynamic pricing.

For this project, an Airbnb dataset obtained from the open data source Kaggle will be used to analyse the impact of big data on dynamic pricing. Previous research relied on time as the primary factor for dynamic price prediction in the hospitality sector. This project, however, intends to contribute to the growing body of research by providing a systematic approach to creating a machine learning model that considers several variables as price determinants (Wang & Nicolau, 2017). The project will follow the CRISP-DM data mining process, which involves business understanding, data understanding, data preparation, modelling, evaluation, and results. The significance of this project lies in providing a simple, systematic workflow of data science processes that can be used for several types of problems.

A. Research Aim and Objectives

The main aim of this research is to present a systematic approach to designing and developing an effective model that predicts Airbnb rental prices. To accomplish this aim, the following objectives have been established:

To review relevant literature in relation to similar research areas.
To outline the end-to-end process of exploring and analysing an Airbnb dataset.
To compare different machine learning algorithms for developing effective price prediction models.

B. Research Questions

This research aims at providing answers to the following research questions:

What variables are most responsible for dynamic rental pricing?
What machine learning models are most suitable for rental price prediction?

II. Literature Review

A. Technological Improvements of Online Business

The rapid growth of digital and mobile technologies in recent years has led to the emergence of new web-based business services such as sharing platforms. These information technologies include digital platforms which innovative start-ups or companies use to provide connectivity services. Sedera, Lokuge, Grover, Sarker, & Sarker (2016) describe a digital platform as “a technology architecture that allows development of its own computing functionalities and allows the integration of information, computing and connectivity technology platforms available to an organisation” (p. 367). Ciracì (2013) also defined a digital platform as “software, which can be used exclusively online, generally performing simple applicative functions, which exploit the principles of digital convergence of hypermedia and ubiquity of the network, in order to implement contents sharing practices (multimedia sphere) and data structures (hypertext sphere) practices, such to be used also by users inexperienced in technology and computer science” (p. 114).

These web-based services offer peer-to-peer, consumer-to-consumer or business-to-business sharing platforms that connect demand and supply within a specific industrial sector (Breidbach & Maglio, 2016; Magno et al., 2018a; West, Gaiardelli, & Rapaccini, 2018). Several studies agree that sharing platforms have created a new business model called the “sharing economy” (Blal, Singal, & Templin, 2018; Wang & Nicolau, 2017). Dufva, Koivisto, Ilmola-Sheppard, & Junno (2017) highlight Airbnb and Uber as examples of market drivers in the sharing economy. Airbnb is a digital accommodation-sharing platform that connects accommodation seekers to potential hosts (Magno, Cassia, & Ugolini, 2018b). Uber, on the other hand, is a company in the transport sector which operates through a smartphone application. It connects customers who need a ride to drivers in different locations across the world.

According to Van Alstyne, Parker, & Choudary (2016), one of the major impacts information technology has had on online businesses is a reduced need to own physical infrastructure and assets. For instance, Booking.com, which operates in the hospitality sector, does not own any hotels, yet it is valued more highly than the Marriott hotel chain, which has over 4,000 properties around the globe (Austin, Canipe, & Sarah, 2015). Technological advancements in online business have led to the emergence of a new pricing strategy called dynamic pricing. This price optimisation strategy works by analysing the massive amount of data generated through the use of digital platforms in order to identify patterns in customer behaviour and product demand, which are in turn used to optimise prices dynamically (Schmidt, Möhring, & Keller, 2016; Steppe, 2017).

B. Dynamic Pricing

Dynamic pricing, also known as price optimization, surge pricing, demand pricing or time-based pricing, is a pricing strategy in which products and services are offered at flexible prices according to customer or market demand. In the e-Business sector, pricing optimization is based on several factors including competitors’ pricing, supply and demand, conversion rates, and sales goals (Gupta & Pathak, 2014). It has been suggested that the most frequently used dynamic pricing strategies are price changes based on different time segments, namely real-time pricing (RTP), time-of-use pricing (ToU) and critical-peak pricing (CPP) (Meng, Zeng, Zhang, Dent, & Gong, 2018). Deksnyte, Lydeka, & Zigmas Lydeka (2012) highlight some determinants of dynamic pricing, which include customer behaviour and characteristics, fair prices, market structure, product demand, perception of product value and seasonality.
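
As a rough illustration of how the three time-based strategies named above differ, the sketch below expresses each as a small pricing rule. The peak hours, multipliers and clamping bounds are illustrative assumptions, not values from Meng et al. (2018) or any other cited source.

```python
def time_of_use_price(base, hour, peak_hours=range(17, 21), peak_multiplier=1.5):
    """ToU: a fixed tariff per time band, known in advance (assumed bands)."""
    return base * peak_multiplier if hour in peak_hours else base


def critical_peak_price(base, hour, critical_event, critical_multiplier=3.0):
    """CPP: the ToU tariff, plus a much higher rate during declared critical events."""
    if critical_event:
        return base * critical_multiplier
    return time_of_use_price(base, hour)


def real_time_price(base, demand, supply, floor=0.5, cap=2.0):
    """RTP: price tracks the current demand/supply ratio, clamped to [floor, cap] x base."""
    ratio = demand / supply
    return base * min(max(ratio, floor), cap)


print(time_of_use_price(10.0, hour=18))                        # evening peak band
print(critical_peak_price(10.0, hour=9, critical_event=True))  # declared critical event
print(real_time_price(10.0, demand=300, supply=100))           # demand far above supply
```

The design difference is in what drives the price: the clock alone (ToU), the clock plus rare announced events (CPP), or live market conditions (RTP).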

Dynamic pricing, though an old concept, is widely used in several online and offline businesses such as hospitality, travel, entertainment, retail, electricity and public transport. Different industries apply different price optimization approaches based on their goals and customers’ value (Schmidt et al., 2016). According to Huang, Hsu, & Ho (2014), the fashion industry is not left behind in this trend. It uses dynamic pricing to stimulate and influence market demand, for example through clearance sales or markdowns that are not based on seasonal changes. The literature discusses dynamic pricing in different ways. Some authors examine the strategy from the online business point of view, comparing prices with competitors by implementing web-crawling algorithms that capture prices and product specifications (Sahay, 2012).

Schmidt et al. (2016) conducted research on the core functionalities of dynamic pricing systems for online retailers using a single case study research method. Their research reveals that online retailers consider not only internal factors, such as their cost of production or desire to maximise profit, but also external information, such as buyers’ anniversaries, in order to influence purchase decisions positively.

C. Existing Models for Dynamic Pricing

Studies by Narahari et al. (2005) and Gupta & Pathak (2014) identify various existing mathematical models for computing dynamic prices. In these mathematical models, pricing is treated as an optimisation task. Dynamic pricing models are divided into several categories based on the mathematical tool used:

Inventory-based model: Pricing techniques based on inventory have been applied in several areas of business. Aguirregabiria (1999), for example, analysed a supermarket chain and found that retail prices fall when orders are placed and rise between two orders. In the personal computer sector, Byrnes (2003) commented on Dell's pricing policy, observing that although Dell's list prices stayed constant in a frequently changing competitive market, its weekly prices varied considerably as they were adjusted to promote products whose component inventories had reached particular levels. Empirically, Zettelmeyer et al. (2006) also showed that whenever a dealer moves from a supply shortage to the average stock level, the sales price falls by around 1%, or 15% of the dealer's average margin per car. Price fluctuations can thus be observed even when demand is stable. One explanation for this inventory-based pricing is that a high stock level leads the firm to cut prices in order to raise demand and reduce inventory holding costs. This rationale has been studied formally in several strands of the literature, e.g. in Aguirregabiria (1999), Federgruen and Heching (1999), and Chen and Simchi-Levi (2006). Understanding the drivers of profitability in dynamic pricing is essential to both businesses and researchers. Does higher demand variability increase or reduce the advantage of dynamic pricing over fixed prices in the business settings mentioned above? How, if at all, does it affect other factors, such as costs and profits? In the absence of a joint decision on price and inventory (through adjustment of replenishment quantities), changing the price alone is standard practice when demand and cost parameters shift. This often reflects a lack of coordination between a company's sales and operations units.
Data-driven model: This model applies statistical or econometric techniques for price prediction based on data collected about customer purchasing patterns and preferences. According to the World Payments Report 2018, the volume of non-cash purchases came close to the volume of cash withdrawal transactions made with bank cards. This implicitly supports what commentators have long said about money becoming "electronic". As online data become increasingly essential to e-commerce companies, the value of in-depth knowledge of the industry and of the characteristics of its key players (consumers, vendors and rivals) rises accordingly. To offer the right products, a business must assess which segments have the highest willingness to purchase and pay, and at what price point demand, sales and earnings are maximised. A comprehensive segmentation yields the most critical market data, covering socioeconomics, interest drivers, decision-makers and loyalty. The modern approach to differential pricing makes greater use of suppliers' existing data to assign specific prices to the same products, for different consumer segments, at different times. Over the course of the year (weeks, days), these approaches allow providers to monitor demand and to serve the most profitable customers. The Big Data era demands new capabilities for integrating, storing and analysing vast quantities of weakly structured data effectively. The principal activities in processing such socio-economic data are statistical modelling and retail forecasting, optimal communication strategies, customer churn prediction models, and so on. One function of price differentiation is to set prices so that a product can be sold at a lower price to consumer groups who would not otherwise buy it, while raising the price for segments with substantial buying power. With a data-driven method, patterns of demand can be measured and compared with previously obtained figures, and other market factors can be identified. The underlying principle remains the same: to increase the company's income.
Agent-based model: This is a rule-based modelling technique used to analyse individual or group pricing based on computational methods and actions. One of the critical facets of the relationship between a supplier or service provider and a consumer is the setting of the price of a product or service. Usually the price is determined by an empirical process based on previous observations of the system. Market behaviour can be defined, analysed and improved through numerical, statistical or simulation models. These models make it possible to study the real system with scientific methods without altering its structure or functioning. Representations of the pricing problem have been built from several perspectives, including game theory, optimisation and simulation, providing a wide variety of models for the pricing strategy problem. One of these strategies is dynamic pricing, which is widespread for perishable products or services, including hotel rooms and airline tickets. When goods of this kind are not sold by the end of the selling period, the seller loses income. Dynamic pricing aims to reduce such losses by finding the best price for specific periods or segments. There are many mathematical formulations of this strategy as an optimisation problem. A simulation-based approach may help to explore scenarios and to broaden the use and understanding of this pricing strategy, particularly for anyone seeking to maximise profit from a perishable commodity.
Game theory model: This model is used by sellers competing for a specific customer base to compute dynamic prices for the target customer or group. Pricing is one of the most critical of the numerous decisions made by businesses. Decision-makers can influence demand not only through the price of the good or service but also through marketing campaign activities (for example, mailings or call centres). Consumers' final choices are therefore affected by prices as well as by the marketing influences to which they have been exposed. The dynamics created by these elements can be modelled with game theory, which offers results grounded in a sound economic framework to explain the behaviour of agents seeking to maximise their payoff in non-cooperative environments. Data mining, meanwhile, has been used in businesses for more than 20 years to extract knowledge from company databases. It is an effective method for identifying patterns in consumer response that are not easily detected. These two approaches (game theory and data mining) have both been used to explain related phenomena, although there is little experience of combining them. Combined approaches aim to merge the economic framework that game theory uses to structure strategic interactions with sophisticated data mining techniques that extract knowledge from company data. Such a model provides a highly accurate customer-level demand forecast based on Support Vector Machines, which, unlike traditional econometric approaches, can accommodate many variables from various sources. This demand forecast feeds a game-theoretic model that describes the position of the companies in the market, providing an integrated picture of both customers and competitors.
Machine learning model: This model is trained using historical customer data to identify patterns in customers’ preferences and buying habits. Algorithms are then used to predict prices based on the patterns identified, with the aim of maximising profit and generating revenue. E-commerce operations produce vast volumes of data that a team of individuals cannot handle. ML reduces the manual effort and makes real-time pricing approaches possible. In dynamic pricing, sellers change prices based on demand and supply; ML provides the automation for this, simply because thousands of products cannot be analysed manually. Before that, however, the merchant needs to store not only product data but also incoming data, which the ML-powered program gathers to create competitive pricing strategies. ML rests on a basic principle: the more data are available, the better the learning and the higher the performance, so ML-based software tends to improve over time. Analysts can also supply inputs such as temperature, production, operating costs, prices, minimum price and best price when competitive prices are calculated. Finally, ML-based tools can regularly search the web to gather valuable pricing details from rivals with related goods, customer opinions on items, and the sales trend over recent days or weeks. What difference does ML make to competitive pricing? AI and ML allow broader data processing, which contributes to better decisions.
Simulation model: Dynamic pricing, broadly described as changing prices in a market, may be carried out in a variety of ways. Price discrimination, or tailored pricing, is an enticing approach in which retailers charge specific consumer groups different prices. While there is a great deal of promise in this approach, there is often a higher likelihood of consumer rejection, as shown when Amazon.com experimented with charging consumers different prices. In contrast to that approach, the simulation model focuses on changing the price over time in a market and makes no assumption about, or attempt at, segmenting the buyer population into subgroups. It analyses how a retailer can exploit variations in total buyer demand over a limited period.
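
To make the statement that pricing is treated as an optimisation task concrete, here is a minimal sketch of an inventory-based rule. It assumes a linear demand curve d(p) = a - b·p and a fixed stock level; both the functional form and the numbers are illustrative assumptions, not taken from the cited models.

```python
def revenue(price, a, b, stock):
    """Revenue when demand is a - b*price and sales cannot exceed stock."""
    demand = max(a - b * price, 0.0)
    return price * min(demand, stock)


def optimal_price(a, b, stock):
    """Revenue-maximising price under the assumed linear demand and a stock cap.

    a / (2b) maximises price * (a - b*price) when stock is not binding;
    (a - stock) / b is the price at which demand exactly equals stock.
    The optimum is whichever of the two is higher.
    """
    unconstrained = a / (2 * b)
    stock_clearing = (a - stock) / b
    return max(unconstrained, stock_clearing)


# Ample stock: the seller prices at the unconstrained optimum.
print(optimal_price(a=100, b=2, stock=80))
# Scarce stock: the seller raises the price until demand matches the stock.
print(optimal_price(a=100, b=2, stock=20))
```

Under these assumptions a higher stock level leads to a lower optimal price, which matches the inventory-based rationale described in the list above.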

D. Data Mining in Business and Machine Learning Methods

Technological advancements and the ongoing growth of the Internet have led to the production of enormous volumes of data. Businesses and professionals have realised the importance of extracting data and analysing it further to gain insights and make informed decisions. The velocity at which data continues to grow is alarming; however, there are vast data stores with the capacity to handle the volume. According to Seng and Chen (2010), the major challenge lies in making sense of the data stored in the database, which might ordinarily seem meaningless, and thereby converting it into a tool for “competitive intelligence”. From a data warehouse point of view, data mining can be understood as a higher level of on-line analytical processing (OLAP). However, data mining has exceeded the limited scope of “summarization-style analytical processing of data warehouse systems by adopting more advanced techniques for data understanding” (Han & Kamber, 2001, cited in Seng & Chen, 2010).

According to Murphy (2012), machine learning is an “automated method of data analysis” (p. 1). Machine learning methods, which were developed in scientific fields such as Machine Learning, Applied Statistics and Pattern Recognition, are described “as a collection of methods for extracting (predictive) models from data” (Provost & Fawcett, 2013, p. 39). This field of study originated in Artificial Intelligence, which was concerned with analysing real-world data and making predictions about unknown quantities. Machine learning methods were soon widely implemented, leading to close ties between Machine Learning, Applied Statistics and Pattern Recognition. Data Mining (or KDD: Knowledge Discovery and Data Mining) is a spin-off research field from Machine Learning which is mainly concerned with real-world applications such as business problems of data analysis (Provost & Fawcett, 2013).

Data mining is an iterative process with multiple steps, performed until meaningful business knowledge is derived. The first step is the data preparation stage, which involves choosing historical data for analysis, then cleaning and pre-processing it. The historical data can be obtained either from a single source or from several sources, which are merged during the preparation stage. The cleaning phase removes discrepancies and inconsistencies and includes, for instance, converting data to a different scale, identifying variables with possible predictive value and reducing dimensionality. The next step is the model learning stage, which involves analysing the data to identify models or patterns (relationships among data). The model is then evaluated and interpreted into actionable business strategies (Bose & Mahapatra, 2001; Provost & Fawcett, 2013).
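
The steps described above (merge sources, clean, rescale, learn a model, evaluate) can be sketched end to end in a few lines. The two data sources, the single feature and the least-squares line below are hypothetical stand-ins chosen to keep the example self-contained; they are not the data or models of the cited studies.

```python
from math import sqrt


def clean(rows):
    """Cleaning: drop records that contain missing values."""
    return [row for row in rows if None not in row]


def min_max_scale(values):
    """Pre-processing: convert a feature to the 0..1 scale."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def fit_line(xs, ys):
    """Model learning: ordinary least-squares line through (x, y) pairs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Evaluation: root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Preparation: merge two hypothetical sources of (feature, price) records.
source_a = [(1, 50.0), (2, 70.0), (None, 65.0)]
source_b = [(3, 90.0), (4, 110.0), (5, None)]
rows = clean(source_a + source_b)

xs = min_max_scale([feature for feature, _ in rows])
ys = [price for _, price in rows]

# Learn on the first three records, evaluate on the held-out last record.
slope, intercept = fit_line(xs[:3], ys[:3])
predicted = [slope * x + intercept for x in xs[3:]]
error = rmse(ys[3:], predicted)
print(len(rows), slope, intercept, error)
```

In practice the loop is iterative: a poor evaluation result sends the analyst back to preparation or model learning.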

Pattern recognition models developed in Applied Statistics (statistical modelling) are the most demanding to implement, as strict distributional criteria have to be met. Algorithms based on machine learning methods, by contrast, are easier and faster to implement because they do not impose such rigorous restrictions, hence their broad usage in data mining applications.

E. Machine Learning Methods Used in Dynamic Pricing

Gupta and Pathak (2014) used an online market dataset to predict the purchase behaviour of online customers and assign a price range based on dynamic pricing. Using R, SAS and Excel, they collected data from a transactional database which contained both categorical and continuous variables. The K-Means clustering algorithm was used for customer segmentation, i.e. grouping customers based on their similarities. Statistical and machine learning techniques were then used to identify an appropriate dynamic price range for each segment. Finally, logistic regression was used to predict whether or not a customer was likely to purchase the product.
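
The segmentation step described above can be illustrated with a minimal one-dimensional K-Means sketch. The spending figures, the starting centroids and the rule of taking each cluster's minimum and maximum as its price range are assumptions made for illustration; they are not Gupta and Pathak's data or their exact procedure.

```python
def k_means_1d(values, centroids, iterations=20):
    """Plain K-Means on a single feature, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical average spend per customer.
spend = [10, 12, 11, 50, 55, 52, 100, 105]
centroids, clusters = k_means_1d(spend, centroids=[10, 50, 100])

# Assumed rule: the price range for a segment spans what its members spend.
price_ranges = [(min(c), max(c)) for c in clusters]
print(price_ranges)
```

A real study would cluster on several variables and choose the number of clusters from the data; the sketch only shows how segments translate into segment-specific price ranges.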

Morita (2018) used an online retail market dataset to analyse the predictive power of machine learning methods in comparison with econometric methods for online pricing. For the econometric approach, logistic regression analysis was used to evaluate the firm’s price adjustment behaviour. The study evaluated several machine learning models, including linear algorithms, non-linear algorithms, Bayes algorithms, boosting algorithms and tree algorithms, on each dataset to find the best performer. Several studies have investigated the performance of machine learning models in dynamic pricing. However, they mostly compared the performance of a single machine learning algorithm with that of econometric analysis to identify the best model for dynamic pricing in online business. According to Fernandez-Delgado et al. (2014), machine learning libraries contain many algorithms, but no algorithm is perfect for every dataset. This study intends to investigate and answer the question “which machine learning algorithm is the most suitable for dynamic pricing tasks?”

III.Methodology

The methods proposed for this study are based on those in a 2017-18 MSc dissertation by Ding, Zulian, “Analysis of online P2P lending risk by machine learning approaches”. Ding (2018) used a quantitative approach, as a numeric dataset was collected and the aim of the research was to compare different machine learning models. The nature of the dataset and the aim of the research closely resemble those of the current study. However, the context and the machine learning models to be compared are slightly different.

Ding (2018) used loan data from a lending club to compare machine learning models including Gradient Boosting Decision Tree (GBDT), Logistic Regression, Support Vector Machine (SVM), Random Forest, Naïve Bayes, k-Nearest Neighbours, and a Multi-model Ensemble Learner. These models were compared to determine which best analyses online P2P lending risk and predicts loan default. This study proposes to use data from an accommodation-sharing platform to compare machine learning models including, but not restricted to, Random Forest, k-Nearest Neighbours, and Support Vector Regression. The aim is to identify the model and features that best analyse and predict optimised prices for listings. Like Ding (2018), the study will follow three main steps to achieve its aim and objectives:

Data collection
Data pre-processing
Feature selection and model application

A. Data Collection

Airbnb is one of the major accommodation-sharing platforms known for using dynamic pricing tools to suggest prices to hosts. This study employs an Airbnb dataset downloaded from Kaggle, an open data platform. The dataset contains two CSV files, train data and test data. The train dataset has 74,117 listings (rows) from six cities in the US, with detailed host information and daily supply, demand, pricing, and customer review data for 2009 to 2017. The test dataset has 25,459 listings with the same fields as the train set, excluding the pricing column.

B. Data Pre-processing

The aim of data pre-processing is to reduce the complexity of large datasets to ensure consistency, validity, and accuracy. This process is essential to prepare data for analysis. It consists of several steps:

Data cleaning: removal of noisy data, handling missing values, resolving inconsistencies etc. For example, the Airbnb data contains columns irrelevant to the target variable such as “id” and “thumbnail_url” which will be removed. Missing values can be handled by deleting the features entirely, removing the rows with missing values, or replacing the cells with the missing values with zeros, means, maximums or minimums. The method of handling missing values will be determined by the model parameters.
Data integration: reconciling and merging derived data.
Data transformation: This involves normalising, generalising, and aggregating data.
Reduction: removal of redundant data.
Discretization: converting continuous features to discrete or nominal form.
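Some of the cleaning and transformation steps listed above can be sketched in a few lines. This is an illustrative sketch only: the study itself proposes to carry out pre-processing in R, and the column names "bedrooms" and "review_score" and the three sample rows are hypothetical. Only "id" and "thumbnail_url" are named in the text as columns to remove.

```python
from statistics import mean

def drop_columns(rows, unwanted):
    """Data cleaning: remove columns that are irrelevant to the target variable."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

def impute_mean(rows, column):
    """Data cleaning: replace missing values (None) with the column mean."""
    fill = mean(row[column] for row in rows if row[column] is not None)
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]

def min_max_scale(rows, column):
    """Data transformation: normalise a numeric column to the range [0, 1]."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    return [{**row, column: (row[column] - low) / span} for row in rows]

# Hypothetical listings; real rows would be read from the CSV files.
listings = [
    {"id": 1, "thumbnail_url": "a.jpg", "bedrooms": 1, "review_score": 90.0},
    {"id": 2, "thumbnail_url": "b.jpg", "bedrooms": 3, "review_score": None},
    {"id": 3, "thumbnail_url": "c.jpg", "bedrooms": 2, "review_score": 100.0},
]

cleaned = drop_columns(listings, {"id", "thumbnail_url"})
cleaned = impute_mean(cleaned, "review_score")
cleaned = min_max_scale(cleaned, "bedrooms")
print(cleaned)
```

Mean imputation is only one of the options named above; deleting rows or filling with zeros, maximums, or minimums would follow the same pattern.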

C. Feature Selection and Model Application

a) Feature Selection

The overall process of selecting features is divided into four parts: generation procedure, evaluation function, stopping criterion, and validation procedure (Dash & Liu, 1997). This study will use Boruta (Random Forest based) in the generation procedure to determine whether each feature is ‘Important’ or ‘Unimportant’. The ‘Important’ features will then be retained.

b) Model Selection

Random Forest

This model consists of separate decision trees, each developed on randomly selected attributes using the bagging method. It does not require feature scaling and is resistant to overfitting, because averaging over a large forest of fully grown trees reduces the variance of the individual trees. Random Forest is an ensemble method which has the advantage of precise prediction. The Airbnb dataset consists of both categorical and continuous variables; Random Forest will therefore be used because it can handle various data types at the same time.
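The idea of bagging with randomly selected attributes can be sketched as follows. This is a deliberately simplified illustration and not a full Random Forest: each "tree" is a one-split regression stump, one attribute is drawn per tree rather than per split, and the listings (bedrooms, guests accommodated, price) are hypothetical values.

```python
import random
from statistics import mean

def fit_stump(rows, feature):
    """Fit a one-split regression tree on one feature by minimising squared error."""
    values = sorted({x[feature] for x, _ in rows})
    best = None
    for low, high in zip(values, values[1:]):
        split = (low + high) / 2
        left = [y for x, y in rows if x[feature] <= split]
        right = [y for x, y in rows if x[feature] > split]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    if best is None:  # the sample has a single feature value: predict its mean
        overall = mean(y for _, y in rows)
        return feature, float("inf"), overall, overall
    return feature, best[1], best[2], best[3]

def fit_forest(rows, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen attribute."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # sample with replacement
        feature = rng.randrange(len(rows[0][0]))    # randomly selected attribute
        forest.append(fit_stump(sample, feature))
    return forest

def predict(forest, x):
    """Average the predictions of all stumps in the ensemble."""
    return mean(left if x[f] <= split else right for f, split, left, right in forest)

# Hypothetical listings: ((bedrooms, guests accommodated), nightly price).
listings = [((1, 2), 60.0), ((1, 3), 75.0), ((2, 4), 110.0),
            ((3, 6), 150.0), ((4, 8), 210.0), ((5, 10), 260.0)]

forest = fit_forest(listings)
small = predict(forest, (1, 2))
large = predict(forest, (5, 10))
print(small, large)
```

Because each stump is trained on a different bootstrap sample, the individual predictions differ, and averaging them is what smooths out the variance of any single tree.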

k-Nearest Neighbours Algorithm (k-NN)

This model makes predictions by measuring the Euclidean or Mahalanobis distance between the attributes of a test sample and those of the training instances. k-NN finds the k training instances closest to the test sample. For classification, the sample is assigned to the class most common among these neighbours; for regression, the predicted value is estimated as the average value of the k nearest neighbours or as a distance-weighted average.
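A minimal sketch of k-NN regression with Euclidean distance shows both the plain and the distance-weighted average. The training values (number of bedrooms mapped to a nightly price) are hypothetical, and the small constant added to each distance is an assumption made here to avoid division by zero when a neighbour coincides with the query.

```python
from math import dist

def knn_predict(train_x, train_y, query, k=3, weighted=False):
    """Predict a value as the (optionally distance-weighted) mean of the k nearest neighbours."""
    neighbours = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    if not weighted:
        return sum(y for _, y in neighbours) / len(neighbours)
    # Closer neighbours receive larger weights (inverse distance).
    weights = [1.0 / (dist(x, query) + 1e-9) for x, _ in neighbours]
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)

# Hypothetical training data: (bedrooms,) -> nightly price.
train_x = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [50.0, 70.0, 90.0, 300.0]

plain = knn_predict(train_x, train_y, (2.5,), k=3)
by_distance = knn_predict(train_x, train_y, (2.5,), k=3, weighted=True)
print(plain, by_distance)
```

The weighted estimate is pulled towards the two closest listings, whereas the plain average treats all three neighbours equally.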

Support Vector Regression

SVR aims to find a function that approximates the training points by minimising the prediction error. This study will employ SVR because price prediction is a regression problem, and one of the advantages of SVR is that it can handle high-dimensional data while limiting overfitting.
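The prediction error that standard SVR formulations minimise is usually the epsilon-insensitive loss, in which deviations smaller than a tolerance epsilon are ignored. The sketch below computes only that loss and leaves out the regularisation term and the optimisation itself. The prices and the tolerance of 5.0 are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: errors within +/- epsilon cost nothing."""
    penalties = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(penalties) / len(penalties)

# Hypothetical actual and predicted nightly prices.
actual = [100.0, 120.0, 80.0]
predicted = [105.0, 119.0, 95.0]

# Absolute errors are 5, 1 and 15; with epsilon = 5 only the last one is penalised.
loss = epsilon_insensitive_loss(actual, predicted, epsilon=5.0)
print(loss)
```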

D. Experiment Environment

Big data, the increasingly pervasive production of massive and growing volumes of data, is a scientific, economic, and social trend that has been under debate for years. Extracting useful insights from big data requires the right statistical methods, which are the concern of disciplines such as 'data science' or 'analytics'. The same applies to the methodologically challenging evaluation of small amounts of data, e.g. in the "classical" academic area. The R programming language is a statistical tool that has become ever more relevant in recent years. In comparison to Python, which is also widely used for data analysis, R was built specifically for statistical applications. Its essential functions include statistical analysis and visualisation of results. R was created by Ross Ihaka and Robert Gentleman in Auckland, whose initials presumably gave the language its name, and is distributed as open-source software under the GNU General Public License through the R Foundation for Statistical Computing in Vienna. While the R Core Team (from whom the language also came) continues to improve the core of R, the real strength of R lies in the additional functions provided in the form of so-called packages. Developers around the world provide solutions for a diverse variety of techniques, from classical regression to deep learning. The Comprehensive R Archive Network (CRAN) and other repositories (including the Bioconductor project's comprehensive set of bioinformatics packages) provide free access to over 12,000 of these packages, which together contain approximately 220,000 functions. Even if no ready-made solution exists in R for an uncommon statistical problem, the open-source licence means that existing code can be tailored to the user's needs at any time.

Python is a programming language with simple syntax and strong readability. It is considered easy to learn and to understand. The name derives from "Monty Python's Flying Circus". Python supports several programming paradigms, such as modular, object-oriented, and aspect-oriented programming. Guido van Rossum created Python at the beginning of the 1990s at the Centrum Wiskunde & Informatica in Amsterdam. Version 3.7 of the language was released in mid-2018. The Python source code is publicly available under the Python Software Foundation License, and the language has a large user community on the internet. Python is readily available for the common operating systems and is included as standard in several Linux distributions. WSGI (Web Server Gateway Interface) is a standardised interface between web servers and Python applications. In principle, Python does not require a development environment, because code does not need to be compiled and scripts can be written in a text editor. Interactive interpreters allow users to experiment with the language. Editors commonly used by programmers, such as Emacs or Vim, can be adapted for Python, and IDLE, the development environment bundled with Python, is also available.

Python is regarded as a simple, clean, and well-structured language whose code is quick and straightforward to read. Despite its simplicity, Python scales well and can be used for complex software projects. Thanks to its concise, minimal syntax, programs can be written in few lines of code and are less prone to programming errors. Python uses only a small number of keywords to maintain consistency and clarity, and uses indentation as a structural element. In contrast to many other languages, blocks are delimited by the indentation of lines rather than by keywords or brackets. Automatic memory management is another essential feature. Memory does not have to be explicitly declared and reserved for variables or arrays, which drastically reduces memory leak errors. Thanks to dynamic typing, the types of variables or function arguments do not have to be declared in Python programs. Python has few syntactic constructs. There are, for instance, just two loop forms, "for" and "while". Unlike in many other programming languages, a loop may be followed by an "else" branch.
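The language features described above (indentation as block structure, the two loop forms, the optional "else" branch of a loop, and the absence of type declarations) can be seen in a short sketch. The function names and the sample prices are hypothetical and serve only as illustration.

```python
def first_price_over(prices, threshold):
    """Return the index of the first price above threshold, or -1 if there is none."""
    for index, price in enumerate(prices):   # "for" loop; the block is defined by indentation
        if price > threshold:
            found = index
            break
    else:                                    # runs only if the loop finished without "break"
        found = -1
    return found

def halvings_until_below(value, limit):
    """Count how many times value must be halved to fall below limit."""
    steps = 0                                # no type declaration is needed
    while value >= limit:                    # "while" loop
        value = value / 2
        steps += 1
    return steps

print(first_price_over([80, 120, 95], 100))
print(first_price_over([80, 90], 100))
print(halvings_until_below(100, 30))
```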

Scikit-learn is a machine learning library. The software is openly available on GitHub under a BSD 3-clause license and is built for the Python language. It supports various machine learning techniques such as clustering, regression, and classification. Scikit-learn works in tandem with the numerical libraries SciPy and NumPy. The library is powerful and well documented. Scikit-learn is written primarily in Python; Cython has been used to obtain better performance for some key algorithms. Scikit-learn goes back to the Google Summer of Code project scikits.learn by David Cournapeau. The term "scikit" comes from "SciPy Toolkit". Other developers later rewrote the initial code of the project. Scikit-learn's first release was published in 2010, and version 0.19.1 was released in 2017. Scikit-learn provides algorithms for both supervised and unsupervised learning. Because the library is built on Python's scientific stack and offers a clear interface, scikit-learn can be quickly integrated into applications. Its different methods and algorithms can be treated as building blocks for various purposes. Scikit-learn is used in both research and business.

The principal aim in developing scikit-learn was to provide machine learning in a reliable way that concentrates on core functionality. Functions and parameters are clearly described and comply strictly with the conventions of Python and its scientific libraries. Scikit-learn is openly available under a BSD licence, and an engaged community contributes to its continued development. Clear documentation of the library, its functions, and its parameters is an essential aspect of scikit-learn. Comprehensive user guides, class references, tutorials, installation instructions, and other specific examples are provided for developers and users. The library is characterised by a tidy, standardised, and simple API. Once the concepts of using scikit-learn with one model are understood, applying it to other models is relatively straightforward.

Scikit-learn builds on the numerical Python modules SciPy and NumPy and on Cython. NumPy arrays are used for data and model parameters, which makes it easy to combine scikit-learn with other scientific Python libraries. NumPy also offers basic functions for numerical computation. SciPy provides efficient algorithms for linear algebra and matrix operations, along with statistical functions. Cython allows C code to be used from Python and improves the performance of many compute-intensive operations. Scikit-learn concentrates on data mining and machine learning. The library can be used for clustering and classifying data, generating datasets, extracting features from images and text, combining machine learning methods, fitting learning models, summarising multidimensional data, etc. Machine learning requires models fitted to data. Data are supplied to the algorithms in scikit-learn as tables. A plain table is two-dimensional and consists of rows and columns. The rows represent the individual samples, and the columns represent their various attributes, such as quantities or weights. Machine learning models are constructed by applying functions and algorithms to these tables.

Data cleaning, pre-processing, visualisation, and statistical analysis will be done using R. This is because R has several data manipulation and statistical packages and libraries. Python will be used to handle machine learning analysis because it has richer machine learning packages and libraries (such as Scikit-learn) when compared to R.

E. Ethics

Today's software and artificial intelligence algorithms are well suited to crawling large blocks of data, but using computers to analyse and act on data can quickly lead to ethical problems. Consider, for example, the use of AI to allocate scarce emergency resources in a community. In addition to determining the fastest response times and assessing the severity of incidents, such a system must regularly reassess its priorities during operation and possibly redirect services, which calls for qualitative and multi-faceted decision-making. Maximising the number of people helped appears to be a reasonable goal for the AI. However, a computer may attempt to cheat by favouring relatively low-risk emergencies to optimise its success rate and ignoring cases that are less likely to be resolved successfully. Weakening these objectives may lead to the opposite problem, a kind of paralysis, whereby the program continuously redirects resources to new high-priority cases but never gets around to resolving those of lower priority. The AI should take into account the seriousness or subtlety of each incident. Which metric should be used to weigh sending resources to a minor fire against sending them to a car accident? Should resources be diverted from a small nearby incident that has been waiting for some time? A human can hardly decide these things, and prescribing how an AI should respond may be even harder. AI safety aims to assess the unintended consequences of rules, that is, of a specification of reward and goal schemes, and to prevent the AI from taking shortcuts. Everyone has ethical principles, and the way AI approaches specific scenarios will almost certainly reflect the ethical standards that we program into it, albeit only later on.

It is not all doom and gloom: some argue that, in certain less-than-ideal human circumstances, artificial intelligence could enforce ethical rules more consistently than humans can. Observers are also wary of autonomous weapons, not just because they are lethal but also because the human connection to the moral repercussions of war could be broken. Many people are gripped by aggression during conflict, so autonomous weapons offer engineers a rare chance to lay down ethical guidelines for engagement and restraint without having to account for adrenaline. However, it will be particularly challenging to define firm guidelines for every imaginable battle scenario. A realistic approach may be to use machine learning so that the AI forms its own conclusions about ethics, by starting from fundamental principles, drawing on experience and feedback from tests, and adapting to unexpected scenarios. The outcomes are then less predictable and depend on the form of the starting rules.

Given that AI's decision-making, data modelling, and ethics will ultimately be shaped by human input, we must take care not to build our biases and unethical practices into machine learning and AI algorithms. Several well-known examples have demonstrated bias in data or questionable machine learning applications. Algorithms misjudged the risk of reoffending among Black prisoners. Models trained on images reinforced traditional stereotypes of women. In every dataset there may be legitimate explanations for such disparities. In some cases, class, ethnicity, nationality, marital status, residency, age, employment, and much more may be relevant indicators, but only as part of multi-factor analyses. Problems can arise when algorithms use particularly arbitrary details as shortcuts, stressing assumptions or percentages at the expense of broader variables, or when the data collected are inherently flawed. Gender-based trends in a field, such as the higher proportion of women in college or of men in technical disciplines, may be picked up by an algorithm that selects the most qualified job applicants. This alone is not negative, since the interests of workers should be considered. However, the algorithm could automatically reject applicants of the minority gender in the field concerned if it placed excessive weight on this factor to optimise its recruitment efficiency. If the aim is to recruit the brightest and most knowledgeable candidates, this is counterproductive, because it further reinforces the prejudices. To prevent these issues, and to identify and evaluate ethical AI processes, the aims and ethics of an AI system should be explicitly defined.

However, if a program produces outcomes that we consider immoral, who is liable: the program or the people who developed it? Where an individual or a group creates a program with intentionally malicious goals, or where the potential outcomes were given inadequate consideration, liability readily passes to the authors. An algorithm is, after all, only as ethical as the data and objectives it is supplied with. It is not always clear, however, whether the authors should be blamed merely because we dislike the outcome. Anomalies in data or programs are not necessarily evident and cannot always be detected automatically, which limits how far liability can be assigned. Even if a machine learning method is based on fundamental ethical principles, it may be difficult to trace how an AI reaches its conclusions. Blaming the authors for unintended effects becomes even more complicated as AI is encouraged to develop its own ethics.

The other side of the coin should also be considered: how do humans deal with computers that can think for themselves? There is considerable controversy about the features of general artificial intelligence, and no convincing definition of it, whether as a truly original form of cognition or as an imitation of human intellect. The distinction between specific (or applied) and general AI is now widely accepted, but experts are still not sure how "real" artificial intelligence can be defined or tested. Comprehensive interpretation of a situation, decisions based on incomplete knowledge, and the general capability of an algorithm to meet its objectives in varied settings have been suggested as criteria for evaluating the intelligence of an AI. These criteria do not necessarily satisfy those who want a simple distinction between an intelligence and a computer program. A further issue is that neuroscientists have not been able to isolate the characteristics of the brain associated with human thought, understanding, and the development of consciousness. Defining intelligence, not just for AI but also for humans, is probably one of the most significant unanswered questions of our day. The ethics of AI covers a wide variety of subject matter and situations, from how we apply the technology for benefit and the accountability of engineers to the question of how we measure and treat these kinds of intelligence. The ever more rapid growth of AI and its use in our everyday lives make it urgent to deal with these problems. It is no understatement to say that studying AI ethics, their application, and their effects is extremely difficult. It is feasible, but it will require detailed discussion and consensus-building across society.

The data used for this study are obtained from Kaggle, an open data source; therefore, no approval is required before usage. The dataset is fully anonymised, so the owners of the original data cannot be re-identified through the findings of this research.

IV. References

Austin, S., Canipe, C., & Sarah, S. (2015). The Billion Dollar Startup Club - WSJ.com. Wall Street Journal, 46, 1–5. Retrieved from https://www.wsj.com/graphics/billion-dollar-club/

Blal, I., Singal, M., & Templin, J. (2018). Airbnb’s effect on hotel sales growth. International Journal of Hospitality Management, 73, 85–92. https://doi.org/10.1016/j.ijhm.2018.02.006

Breidbach, C. F., & Maglio, P. P. (2016). Technology-enabled value co-creation: An empirical analysis of actors, resources, and practices. Industrial Marketing Management, 56, 73–85. https://doi.org/10.1016/j.indmarman.2016.03.011

Ciracì, F. (2013). Mitologie 2.0: Digital Platforms & Umbrella Terms. H-Ermes. Journal of Communication, 1(1), 109–126. https://doi.org/10.1285/i22840753v1n1p109

Coker, F. (2014). Pulse: Understanding the Vital Signs of Your Business. (A. (Inspira L. S. Lawrence & G. (WA) Harbor, Eds.) (1st ed.). Bellevue, WA: Ambient Light Publishing. Retrieved from https://books.google.co.uk/books?id=rd-DoAEACAAJ

Dufva, M., Koivisto, R., Ilmola-Sheppard, L., & Junno, S. (2017). Anticipating Alternative Futures for the Platform Economy. Technology Innovation Management Review, 7(9), 6–16. https://doi.org/10.22215/timreview/1102

Gibbs, C., Guttentag, D., Gretzel, U., Yao, L., & Morton, J. (2018). Use of dynamic pricing strategies by Airbnb hosts. International Journal of Contemporary Hospitality Management, 30(1), 2–20. https://doi.org/10.1108/IJCHM-09-2016-0540

Gupta, R., & Pathak, C. (2014). A Machine Learning Framework for Predicting Purchase by Online Customers based on Dynamic Pricing. Procedia Computer Science, 36, 599–605. https://doi.org/10.1016/j.procs.2014.09.060

Huang, Y.-S., Hsu, C.-S., & Ho, J.-W. (2014). Dynamic pricing for fashion goods with partial backlogging. International Journal of Production Research. https://doi.org/10.1080/00207543.2014.881576

Magno, F., Cassia, F., & Ugolini, M. M. (2018a). Accommodation prices on Airbnb: effects of host experience and market demand. The TQM Journal, 30(5), 608–620. https://doi.org/10.1108/TQM-12-2017-0164

Magno, F., Cassia, F., & Ugolini, M. M. (2018b). Accommodation prices on Airbnb: effects of host experience and market demand. The TQM Journal, 30(5), 608–620. https://doi.org/10.1108/TQM-12-2017-0164

Meng, F., Zeng, X.-J., Zhang, Y., Dent, C. J., & Gong, D. (2018). An integrated optimization + learning approach to optimal dynamic pricing for the retailer with multi-type customers in smart grids. Information Sciences, 448–449, 215–232. https://doi.org/10.1016/J.INS.2018.03.039

Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective (illustrated ed.). Cambridge, Massachusetts: MIT Press. Retrieved from https://books.google.co.uk/books?id=RC43AgAAQBAJ

Narahari, Y., Raju, L., Ravikumar, K., & Shah, S. (2005). Dynamic pricing models for electronic business. Sadhana (Vol. 30). Bangalore, India. Retrieved from https://www.ias.ac.in/article/fulltext/sadh/030/02-03/0231-0256

Reinartz, W. (2002). Customizing Prices in Online Markets. Symphonya. Emerging Issues in Management, (1), 55–65. https://doi.org/10.4468/2002.1.05reinartz

Sedera, D., Lokuge, S., Grover, V., Sarker, S., & Sarker, S. (2016). Innovating with enterprise systems and digital platforms: A contingent resource-based theory view. Information and Management, 53(3), 366–379. https://doi.org/10.1016/j.im.2016.01.001

Steppe, R. (2017). Online price discrimination and personal data: A General Data Protection Regulation perspective. Computer Law & Security Review, 33(6), 768–785. https://doi.org/10.1016/J.CLSR.2017.05.008

Van Alstyne, M. W., Parker, G. G., & Choudary, S. P. (2016). Pipelines, platforms, and the new rules of strategy. Harvard Business Review, 94(April), 54–62. Retrieved from https://hbr.org/2016/04/pipelines-platforms-and-the-new-rules-of-strategy

Wang, D., & Nicolau, J. L. (2017). Price determinants of sharing economy based accommodation rental: A study of listings from 33 cities on Airbnb.com. International Journal of Hospitality Management, 62, 120–131. https://doi.org/10.1016/j.ijhm.2016.12.007

West, S., Gaiardelli, P., & Rapaccini, M. (2018). Exploring technology-driven service innovation in manufacturing firms through the lens of Service Dominant logic. IFAC-PapersOnLine, 51(11), 1317–1322. https://doi.org/10.1016/j.ifacol.2018.08.350

Ye, P., Wu, C. H., Qian, J., Zhou, Y., Chen, J., De Mars, S., … Zhang, L. (2018). Customized regression model for Airbnb dynamic pricing. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 932–940). New York, NY, USA: ACM. https://doi.org/10.1145/3219819.3219830