**2. Literature review of subject area**

## **2.1 Customer value analysis**

Drucker once pointed out that customers purchase a product or service not for the product or the act of consumption itself but for the value it brings [3]. Customer value is based on the trade-off between the customer's perceived gains and perceived losses, or on the customer's overall evaluation of a product's utility. Ravald proposed that customer value should be considered across the entire process of relationship continuance [4]. Butz and Goodstein emphasized that customer value includes the added value customers receive after purchasing and using a product, which can help build stronger connections between customers and suppliers [5]. Woodruff found that this additional value comes from the perceptions, preferences, and evaluations customers form after using a product or experiencing a service [6]. With the advent of the Internet and the big data era, emerging data mining technologies have brought customer value analysis into a new era.

As a classic data mining model, the RFM model is an essential tool for measuring customer value and customers' ability to generate profit [7]. Its three indicators, recency (how recently a customer last purchased), frequency (how often the customer purchases), and monetary value (how much the customer spends), together describe the value of a customer. A customer score is calculated from these three factors based on the customer's purchasing behavior [8].
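To make the three indicators concrete, here is a minimal Python sketch of RFM scoring. It assumes transactions arrive as `(customer_id, date, amount)` tuples, scores each dimension by rank into `bins` equal groups, and sums the three scores with equal weight; the helper names, the rank-based binning, and the equal weighting are illustrative assumptions, not the scoring rule used in [8].

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        previous = last_purchase.get(customer_id, purchase_date)
        last_purchase[customer_id] = max(previous, purchase_date)
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((as_of - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


def rank_scores(metric, higher_is_better=True, bins=3):
    """Score customers 1..bins by rank; the worst-ranked customer gets 1."""
    ordered = sorted(metric, key=metric.get, reverse=not higher_is_better)
    n = len(ordered)
    return {cid: 1 + i * bins // n for i, cid in enumerate(ordered)}


def rfm_scores(transactions, as_of, bins=3):
    table = rfm_table(transactions, as_of)
    # A more recent purchase is better, so a smaller recency earns a higher score.
    r = rank_scores({c: v[0] for c, v in table.items()}, higher_is_better=False, bins=bins)
    f = rank_scores({c: v[1] for c, v in table.items()}, bins=bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, bins=bins)
    return {c: r[c] + f[c] + m[c] for c in table}


transactions = [
    ("A", date(2022, 1, 10), 500.0), ("A", date(2022, 3, 1), 800.0),
    ("A", date(2022, 3, 20), 300.0), ("B", date(2021, 11, 5), 200.0),
    ("C", date(2022, 2, 14), 1200.0), ("C", date(2022, 2, 28), 500.0),
]
print(rfm_table(transactions, date(2022, 4, 1)))
print(rfm_scores(transactions, date(2022, 4, 1)))  # {'A': 8, 'B': 3, 'C': 7}
```

Rank-based scoring keeps the three dimensions on a common scale regardless of their units (days, counts, currency); a real deployment would typically use quintiles and may weight the dimensions differently.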

#### *A Federated Learning-Based Civil Aviation Passenger Value Analysis Method and MaaS… DOI: http://dx.doi.org/10.5772/intechopen.107115*

This model supports direct marketing by distinguishing different types of customers according to their purchase behavior. Because it is highly interpretable, the RFM model is widely used for customer value analysis. However, it has few variables and cannot capture customers' specific, personalized behavior. Machine learning can overcome this limitation, because machine learning models can include more specific variables. For example, variables such as customers' consumption habits and payment preferences reflect their attitude toward consumption. For customers accustomed to high consumption, a company can adopt active marketing and offer fashionable products somewhat beyond their current economic level; for customers with conservative consumption attitudes, it can adopt economical marketing policies to retain them over the longer term. Customer lifetime value (CLV) models built with machine learning use multiple specific variables to evaluate customer value comprehensively, taking into account the net present value of all current and future monetary benefits that customers create for the business [9]. Researchers define CLV differently. Robert Dwyer considers customer lifetime value to be the sum of discounted benefits that customers create for the enterprise during their active period [10]. Gupta and Lehmann consider it the discounted value of the customer's total expected future profit [11]. Sharad Borle and other scholars consider it the present value of all profits that customers bring to the business [12]. Scholars are therefore divided on the time frame over which lifetime value is measured.
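The definitions above differ mainly in their horizon. The following minimal sketch shows a common discounted-margin form of CLV, assuming a sequence of per-period margins, a constant retention probability `retention`, and a per-period discount rate `discount_rate`. These parameter names and values are illustrative only and are not quantities estimated in this study. The sketch contrasts a finite "active period" with the closed-form infinite-horizon limit.

```python
def clv(margins, discount_rate, retention=1.0, acquisition_cost=0.0):
    """Discounted CLV: sum over t of m_t * r**t / (1 + d)**t, minus acquisition cost.

    t starts at 0, so the first margin is undiscounted.
    """
    value = sum(
        m * retention**t / (1.0 + discount_rate) ** t
        for t, m in enumerate(margins)
    )
    return value - acquisition_cost


def clv_infinite(margin, discount_rate, retention):
    """Closed form of the same sum for a constant margin over an infinite horizon."""
    return margin * (1.0 + discount_rate) / (1.0 + discount_rate - retention)


# A finite active period versus an unbounded horizon for the same customer.
print(round(clv([100.0] * 5, 0.10, retention=0.8), 2))
print(round(clv_infinite(100.0, 0.10, 0.8), 2))  # 366.67
```

As the horizon grows, the finite sum converges to the closed form. The disagreement over time frames therefore matters most for customers with high retention.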
Most existing CLV models focus on customers' cash flows and lack variables related to consumption behavior [13]. It is therefore important to incorporate broader multi-source data variables, including variables that describe customers' consumption history over time, when predicting CLV.

Chen first proposed using an XGBoost model to predict the value of China Eastern Airlines passengers. That work was the first to combine the airline's internal primary customer data with external data from TravelSky, which enriched the labels describing passenger consumption behavior and improved prediction accuracy. In related work, Yang et al. proposed an evidential reasoning (ER) framework to handle classification imprecision from the perspective of uncertain information fusion [14]. Yang and Singh incorporated ER into Dempster's combination rule and proposed a recursive algorithm [15]. Yang and Xu later addressed decision problems involving multiple data forms with a general decision model based on rule-based and utility-based information transformation techniques [16]. Through multi-objective reasoning, ER has been applied to system prediction [17], automotive research and development, nuclear power plant site selection, inventory management [18], and performance evaluation [19]. However, both XGBoost and ER can model multi-source data only when the data are fully or partially available to the modeler. In reality, different enterprises store and maintain their data independently, and the data are not easy to share. Sharing also carries a risk of data leakage. A technology is therefore needed that enables multi-source data fusion modeling without direct data sharing and that keeps the data secure.
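ER builds on Dempster's rule of combination. The sketch below implements only the classical Dempster rule for two mass functions. It is not the recursive ER algorithm of [15]. The frame of discernment ({"high", "low"} passenger value) and the mass values are hypothetical and chosen for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so that the non-conflicting masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


H, L = frozenset({"high"}), frozenset({"low"})
EITHER = H | L
source_1 = {H: 0.6, EITHER: 0.4}          # hypothetical evidence from one data source
source_2 = {H: 0.3, L: 0.5, EITHER: 0.2}  # hypothetical evidence from another source
fused = dempster_combine(source_1, source_2)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

Here 0.3 of the joint mass is conflicting ({high} from one source against {low} from the other). The remaining products are rescaled by 1/0.7, which gives {high} a fused mass of 0.6.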

#### **2.2 Federated learning**

Legal protection of data privacy and security is becoming increasingly strict. The EU's General Data Protection Regulation (GDPR), adopted in 2016 [20], and the Cybersecurity Law of the People's Republic of China, introduced in 2017 [21], both confirm the global trend toward data privacy protection and show how difficult it is to integrate data across industries. At the same time, data fusion carries a high risk of privacy leakage. For example, raw data in transit can easily be attacked, which can lead to leakage at the data level [22]. Integrating data from different fields safely and legally is therefore a key bottleneck for big data, and overcoming it is central to gaining insight into passenger value and delivering accurate services.

Federated machine learning was first proposed by Google in 2016 [23], mainly to train models for Gboard (Google's keyboard app). Instead of collecting all data from user devices in the cloud and training a single model there, the model is trained on each device that uses Gboard. The encrypted gradients of these local models are then aggregated to produce a better-performing federated model. This significantly saves computing power, relieves pressure on cloud computing, and provides a solution for data security [24]. It addresses a problem that big data models alone could not solve well. Under the federated learning framework, enterprises exchange gradients and losses (that is, model parameters) under an encryption mechanism without physically exchanging data. Without violating data privacy regulations, they establish a virtual shared model in which the data never move and privacy is not leaked. This preserves data compliance while improving the accuracy and performance of the model [24].
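The device-side training and server-side aggregation described above can be sketched with federated averaging (FedAvg), the aggregation scheme from Google's early federated learning work, swapped in here as a concrete example. This is a minimal NumPy sketch. It assumes a linear-regression model, full-batch local gradient descent, and three simulated clients. It omits the encryption and secure aggregation that a real deployment would apply to the uploaded updates.

```python
import numpy as np


def local_update(w, X, y, lr=0.1, epochs=5):
    """Run a few full-batch gradient steps on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of half the mean squared error
        w -= lr * grad
    return w


def fed_avg(clients, dim, rounds=50, lr=0.1, local_epochs=5):
    """Average client models, weighted by sample count; raw data never leave clients."""
    w_global = np.zeros(dim)
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        updates = [(local_update(w_global, X, y, lr, local_epochs), len(y)) for X, y in clients]
        w_global = sum(n / total * w for w, n in updates)
    return w_global


rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(3):  # three simulated devices, each holding its own samples
    X = rng.normal(size=(100, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

w = fed_avg(clients, dim=3)
print(np.round(w, 2))
```

Only model parameters cross the network in each round. The server never sees `X` or `y`, which is the property the paragraph above relies on.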

Federated learning is divided into horizontal federated learning, vertical federated learning, and federated transfer learning. Horizontal federated learning applies when two datasets share many user features but few users. The datasets are partitioned horizontally (along the user dimension), and the portions with the same features but different users are used for training. For example, two banks in different regions serve users from their respective regions, so their user groups barely intersect. Their businesses are very similar, however, so they record the same user features. Horizontal federated learning can then build a joint model that increases the number of samples.

Conversely, when the users of two datasets overlap heavily but their features overlap little, the datasets are partitioned vertically (along the feature dimension). The users are the same, but each party holds only part of their features, and the data of the shared users are used for training. For example, consider a bank and an e-commerce company in the same place. Their user groups are likely to include most of the local residents, so their users intersect heavily. However, the bank records users' income, expenditure, and credit ratings, while the e-commerce company keeps users' browsing and purchase histories, so their features intersect little. Vertical federated learning aggregates these different features in an encrypted state, which increases the feature dimensions and enhances model capability. It is also the key technology used in this study to integrate operator and airline data.
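Which setting applies depends on how much the parties' users and features overlap. The toy heuristic below makes this concrete. The Jaccard overlap measure, the 0.5 threshold, and all IDs and feature names are hypothetical. "transfer" stands for federated transfer learning, the remaining case in which both overlaps are small.

```python
def jaccard(a, b):
    """Overlap between two sets as |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_fl_setting(ids_a, features_a, ids_b, features_b, threshold=0.5):
    user_overlap = jaccard(ids_a, ids_b)
    feature_overlap = jaccard(features_a, features_b)
    if feature_overlap >= threshold and user_overlap < threshold:
        return "horizontal"  # same features, different users
    if user_overlap >= threshold and feature_overlap < threshold:
        return "vertical"  # same users, different features
    if user_overlap < threshold and feature_overlap < threshold:
        return "transfer"  # little overlap in either dimension
    return "horizontal or vertical"  # both overlap heavily


# Two regional banks: same features, disjoint customers.
print(suggest_fl_setting({"u1", "u2"}, {"income", "credit"}, {"u3", "u4"}, {"income", "credit"}))
# A bank and an e-commerce firm in the same city: same customers, different features.
print(suggest_fl_setting({"u1", "u2"}, {"income", "credit"}, {"u1", "u2"}, {"browsing", "purchases"}))
```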

#### **3. Background to study population**

This study is based on the federated learning architecture. While ensuring the privacy and security of passenger information, it integrates the operator's multi-source big data to enrich the dimensions of airline passenger characteristics, evaluate the lifetime value of passengers, and accurately identify passengers of different value. The research data are derived from historical data and authorized data in the enterprise apps of the operator and the airline. They were extracted in a compliant and legal environment and analyzed on a private cloud. In this study, "operator" refers to China Telecom, and "airline" refers to China Southern Airlines.

#### **3.1 China Southern Airlines**

Among Chinese airlines, China Southern Airlines has the largest number of transport aircraft, the most developed route network, and the largest annual passenger volume. It has eight holding subsidiaries in public air transport, including those in Xiamen, Henan, Guizhou, and Zhuhai; 20 branches, including those in Xinjiang, Beibei, and Beijing; 23 domestic sales offices in Hangzhou, Qingdao, and other cities; and 54 overseas sales offices in Singapore, New York, Paris, and elsewhere. In 2019 and 2020, it carried 152 million and 97 million passengers, respectively, ranking first among Chinese airlines for 42 consecutive years. Its annual passenger volume ranks first in Asia and second in the world, and its cargo and mail volume ranks among the top 10 in the world (data source: IATA). As of December 2020, China Southern operated more than 860 passenger and cargo aircraft, including Boeing 787, 777, and 737 series and Airbus A380, A330, and A320 series aircraft, and it was the first airline in China to operate the Airbus A380.

#### **3.2 China Telecom**

China Telecom is a very large Chinese telecommunications operator and has been listed among the Fortune Global 500 for many consecutive years. It mainly provides comprehensive information services such as mobile communications, Internet access and applications, fixed-line telephony, satellite communications, and ICT integration. China Telecom has total assets of 907.8 billion yuan and 400,000 employees, and it is a wholly state-owned central enterprise.

#### **4. Methodological chapter**

With the development of social science research, research design plays an increasingly important role in the research process. A rigorous study design can help ensure that the information obtained enables researchers to effectively and accurately understand the research question. This study was designed using quantitative methods.

#### **4.1 Research design**

Scientific research can follow single-method, mixed-method, or multi-method designs, which differ in how they use qualitative and quantitative methods. Single-method studies use only one qualitative or quantitative method. Mixed-method research combines qualitative and quantitative methods. Multi-method research uses two or more quantitative methods, or two or more qualitative methods.

Qualitative methods aim to collect and analyze more detailed information, such as participants' behavior and their written or oral expressions in interviews [25]. Corbin and Anselm [26] propose a qualitative approach that investigates real-world problems. In this approach, participants describe how they feel in their own context, and researchers obtain data from reality. Quantitative methods analyze quantitative relationships in value-neutral settings [27]. Senior researchers at Xinli Market Research (DMB Research) describe quantitative research as a method and process that expresses problems and phenomena in quantities and derives meaning through analysis, testing, and interpretation [28]. This study adopts quantitative methods. It uses mathematical tools to analyze the problem quantitatively and uses federated transfer learning tools to integrate airline and operator data for modeling. The model considers passengers' travel ability, willingness, and stability, the security of their physical and biological space (under normalized epidemic control), their social networks, and other dimensions, so that passenger value is evaluated more comprehensively [28].

#### **Table 1.**

*Airline data dimension.*

#### **4.2 Data collection**

Data collection consists of dataset A: data of airlines (China Southern Airlines) and dataset B: data of operators (China Telecom). Data A comes from China Southern Airlines and consists of 10,000 data instances, each of which has 40 attributes, including data on ticket purchase behavior, travel experience, and passenger membership attributes. The dimensions of information about passengers are shown in **Table 1**.

Using only the airline's own data, it is impossible to assess passengers' travel willingness and stability, the safety of their movement trajectories in physical and biological space under the epidemic, or their social networks. This reduces the accuracy of passenger value evaluation. The company can then only provide the same service to all passengers indiscriminately, which significantly increases unnecessary costs and keeps promotion and conversion rates low. It is therefore necessary to integrate operator dataset B to enrich the data dimensions of airline passengers.

Dataset B comes from China Telecom and consists of 10,000 data instances with the same IDs as dataset A but 33 different attributes. These attributes include users' Internet access, consumption, preferences, travel origin-destination (OD) trajectories, social network information, and other label data. The dimensions of passenger information that can be extracted from China Telecom data are shown in **Table 2**.

Integrating operator data makes it possible to construct users' travel trajectories and determine their travel mode preferences, consumption ability, and online behavior (social networks). This further refines passenger labels and gives insight into passenger value. It also supports precision marketing and decision-making in the aviation industry and helps improve airline profit margins.

## **4.3 Data analysis**

Within the federated learning framework, vertical federated learning allows airline and operator data to be integrated safely and legally. On this basis, a vertical federated logistic regression model can be built to predict passengers' lifetime value, and a joint K-Means model can be built to segment passengers accurately. The specific steps for building the vertical federated logistic regression and K-Means models from airline and operator data are as follows:

	- A. Distribute public keys to ensure data security.
	- B. Encrypt entity data in multiple layers using homomorphic encryption, RSA, and hashing.
	- C. Match entities by hash collision to find the intersection of the two datasets, then align the two datasets on the shared samples (passenger IDs) so that their formats are consistent.
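Step C is commonly implemented as private set intersection (PSI). The sketch below shows one widely used RSA blind-signature PSI pattern, which combines RSA with hashing as in step B. Homomorphic encryption is not shown. This is a single-process simulation under stated assumptions: the key uses tiny primes and is completely insecure, and the function names are hypothetical. A real deployment would use a vetted library with keys of 2048 bits or more.

```python
import hashlib
import math
import secrets


def _is_prime(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def _next_prime(n):
    while not _is_prime(n):
        n += 1
    return n


def toy_rsa_keypair(e=65537):
    """TOY RSA key from ~20-bit primes: insecure, for illustration only."""
    p = _next_prime(1_000_003)
    q = _next_prime(2_000_003)
    while math.gcd(e, (p - 1) * (q - 1)) != 1:
        q = _next_prime(q + 1)
    phi = (p - 1) * (q - 1)
    return p * q, e, pow(e, -1, phi)  # (n, e, d); modular inverse needs Python 3.8+


def hash_to_int(x, n):
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % n


def hash_token(v):
    return hashlib.sha256(str(v).encode()).hexdigest()


def rsa_psi(host_ids, guest_ids):
    """Return the IDs held by both parties; the guest learns only the intersection."""
    n, e, d = toy_rsa_keypair()  # host keeps d private and publishes (n, e)

    # Guest: blind each hashed ID with a random factor r so the host cannot read it.
    blinded = []
    for x in guest_ids:
        r = secrets.randbelow(n - 2) + 2
        while math.gcd(r, n) != 1:
            r = secrets.randbelow(n - 2) + 2
        blinded.append((x, r, hash_to_int(x, n) * pow(r, e, n) % n))

    # Host: sign each blinded value. (x and r are carried along only for bookkeeping
    # in this single-process simulation; in practice only the blinded value is sent.)
    signed = [(x, r, pow(y, d, n)) for x, r, y in blinded]

    # Guest: unblind, since (h^d * r) * r^-1 = h^d mod n, then hash again.
    guest_tokens = {hash_token(s * pow(r, -1, n) % n): x for x, r, s in signed}

    # Host: send double-hashed signatures of its own IDs.
    host_tokens = {hash_token(pow(hash_to_int(x, n), d, n)) for x in host_ids}

    return {x for token, x in guest_tokens.items() if token in host_tokens}


print(sorted(rsa_psi(["P001", "P002", "P003"], ["P002", "P003", "P004"])))
```

Blinding is what distinguishes this from plain hash matching. Without it, either side could hash a dictionary of candidate passenger IDs and test which ones the other party holds.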

After the datasets are aligned, an intermediate party is introduced within the vertical federated learning framework to help both parties build a federated logistic regression model without leaking data (**Figure 1**).
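The joint training loop can be sketched as follows. This is a plaintext NumPy simulation. It assumes that the airline (guest) holds the labels and its own features, while the operator (host) holds only features for the same aligned passengers. Each side keeps its weights private, and only per-sample partial scores and the residual are exchanged. Encryption of these messages and the intermediate party's role in decrypting masked gradients are deliberately omitted. The data are synthetic, so the printed accuracies say nothing about this study's results.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class Party:
    """One data holder: a vertical slice of the features plus its private weights."""

    def __init__(self, X):
        self.X = X
        self.w = np.zeros(X.shape[1])

    def partial_score(self):
        return self.X @ self.w  # only this per-sample score leaves the party

    def step(self, residual, lr):
        self.w -= lr * self.X.T @ residual / len(residual)


def train_vertical_lr(X_guest, y, X_host, lr=0.5, epochs=200):
    guest, host = Party(X_guest), Party(X_host)
    bias = 0.0
    for _ in range(epochs):
        z = guest.partial_score() + host.partial_score() + bias
        residual = sigmoid(z) - y  # computed by the label holder (guest)
        guest.step(residual, lr)
        host.step(residual, lr)    # in practice the host receives an encrypted residual
        bias -= lr * residual.mean()
    return guest, host, bias


def accuracy(guest, host, bias, y):
    p = sigmoid(guest.partial_score() + host.partial_score() + bias)
    return float(np.mean((p >= 0.5) == y))


rng = np.random.default_rng(0)
n = 500
X_airline = rng.normal(size=(n, 3))   # hypothetical guest features
X_operator = rng.normal(size=(n, 2))  # hypothetical host features
logit = X_airline @ [1.0, -1.0, 0.5] + X_operator @ [2.0, -1.5]
y = (logit + rng.normal(scale=0.5, size=n) > 0).astype(float)

joint = train_vertical_lr(X_airline, y, X_operator)
airline_only = train_vertical_lr(X_airline, y, np.zeros((n, 0)))  # host contributes nothing
acc_joint = accuracy(*joint, y)
acc_airline_only = accuracy(*airline_only, y)
print(f"airline only: {acc_airline_only:.3f}, joint: {acc_joint:.3f}")
```

Because the logit is additive across parties, each side can update its own weights from the shared residual alone. This is what allows the features to stay in place.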




#### **Table 2.**

*China telecom data dimension.*

#### **Figure 1.** *Joint modeling.*

3. Effect incentive

Another prominent feature of federated learning is that the effects of modeling are reflected in practical applications and recorded in permanent data-recording mechanisms such as blockchain. The data providers, the airline and the operator, can see the model's effects promptly, and the contributions of both parties are made visible to their own institutions and to others.

### **5. Results chapters**

#### **5.1 The results from the modeling**

Federated learning modeling is similar to traditional machine learning modeling in that the quality of the data and variables determines the prediction outcome more than the algorithm does. The information values (IV) of the variables in dataset A mainly range from 0.4 to 0.9, whereas those in dataset B mainly range from 0.4 to 1.3. The features of the host (the operator) are therefore somewhat more predictive than those of the guest (the airline). Adding the host's variables significantly improves the performance of the federated model: the AUC rises from 0.757 for the unilateral model to 0.823, a relative improvement of 8.7%. The familiar empirical finding that adding high-quality variables significantly improves a model's fit and predictive power thus also holds for federated machine learning. The essence of federated learning is to make full use of big data while ensuring data security, and doing so does not harm the model's power or performance. Federated machine learning can therefore be regarded as a safe and efficient machine learning method for the big data era (**Figures 2**-**5**).
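IV measures how well a binned variable separates the two outcome classes. Below is a minimal sketch of the standard weight-of-evidence formulation. It assumes a binary label (for example, 1 = high-value passenger) and pre-binned feature values. The function name and the optional `smoothing` term are illustrative, and the toy data do not come from datasets A or B.

```python
import math
from collections import Counter


def information_value(bins, labels, smoothing=0.0):
    """IV = sum over bins of (event% - non_event%) * ln(event% / non_event%).

    bins: the bin label of each sample; labels: 1 for event, 0 for non-event.
    With smoothing=0, a bin with no events or no non-events raises an error.
    """
    events = Counter(b for b, y in zip(bins, labels) if y == 1)
    non_events = Counter(b for b, y in zip(bins, labels) if y == 0)
    categories = set(bins)
    total_e = sum(events.values()) + smoothing * len(categories)
    total_n = sum(non_events.values()) + smoothing * len(categories)
    iv = 0.0
    for c in categories:
        pct_e = (events[c] + smoothing) / total_e
        pct_n = (non_events[c] + smoothing) / total_n
        iv += (pct_e - pct_n) * math.log(pct_e / pct_n)
    return iv


spend_bin = ["low"] * 4 + ["high"] * 4
high_value = [1, 0, 0, 0, 1, 1, 1, 0]
print(round(information_value(spend_bin, high_value), 4))  # 1.0986 (= ln 3)
```

Each term is non-negative, so IV grows as the class mix differs more across bins. A variable whose bins all have the same event rate has an IV of exactly 0.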

The accuracy of the federated model increases from 0.837 to 0.847, a gain of 1.0 percentage point (about 1.2% in relative terms). The federated model also improves recall by 0.21% and precision by 1.7%.
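The reported improvements can be checked directly. The AUC gain is a relative change. The accuracy gain is 1.0 percentage point in absolute terms and about 1.2% in relative terms:

$$
\frac{\mathrm{AUC}_{\text{fed}} - \mathrm{AUC}_{\text{uni}}}{\mathrm{AUC}_{\text{uni}}} = \frac{0.823 - 0.757}{0.757} = \frac{0.066}{0.757} \approx 0.087 = 8.7\%
$$

$$
0.847 - 0.837 = 0.010 \;(\text{1.0 percentage point}), \qquad \frac{0.010}{0.837} \approx 0.012 = 1.2\%
$$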

#### **5.2 The results from the data**

This experiment shows that the model becomes more accurate after external data are integrated. Conversely, relying solely on airline data gives an incomplete evaluation of passenger value. Because of the particular nature of aviation products, the services and products of different airlines are currently highly homogeneous and cannot provide personalized experiences for different users. Targeted, personalized services require accurate insight into passengers' preferences, interests, influence on others, travel intentions, and other details. Operator data can supplement passenger labels, so that passengers' travel trajectories and network behavior can be analyzed and their behavior profiles described more comprehensively and accurately. This helps airlines understand passengers' needs before and after the flight, as well as their post-flight experience, and launch differentiated services in a targeted manner, expanding the scope and dimensions of their services.

**Figure 2.** *AUC comparison.*


**Figure 3.** *Accuracy comparison.*

**Figure 4.** *Precision comparison.*

Through this research, we found that, in addition to the airline's internal factors such as the spending, cabin class, destination, and ticket purchase method associated with the flight, the following external factors allow a passenger's value to the company to be evaluated more comprehensively:

#### 1. Travel stability

A traveler's fixed travel patterns across his or her places of residence, work (or school), daily life, and leisure travel.

#### 2. Willingness to travel

The share of a traveler's boardings that are on the airline's flights, out of all the flights he or she takes, represents his or her willingness to travel with the airline.

#### 3. Social network influence

The number of followers a traveler has on Weibo, WeChat, Twitter, Instagram, and other social media, and the influence of his or her posts on those followers.

#### 4. Security of physical and biological spaces

Under normalized epidemic control, a traveler's travel trajectory and whether he or she has been in close contact with risk groups.
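Of these factors, willingness to travel has the most direct formula. The minimal sketch below assumes that each boarding record carries a carrier code ("CZ" and "MU" are used only as examples) and computes the airline's share of a traveler's boardings. How the study actually derives this feature from operator data is not specified here.

```python
def willingness_to_travel(boardings, airline):
    """Share of a traveler's boardings that were on `airline` (0.0 if none recorded)."""
    if not boardings:
        return 0.0
    return sum(1 for carrier in boardings if carrier == airline) / len(boardings)


trips = ["CZ", "CZ", "MU", "CZ"]  # hypothetical boarding history
print(willingness_to_travel(trips, "CZ"))  # 0.75
```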

#### **6. Conclusion and implications for policy and/or further research**

In reality, data silos, privacy protection, and data security are urgent problems. This paper proposes a federated learning-based model for researching passenger value in the civil aviation industry while ensuring data security. To analyze the value of airline passengers precisely, it compares a unilateral model that uses only airline data with a joint model that combines airline internal data and operator data through federated learning. The results show that the federated learning-based model solves the data-silo problem and markedly improves the model's results while protecting user privacy and institutional data security. The model gives airlines technologies and methods to identify high-value passengers more accurately, understand passengers more comprehensively, provide differentiated services, and improve passengers' travel satisfaction.

With rising living standards, urban residents travel more frequently, and their travel has become more flexible and diverse. To address vehicle reservation, route planning, and travel payment across the whole journey, many scholars consider a MaaS (Mobility as a Service) platform that integrates multiple travel modes and provides paid travel services to be the way to achieve these goals and the key to the sustainable development of the transportation industry in the post-epidemic era. Several such services already operate abroad. Whim offers a monthly travel package: by paying a monthly fee, users can take multiple trips within the month. If users exhaust the services covered by the package before the month ends, additional charges apply. Conversely, if the monthly allowance is not used up by the end of the month, the remainder carries over to the next month. Ubigo also offers a monthly subscription in which users pay a monthly fee to enjoy various travel services. These include free public transportation in four designated zones, 10 km of free car-sharing mileage, long- and short-term car rental, the first 30 minutes of shared-bicycle rides free, and discount coupons for online taxi booking. NaviGoGo is mainly aimed at people aged 16–25. It uses a taxi-splitter function to share carpooling costs and a deal-matcher function to customize travel according to user preferences.

By contrast, MaaS construction in China is still in its infancy. The best-known example is Didi Chuxing, which has integrated taxis, Qingju ("green orange") shared bikes, and buses, but not intercity transportation such as trains and planes. AutoNavi Maps relies on Alibaba's ecosystem to build a one-stop intra-city travel and payment platform that covers public transportation, trains, and planes. For urban travel, AutoNavi Maps has also connected more than 17 travel service providers and created the most extensive aggregated taxi-hailing model. However, AutoNavi Maps cannot provide one-stop online payment. Alipay, which also relies on Alibaba's ecosystem, has integrated buses, subways, 12306 (China Railway's ticketing service), online car-hailing, taxis, and bicycles, and it continues to deepen its local life services. However, for buses and subways it offers only payment and QR-code entrances without route planning, and for intercity travel it offers only high-speed trains, not flights. In sum, China's MaaS platforms still have considerable room for development.

The COVID-19 epidemic in China has passed its most critical stage, and the country has entered the post-epidemic era and a period of industry recovery. The pandemic has profoundly affected people's daily travel. Working from home, intelligent logistics, and contactless delivery have risen rapidly, and epidemic prevention measures such as passenger-flow limits, closed-off management, and restrictions on social contact have been implemented. As a result, residents travel less often, and the transportation market has shrunk significantly. The transportation industry has been hit hard.

Cost and risk are travelers' primary considerations in the post-epidemic era. The digital divide that older people encounter when traveling deserves attention, as news reports such as "the elderly were refused a ride because they did not have a health code" and "the subway cannot be taken without a smartphone" illustrate. In the post-pandemic era, improving the fairness and inclusiveness of mobility is critical.

In future research, this study can serve as a reference for building MaaS platforms in China and abroad. Our research team, AMY, is working to further segment passengers based on the passenger value insights from this research. For example, passengers can be divided into three groups (young people, high-end passengers, and older adults). Insights into each group's behavioral preferences can then be used to launch suitable products and services, such as routes, means of transportation, and payment methods, that improve transportation efficiency and let passengers enjoy the convenience of travel.

#### **6.1 Design a service blueprint for the characteristics of segmented groups**

Considering the characteristics of young people, high-end travelers, and older adults, we design specific products and functional details to build a new MaaS service model. The system provides the following service functions (see **Figure 6**).

Preparation page: comprehensive travel information, including weather, air quality, travel information (distance, time, cost), available transportation methods, station facilities, and other information.

Plan a journey page: multimodal transport. The integrated information system arranges and coordinates, in one step, the buses, subways, shared bicycles, online taxis, and other transportation needed for the trip.

Take the transportation page: a cross-platform service connects multiple shared-bicycle and online car-hailing platforms, so that customers can use different third-party transportation services in one stop.

#### **Figure 6.** *Service functions.*

Payment page: allows customers to pay for different transportation combination services at one time.

Transfer page: track routes and locations in real time and set transfer reminders.

Arrive at the destination page: access third-party operational data, provide nearby bicycle locations, and provide arrival payment function.

Evaluation and feedback page: allows customers to submit suggestions and feedback, which are analyzed and acted on in the shared business back end.
