
*Transfer Learning and Deep Domain Adaptation DOI: http://dx.doi.org/10.5772/intechopen.94072*



and target domain and utilize the stored knowledge to improve the performance on the target task. Note that transfer learning is an extensive research topic that involves many learning methods; among these, deep domain adaptation has attracted the most attention in recent years. Therefore, after briefly introducing transfer learning in this research, we pay our attention to analyzing and discussing the recent advances in deep domain adaptation.

The rest of this chapter is structured as follows. In Section 2, we give an overview and specific definitions of transfer learning. In Section 3, we summarize the main approaches for deep domain adaptation. In Sections 4, 5, and 6, we discuss the details of conducting deep domain adaptation. Recent applications based on deep domain adaptation methods are introduced in Section 7. Finally, we conclude this research and discuss future trends in Section 8.

**2. Overview**

We first give some notations and definitions, which match those from the survey paper by Pan et al. [1]; these notations are also widely adopted in many other survey papers such as [2, 3].

Definition 1 (**Domain** [1]). Given a specific dataset *X* = {*X*<sub>1</sub>, … , *X*<sub>n</sub>} ∈ 𝒳, where 𝒳 denotes the feature space, and a marginal probability distribution *P*(*X*) on the dataset, a domain can be defined as *D* = {𝒳, *P*(*X*)}. Therefore, a domain consists of two components: the feature space and the marginal probability distribution on the dataset.

Definition 2 (**Task** [1]). Given a specific dataset *X* = {*X*<sub>1</sub>, … , *X*<sub>n</sub>} ∈ 𝒳 and its labels *Y* = {*Y*<sub>1</sub>, … , *Y*<sub>n</sub>} ∈ 𝒴, where 𝒴 denotes the label space, a task can be defined as *T* = {𝒴, *F*(*X*)}, where *F* is an objective predictive function to learn, which can be seen as a conditional distribution *P*(*Y* | *X*).

Definition 3 (**Transfer Learning** [1]). Given a source domain *D<sup>s</sup>* and its corresponding task *T<sup>s</sup>*, the learned function *F<sup>s</sup>* can be interpreted as knowledge obtained from *D<sup>s</sup>* and *T<sup>s</sup>*. Our goal is to obtain the target predictive function *F<sup>t</sup>* for the target task *T<sup>t</sup>* in the target domain *D<sup>t</sup>*. Transfer learning aims to improve the performance of *F<sup>t</sup>* by utilizing the knowledge *F<sup>s</sup>*, where *D<sup>s</sup>* ≠ *D<sup>t</sup>* or *T<sup>s</sup>* ≠ *T<sup>t</sup>*.
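To make Definitions 1–3 concrete, the sketch below encodes a domain as a feature space plus a marginal distribution, and a task as a label space plus a predictive function. All names and numbers here are illustrative assumptions, not notation from the survey [1].

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Domain:
    feature_space: str                    # description of the feature space
    sample: Callable[[int], List[float]]  # draws n points from the marginal P(X)

@dataclass(frozen=True)
class Task:
    label_space: List[str]                # the label space
    predict: Callable[[float], str]       # the predictive function F, i.e. P(Y|X)

random.seed(0)

# Two domains with the same feature space but different marginals P(X),
# i.e. D_s != D_t -- the setting that calls for transfer learning:
d_s = Domain("intensity in [0, 1]",
             lambda n: [random.uniform(0.0, 0.5) for _ in range(n)])
d_t = Domain("intensity in [0, 1]",
             lambda n: [random.uniform(0.5, 1.0) for _ in range(n)])

# A shared task: the same label space and target function for both domains.
t = Task(["dark", "bright"], lambda x: "dark" if x < 0.5 else "bright")
```

Here `d_s` and `d_t` differ only in *P*(*X*), so by Definition 3 this already constitutes a transfer learning problem even though the task `t` is identical on both sides.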

In short, transfer learning can be simply denoted as

$$\mathcal{D}\_s, \mathcal{T}\_s \to \mathcal{D}\_t, \mathcal{T}\_t \tag{1}$$

Transfer learning is a very broad research subject in machine learning. In this research, we mainly focus on transfer learning based on deep neural networks (i.e., deep learning). Therefore, as shown in **Figure 1**, based on whether *D<sup>s</sup>* ≠ *D<sup>t</sup>* or *T<sup>s</sup>* ≠ *T<sup>t</sup>*, we can have three scenarios when applying transfer learning. Note that when *D<sup>s</sup>* = *D<sup>t</sup>* and *T<sup>s</sup>* = *T<sup>t</sup>*, the problem becomes a traditional deep learning task. In such a case, a dataset is usually divided into a training dataset *D<sup>s</sup>* and a test dataset *D<sup>t</sup>*; we can then train a neural network *F* on *D<sup>s</sup>* and apply the pre-trained model *F* to *D<sup>t</sup>*.

#### **Figure 1.**
*Categorization of transfer learning based on labels. (The image is from Pan [1].)*

When *D<sup>s</sup>* = *D<sup>t</sup>* and *T<sup>s</sup>* ≠ *T<sup>t</sup>*, transfer learning is usually transformed into a multi-task learning problem. Since the source domain and the target domain share the same feature space, we can utilize one giant neural network to solve different types of tasks at the same time. For example, multi-task learning is widely used in autopilot systems: given an input image, we can utilize a deep neural network with enough capacity to recognize the cars, the pedestrians, the traffic signs, and the locations of these objects in the image.

When *D<sup>s</sup>* ≠ *D<sup>t</sup>* and *T<sup>s</sup>* = *T<sup>t</sup>*, the deep domain adaptation technique is usually used to transfer the knowledge from the source to the target. In general, the goal of domain adaptation is to learn a mapping function *F* to reduce the domain divergence between *D<sup>s</sup>* and *D<sup>t</sup>*, including distribution shift and different feature spaces.
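The multi-task scenario described above (*D<sup>s</sup>* = *D<sup>t</sup>*, *T<sup>s</sup>* ≠ *T<sup>t</sup>*) can be sketched as a single shared feature extractor feeding several task-specific heads. This is a deliberately tiny stand-in for the "one giant neural network": the backbone, head sizes, and task names are assumptions for illustration, not the chapter's architecture.

```python
import random

random.seed(0)

def shared_backbone(image):
    # Stand-in for a deep feature extractor shared by all tasks:
    # maps an input vector to one common representation.
    return [sum(image), max(image), min(image)]

def make_head(n_classes):
    # Each task gets its own small linear head on the shared features.
    weights = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n_classes)]
    def head(features):
        scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
        return scores.index(max(scores))  # predicted class index
    return head

heads = {
    "vehicle": make_head(4),        # e.g. car / truck / bus / bicycle
    "traffic_sign": make_head(10),  # e.g. ten sign classes
}

image = [0.2, 0.7, 0.1, 0.9]
features = shared_backbone(image)   # computed once, reused by every task
predictions = {name: head(features) for name, head in heads.items()}
```

Because both tasks read the same `features`, the expensive backbone runs once per input, which is the practical appeal of solving several tasks with one network.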


*Advances and Applications in Deep Learning*


#### **Figure 2.**
*Hierarchically-structured taxonomy of transfer learning in this survey.*


Formally, domain adaptation can be defined as follows.

Definition 4 (**Domain Adaptation**). Given a source domain *D<sup>s</sup>* for task *T<sup>s</sup>* and a target domain *D<sup>t</sup>* for task *T<sup>t</sup>*, where *D<sup>s</sup>* ≠ *D<sup>t</sup>*, domain adaptation aims to learn a predictive function *F<sup>t</sup>* such that the knowledge obtained from *D<sup>s</sup>* and *T<sup>s</sup>* can be used to enhance *F<sup>t</sup>*. In other words, *F<sup>t</sup>* adapts to the domain divergence.
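Definition 4 leaves open how the divergence between *D<sup>s</sup>* and *D<sup>t</sup>* is measured. One common choice (an assumption here, not the chapter's prescribed method) is the maximum mean discrepancy; with a linear kernel it reduces to the squared distance between the sample means of the two domains:

```python
import random

def mmd_linear(xs, xt):
    # Linear-kernel MMD^2: squared Euclidean distance between the
    # per-coordinate sample means of the two domains (rows are samples).
    d = len(xs[0])
    mean_s = [sum(row[j] for row in xs) / len(xs) for j in range(d)]
    mean_t = [sum(row[j] for row in xt) / len(xt) for j in range(d)]
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))

random.seed(0)
src  = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
tgt  = [[random.gauss(1.0, 1.0) for _ in range(3)] for _ in range(500)]  # shifted marginal
same = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]  # same marginal

gap_shifted = mmd_linear(src, tgt)   # large: P(X) differs between the domains
gap_same    = mmd_linear(src, same)  # small: both samples share one marginal
```

In deep domain adaptation, a term like `mmd_linear` computed on network features is often added to the training loss so that the learned representation pulls the two domains together.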

When *D<sup>s</sup>* ≠ *D<sup>t</sup>* and *T<sup>s</sup>* ≠ *T<sup>t</sup>*, transfer learning should be conducted carefully. If the data in the source domain *D<sup>s</sup>* is very different from that in the target domain *D<sup>t</sup>*, brute-force transfer may hurt the performance of the predictive function *F<sup>t</sup>*, not to mention the scenario where the source task *T<sup>s</sup>* and the target task *T<sup>t</sup>* are also different. From a literature review of deep learning, we notice that there is little research on this scenario, and it remains an open question.

In summary, the above definitions answer the question of what to transfer, and the four scenarios (including the traditional deep learning setting) demonstrate the research issue of when to transfer. As shown in **Figure 2**, in contrast to the categorization of transfer learning introduced in the survey paper [1], our discussion mainly focuses on transfer learning in deep neural networks. In the following sections, we turn our attention to how to transfer; specifically, we introduce and summarize three main methods for deep domain adaptation.
