*3.1.4 Hybrid approaches*

In general, the core idea of deep domain adaptation is to use deep neural networks to learn domain-invariant internal representations from the source domain and the target domain. Therefore, we can combine the different kinds of approaches discussed above to enhance overall performance. For example, the authors of [11] adopt both the encoder-decoder reconstruction method and the statistic criterion method.

#### **3.2 Categorization based on learning methods**

Based on whether labels are available in the target domain dataset, we can further divide the above approaches into supervised learning and unsupervised learning. Note that unsupervised learning methods can be generalized and applied to semi-supervised cases; therefore, we mainly discuss these two settings in this work. **Table 1** shows the categorization of deep domain adaptation based on whether labels are needed in the target domain. A similar categorization is also introduced in [12].

#### **3.3 Categorization based on data space**

In some survey papers, domain adaptation methods are also categorized into two main classes based on the similarity of the data spaces. (1) Homogeneous domain adaptation means that the source data space and the target data space are the same (i.e., *X<sup>s</sup>* = *X<sup>t</sup>*); e.g., the source dataset consists of car images from open public datasets, while the car images in the target dataset are collected manually from the real world. (2) Heterogeneous domain adaptation means that the datasets come from different data spaces (i.e., *X<sup>s</sup>* ≠ *X<sup>t</sup>*); e.g., text vs. images. **Figure 3** presents the topology introduced in [12].


deep domain adaptation is proposed. Another problem is what happens if there are no labels in the target dataset. In that case, an unsupervised learning method must be applied to the target dataset for domain confusion.

From the definition of domain adaptation, we see that the fundamental goal is to reduce the domain divergence between the source domain and the target domain so that the function *F<sup>t</sup>* can achieve good performance on the target domain. It is therefore important and valuable to use a criterion to measure the divergence between different domains. In other words, we need a measurement of the difference between the probability distributions of different datasets.

#### **4.2 Statistic criterion**

Maximum Mean Discrepancy (MMD) [15] is a well-known criterion that is widely adopted in deep domain adaptation, such as in [16, 17]. Specifically, MMD computes the squared distance between the mean embeddings of the two datasets in a feature space, which can be defined as

$$
\begin{aligned}
D_{\mathrm{MMD}}\left(X^s, X^t\right) &= \left\lVert \frac{1}{n}\sum_{i=1}^{n}\phi\!\left(x_i^s\right) - \frac{1}{m}\sum_{j=1}^{m}\phi\!\left(x_j^t\right) \right\rVert^2 \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\phi\!\left(x_i^s\right)^{\mathsf{T}}\phi\!\left(x_j^s\right) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\phi\!\left(x_i^s\right)^{\mathsf{T}}\phi\!\left(x_j^t\right) + \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\phi\!\left(x_i^t\right)^{\mathsf{T}}\phi\!\left(x_j^t\right) \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}k\!\left(x_i^s, x_j^s\right) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}k\!\left(x_i^s, x_j^t\right) + \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}k\!\left(x_i^t, x_j^t\right)
\end{aligned}
\tag{4}
$$

where *ϕ*(·) denotes the feature space map. In practice, we can use a kernel function *k*(·, ·) (e.g., the Gaussian kernel) so that MMD can be computed easily via the kernel trick, without evaluating *ϕ*(·) explicitly.
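To make the kernelized form of Eq. (4) concrete, here is a minimal NumPy sketch; the function names, the fixed bandwidth `sigma`, and the toy data are our own illustrative choices, not taken from [15-17]:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix: k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd_squared(xs, xt, sigma=1.0):
    """Squared MMD between source samples xs (n x d) and target samples xt (m x d),
    following the kernelized expansion in Eq. (4)."""
    k_ss = gaussian_kernel(xs, xs, sigma)  # (1/n^2) sum_ij k(x_i^s, x_j^s) via .mean()
    k_st = gaussian_kernel(xs, xt, sigma)  # (1/nm) sum_ij k(x_i^s, x_j^t) via .mean()
    k_tt = gaussian_kernel(xt, xt, sigma)  # (1/m^2) sum_ij k(x_i^t, x_j^t) via .mean()
    return k_ss.mean() - 2 * k_st.mean() + k_tt.mean()

# Example: two Gaussian samples with shifted means yield a clearly positive MMD.
rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(100, 5))
xt = rng.normal(0.5, 1.0, size=(120, 5))
print(mmd_squared(xs, xt))
```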

*H*-divergence [18] is a more general theory for measuring domain divergence, which is defined as

$$
d_{\mathcal{H}}\!\left(\mathcal{D}^s, \mathcal{D}^t\right) = 2\,\sup_{h \in \mathcal{H}} \left| \Pr_{x^s \sim \mathcal{D}^s}\!\left[h\!\left(x^s\right) = 1\right] - \Pr_{x^t \sim \mathcal{D}^t}\!\left[h\!\left(x^t\right) = 1\right] \right|
\tag{5}
$$

where *h* ∈ *H* is a binary classifier (i.e., a hypothesis). For example, in [19], domain-adversarial networks are proposed based on this statistic criterion (note that this method can also be categorized under domain-adversarial learning).
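The supremum over all classifiers in Eq. (5) cannot be evaluated exactly. A common empirical surrogate, the proxy 𝒜-distance, trains a single domain classifier *h* to separate source from target samples and maps its test error ε to 2(1 − 2ε). Below is a minimal scikit-learn sketch under our own simplifying choices (a linear classifier and a 50/50 split):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(xs, xt):
    """Estimate domain divergence with one binary domain classifier h:
    source samples get label 0, target samples get label 1. A low test
    error (easy separation) indicates a large divergence, per Eq. (5)."""
    x = np.vstack([xs, xt])
    y = np.concatenate([np.zeros(len(xs)), np.ones(len(xt))])
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)
    err = 1.0 - LogisticRegression(max_iter=1000).fit(x_tr, y_tr).score(x_te, y_te)
    return 2.0 * (1.0 - 2.0 * err)  # proxy A-distance in [-2, 2], ~0 when domains match
```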

#### **4.3 Parameter regularization**

Note that when fine-tuning networks with the label criterion or the statistic criterion, the weights of the networks are usually shared between the source domain and the target domain. In contrast to these methods, some researchers argue that the weights for each domain should be related but not shared. Based on this idea, the authors of [20] propose a two-stream architecture with a weight regularization method. Two types of regularizers are introduced: an *L*<sup>2</sup> norm and an exponential form,

$$
r_w\!\left(\theta_j^s, \theta_j^t\right) = \left\lVert a_j\,\theta_j^s + b_j - \theta_j^t \right\rVert^2
\quad\text{or}\quad
r_w\!\left(\theta_j^s, \theta_j^t\right) = \exp\!\left( \left\lVert a_j\,\theta_j^s + b_j - \theta_j^t \right\rVert^2 \right) - 1
\tag{6}
$$

where *θ<sup>s</sup><sub>j</sub>* and *θ<sup>t</sup><sub>j</sub>* denote the weights of the *j*-th layer in the source stream and the target stream, respectively, and *a<sub>j</sub>*, *b<sub>j</sub>* are learnable parameters of a linear transformation relating the two streams.
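As a concrete rendering of Eq. (6), the sketch below computes the regularizer for one pair of corresponding layers. It is our own minimal PyTorch version of the idea in [20], not the authors' code, assuming *a<sub>j</sub>* and *b<sub>j</sub>* are learnable scalars; the total penalty would be summed over layers and added to the task loss.

```python
import torch

def weight_regularizer(theta_s: torch.Tensor, theta_t: torch.Tensor,
                       a: torch.Tensor, b: torch.Tensor,
                       exponential: bool = False) -> torch.Tensor:
    """r_w of Eq. (6): keeps the target-stream weights close to a learned
    linear transform of the source-stream weights (related, but not shared)."""
    diff = torch.sum((a * theta_s + b - theta_t) ** 2)  # squared L2 norm
    return torch.exp(diff) - 1.0 if exponential else diff

# Example usage for one layer, with learnable scalars a_j, b_j:
a_j = torch.nn.Parameter(torch.ones(()))
b_j = torch.nn.Parameter(torch.zeros(()))
theta_s = torch.randn(64, 32)                    # source-stream layer weights
theta_t = theta_s + 0.01 * torch.randn(64, 32)   # slightly drifted target weights
penalty = weight_regularizer(theta_s, theta_t, a_j, b_j)
```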



#### **Table 1.**

*Categorization of deep domain adaptation based on whether the labels in the target domain are available.*

**Figure 3.**

*Categorization of domain adaptation based on feature space. (The image is from Wang [12]).*
