**6.1 Encoder-decoder-based approaches**

To reconstruct the samples, the basic method is that we can adopt an autoencoder framework, in which there is an encoder network and decoder network. The encoder can map an input sample into a hidden representation and the decoder can reconstruct the input sample based on the hidden representation. In particular, the encoder-decoder networks for domain adaptation typically involve a shared encoder between the source domain and target domain so that the encoder can learn some domain-invariant representation. An earlier work can be found in [9], in which the stacked denoising auto-encoder is adopted for sentiment classification.

Recently, a typical work called deep reconstruction-classification networks is introduced in [11], in which the encoder and decoder are both implemented with convolutional networks. Specifically, the convolutional encoder is used for supervised classification of the labeled data from the source domain. Meanwhile, it also maps the unlabeled data from the target domain into hidden representation, which is further decoded by the convolutional encoder for reconstructing the input. By jointly training these networks with the data from the source and target domains, the shared encoder can learn some common representations from both datasets, which results in domain adaptation. Other similar work based on auto-encoder can also be found in [11, 28].

#### **6.2 GAN-based approaches**

Traditionally, the GANs [6] consists of a generator and discriminator, where the generator can be seen as a decoder network which can decode some random noise

into a fake sample and the discriminator can be treated as an encoder network which is used to encode the sample into some high-level features for classification (i.e., fake or real). Instead of just using a decoder network as the generator, a typical work known as Cycle GANs is proposed in [10], in which the generator is implemented with an encoder-decoder network. Specifically, this encoder-decoder generator is used for dual learning: *G x*ð Þ!*<sup>s</sup> xt* and *F x*ð Þ!*<sup>t</sup> xs*. And the discriminator also has two roles: to distinguish between the fake *xt* and real *xt*, and to distinguish between the fake *xs* and real *xs*. By alternatively training these two players in GANs, the encoder-decoder generator can lean a reversible mapping function. In other words, the domain-invariant features are obtained from two different datasets. However, one remaining problem is that the encoder-decoder network usually consists of millions of parameters, with enough capacity, it can map an input image from the source domain to any random image which is close to the target domain. Therefore, in addition to using the standard adversarial loss for training the GANs, the consistency loss (i.e., L1 norm) is also proposed to make sure that *FGx* ð Þ ð Þ ≈*x*.

reduced because the model has already obtained rich knowledge from the source

The recent object detection methods are mainly driven by two approaches: Faster R-CNN [32] and YOLO [33]. Specifically, two tasks are mainly involved in object detection: The first one is to detect whether there are objects in an input image (i.e., to output the bounding box of each object in the image); Meanwhile, the object in each bounding box is also classified. Object detection is a very common learning task in many real-world applications such as intelligent surveillance systems [34]. By utilizing domain adaptation approaches for the new task of object detection in the wild, the Domain Adaptive Faster R-CNN is introduced in [35]. And the core idea is also to utilize domain classifier with GRL to encourage domain confusion (i.e., in Section 5.2). Another recent similar work is also discussed in [36], in which the GRL is also adopted and the process of conducting domain adaptation

The convolutional encoder-decoder architecture has achieved great success for

As mentioned in Section 6.2, Cycle GANs [10] is a typical method for image-toimage translation based on deep domain adaptation. In general, image-to-image translation denotes that we can map an image from the source domain to the target domain and vice versa. One real task that is also addressed in Cycle GANs is the style transfer application. To our best knowledge, the algorithm of neural style transfer is firstly proposed in [39], the core idea in this paper is how to define the content loss and style loss between the source data and the target data. Actually, it

convolutional encoder-decoder network can map this image into a pixel-level classification image (i.e., each pixel is classified with a label). The problem of domain shifts can also appear in this task, which results in poor performance on a new domain. In [37], the researchers introduce a domain adversarial learning method which includes both global and category-specific techniques. They argue that two factors can cause domain shift: the global changes between the two distinct domains and the category-specific changes. (i.e., the distribution of cars from two different cities may be different.) Based on this assumption, two new loss functions are introduced, one is used for reducing the global distribution shift between the source images and target images and the other is used for adapting the category-specific divergence between the target images and the transferring label statistics. Instead of just using a simple adversarial objective, the authors in [38] propose an iterative

image segmentation in recent years. Specifically, given an input image, the

optimization procedure based on GANs for addressing domain shift.

model based on a dataset of labeled face images. In contrast, the large-scale unlabeled video datasets are always available. However, the divergence of data in the video is usually limited and there remains a clear gap between these two different domains. In order to utilize the rich information from video and overcome these challenges, the authors in [31] propose a framework for face recognition in

unlabeled video based on the adversarial domain adaptation approach.

is divided into two stages called progress domain adaptation.

Another typical real-world application that we can gain benefits from domain adaptation is face recognition. A general approach to solve this problem is to train a

domain.

*7.1.2 Object detection*

*Transfer Learning and Deep Domain Adaptation DOI: http://dx.doi.org/10.5772/intechopen.94072*

*7.1.3 Image segmentation*

*7.1.4 Image-to-image translation*

**55**

$$\mathcal{L}\_{\text{cyc}}(\mathbf{G}, \mathbf{F}) = \mathbb{E}\_{\mathbf{x}\_{l} \sim \text{data}(\mathbf{x}\_{l})} || \mathbf{F}(\mathbf{G}(\mathbf{x}\_{l})) - \boldsymbol{\mathfrak{x}}\_{l} ||\_{\mathbf{1}} + \mathbb{E}\_{\mathbf{x}\_{l} \sim \text{data}(\mathbf{x}\_{l})} || \mathbf{F}(\mathbf{G}(\mathbf{x}\_{l})) - \boldsymbol{\mathfrak{x}}\_{l} ||\_{\mathbf{1}} \tag{8}$$

where *G x*ð Þ*<sup>s</sup>* denotes fake *xt* and *FGx* ð Þ ð Þ*<sup>s</sup>* is reconstructed *xs* (i.e., *F x*ð Þ!*<sup>t</sup> xs*). Inspired by the Cycle GANs, many variants based on encoder-decoder generator are proposed for domain adaptation, such as the Disco GANs [29] and the Dual GANs [30].
