where $a^j$ and $b^j$ are different parameters in each layer. Rather than using two networks for domain adaptation, in [21] a domain-guided method is introduced to drop some weights in the networks directly.

**4.4 Sample regularization**

Alternatively, instead of adapting the parameters in the networks, we can reweight the data in each layer of feed-forward neural networks. The typical method to reduce internal covariate shift in deep neural networks is to conduct batch normalization during training [22]:

$$\hat{x}_i = \gamma \, \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta \qquad (7)$$

where $\mu = \frac{1}{B} \sum_{i=1}^{B} x_i$ and $\sigma^2 = \frac{1}{B} \sum_{i=1}^{B} (x_i - \mu)^2$. Here $B$ is the batch size, and $\gamma$ and $\beta$ are two parameters to learn. Based on this method, [23] propose a revised method for practical domain adaptation, and in [24], researchers adopt instance normalization for stylization.
Note that $x_i$ in Eq. (7) usually denotes the hidden activation of input sample $i$ in each layer of a neural network (e.g., the output feature map of each convolutional layer).
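As a concrete illustration, the following is a minimal NumPy sketch of Eq. (7); the function name `batch_norm`, the `(B, D)` activation layout, and the `eps` default are our own illustrative choices:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization as in Eq. (7).

    x: array of shape (B, D) -- a batch of B hidden activations.
    gamma, beta: arrays of shape (D,) -- the learnable scale and shift.
    """
    mu = x.mean(axis=0)                    # mu = (1/B) * sum_i x_i
    var = ((x - mu) ** 2).mean(axis=0)     # sigma^2 = (1/B) * sum_i (x_i - mu)^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize each feature over the batch
    return gamma * x_hat + beta            # scale and shift with gamma and beta
```

Instance normalization [24] differs mainly in which axes the statistics are computed over: per sample (and per channel) rather than over the whole batch.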

**5. Adversarial domain adaptation**

Instead of directly fine-tuning networks, adversarial domain adaptation is an appealing alternative for unsupervised domain adaptation. It mainly addresses the setting in which there are abundant labeled data in the source domain but only sparse/limited unlabeled samples in the target domain. The core idea of adversarial domain adaptation is based on GANs. Specifically, a generalized architecture to implement this idea is proposed in [7]. In this section, we detail two main ideas: target data generating and domain classifier.

**5.1 Target data generating**

To overcome the limitation of sparse unlabeled data, target data generating is an approach that directly generates labeled samples for the target domain so that we can use them to train a classifier for the new task. One representative work is CoGAN [25], in which two GANs are involved: one processes the labeled data in the source domain and the other processes the unlabeled data in the target domain. Part of the weights in the two generators is shared/tied in order to reduce the domain divergence. In addition to the two discriminators for classifying fake and real samples, there is an extra classifier that classifies samples based on the label information in the source domain. By jointly training the two GANs, we can generate unlimited pairs of data, in which each pair consists of a synthetic source sample and a synthetic target sample sharing the same label. Therefore, after the joint training is finished, the pre-trained extra classifier is the function $F^t$ that we need for solving the new task. Similar work can be found in [26], in which a transformation in the pixel space is introduced. In summary, target data generating is a domain adaptation approach that focuses on generating target data; it can also be treated as an auxiliary task that reduces domain shift through the weight-sharing mechanism between the two GANs. The main disadvantage is that the training cost for generating synthesized samples with two GANs is high.
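To make the weight-sharing mechanism concrete, here is a minimal PyTorch sketch of a pair of coupled generators; the layer sizes and the two-layer structure are illustrative assumptions, not the architecture from [25]:

```python
import torch
import torch.nn as nn

class CoupledGenerators(nn.Module):
    """Two generators whose early layers are shared/tied, in the spirit of CoGAN [25]."""

    def __init__(self, z_dim=100, hidden=256, out_dim=784):
        super().__init__()
        # Shared layers: decode high-level semantics common to both domains.
        self.shared = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU())
        # Domain-specific output layers: render low-level, domain-dependent details.
        self.head_source = nn.Linear(hidden, out_dim)
        self.head_target = nn.Linear(hidden, out_dim)

    def forward(self, z):
        h = self.shared(z)
        # One noise vector z yields a *pair* of samples that share the same label.
        return torch.tanh(self.head_source(h)), torch.tanh(self.head_target(h))
```

Because the shared layers must serve both domains, they are pushed toward domain-invariant high-level structure, while each head absorbs the domain-specific appearance.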

**5.2 Domain classifier**

Instead of directly synthesizing labeled data for domain adaptation, an alternative is to add an extra domain classifier to encourage domain confusion. The role of the domain classifier is similar to that of the discriminator in GANs: it distinguishes data from the source domain and the target domain (whereas the discriminator in GANs is responsible for telling fake data from real data). With the help of an adversarial learning approach, the domain classifier can help the network learn domain-invariant representations from the source domain and the target domain. In other words, the trained model can be directly used for the target/new task.

Therefore, the key is how to conduct adversarial learning with the domain classifier. In [8], a gradient reversal layer (GRL) is introduced before the domain classifier: it reverses the gradients flowing back into the feature extractor, so that training maximizes the domain classification loss upstream and thus encourages domain confusion (whereas we normally follow the negative gradient to minimize a loss). In [27], a domain confusion loss is proposed in addition to the domain classifier loss.
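The following is a minimal PyTorch sketch of a GRL in the spirit of [8]; the class and function names and the reversal coefficient `lam` are our own illustrative choices:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)  # identity (a view keeps autograd bookkeeping clean)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed, scaled gradient for x; no gradient for the scalar lam.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Hypothetical usage: features pass through the GRL before the domain classifier,
# so minimizing the domain loss w.r.t. the encoder maximizes domain confusion.
#   domain_logits = domain_classifier(grad_reverse(encoder(x), lam))
```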
