**2. Deep learning for MIMO-based semantic communications**

Future 6G wireless networks are expected to bridge the physical and cyber worlds, enabling human interactions with multiple intelligent devices through various data modalities such as images and text [33]. This opens up a range of applications, from autonomous driving to the Internet of Everything, which involve intelligent human-to-machine and machine-to-machine connections. These applications impose challenging requirements on communication networks, including ultra-high reliability, ultra-low latency, and extremely high data rates [34]. Supporting them also means coping with explosive growth in bandwidth and complexity, owing to the transmission of massive datasets and large models. Driven by these requirements, semantic communication research has surged in both academia and industry. Semantic communications aim at accurately recovering the statistical structure of the underlying information of the source signal and at designing the communication transceiver in an end-to-end fashion, similar to joint source and channel coding (JSCC), by taking the source semantics into account. A data-aware, intelligent communication transceiver that is able to understand the relevance and meaning of data traffic is of paramount importance, as it would significantly improve the transmission efficiency.

#### **2.1 Formulation of semantic communications**

In general, a semantic communication MIMO transceiver can be modeled as the framework shown in **Figure 7**, where an end-to-end communication system is developed to incorporate coding and modulation [35]. The encoding, decoding, and transmission procedures are parameterized by DNNs, and the system is trained in a data-driven manner via back-propagation. As shown in **Figure 7**, the transmitter maps the source, **s**, into a symbol stream, **x**, which then passes through the physical channel with transmission impairments. The received symbol stream, **y**, is decoded at the receiver to obtain an estimate of the source, $\hat{\mathbf{s}}$. Both the transmitter and the receiver are represented by DNNs: the DNNs at the transmitter consist of the semantic encoder and the channel encoder, while the DNNs at the receiver consist of the semantic decoder and the channel decoder. The semantic encoder learns to transform the transmitted data into an encoded feature vector, while the semantic decoder learns to recover the transmitted data from the received signals. The channel encoder and channel decoder, in turn, aim at eliminating the signal distortion caused by the wireless channel.

We consider a MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas. The encoded symbol stream can be represented by

$$\mathbf{x} = f_2\left(f_1(\mathbf{s}; \theta_1); \theta_2\right), \tag{1}$$

**Figure 7.** *The framework of a typical semantic communication system.*

where $\mathbf{x} \in \mathbb{C}^{N_t \times 1}$, and $\theta_1$ and $\theta_2$ denote the trainable parameters of the semantic encoder, $f_1(\cdot)$, and the channel encoder, $f_2(\cdot)$, respectively. Subsequently, the signal received at the receiver, $\mathbf{y} \in \mathbb{C}^{N_r \times 1}$, is given by

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},\tag{2}$$

where $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ denotes the channel matrix and $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$ is the additive white Gaussian noise (AWGN). Correspondingly, the decoded signal is given as

$$
\hat{\mathbf{s}} = g_1\left(g_2(\mathbf{y}; \boldsymbol{\Theta}_2); \boldsymbol{\Theta}_1\right),
\tag{3}
$$

where $\boldsymbol{\Theta}_1$ and $\boldsymbol{\Theta}_2$ denote the trainable parameters of the semantic decoder, $g_1(\cdot)$, and the channel decoder, $g_2(\cdot)$, respectively. For clarity, we denote by $\theta$ the set of trainable parameters and by $f_\theta(\cdot)$ the DNNs in the considered semantic communication system. Thus, we have $\hat{\mathbf{s}} = f_\theta(\mathbf{s})$.
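To make the signal flow of Eqs. (1)-(3) concrete, the following is a minimal sketch in plain Python. It is not the architecture of any cited work: the DNNs are replaced by single random, untrained linear layers, and the dimensions `D` and `K`, the `tanh` activations, the unit-power normalization at the transmitter, and the Rayleigh-fading draw of **H** are all illustrative assumptions. All function names are hypothetical.

```python
import math
import random

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def random_real_matrix(rows, cols, rng):
    """Random untrained weights standing in for trained DNN parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def complex_gaussian(rng, variance):
    """One sample of CN(0, variance): real and imaginary parts each carry variance / 2."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))

def semantic_encoder(s, theta1):
    """f_1: source vector -> feature vector (one toy layer, assumed)."""
    return [math.tanh(v) for v in matvec(theta1, s)]

def channel_encoder(z, theta2, n_t):
    """f_2: features -> N_t complex symbols, scaled to unit total power (assumed constraint)."""
    u = matvec(theta2, z)  # 2 * n_t real values
    x = [complex(u[2 * i], u[2 * i + 1]) for i in range(n_t)]
    power = sum(abs(v) ** 2 for v in x)
    scale = 1.0 / math.sqrt(power) if power > 0 else 0.0
    return [v * scale for v in x]

def mimo_channel(H, x, sigma2, rng):
    """Eq. (2): y = H x + n, with n ~ CN(0, sigma2 * I)."""
    return [hx + complex_gaussian(rng, sigma2) for hx in matvec(H, x)]

def channel_decoder(y, Theta2):
    """g_2: received symbols -> estimated features (real and imaginary parts stacked)."""
    r = [v.real for v in y] + [v.imag for v in y]
    return [math.tanh(v) for v in matvec(Theta2, r)]

def semantic_decoder(z_hat, Theta1):
    """g_1: estimated features -> source estimate."""
    return matvec(Theta1, z_hat)

rng = random.Random(0)
D, K, N_T, N_R = 8, 4, 2, 2  # source dim, feature dim, antennas (all assumed)

theta1 = random_real_matrix(K, D, rng)        # semantic encoder parameters
theta2 = random_real_matrix(2 * N_T, K, rng)  # channel encoder parameters
Theta2 = random_real_matrix(K, 2 * N_R, rng)  # channel decoder parameters
Theta1 = random_real_matrix(D, K, rng)        # semantic decoder parameters

# Rayleigh fading is an assumption; the text only requires some channel matrix H.
H = [[complex_gaussian(rng, 1.0) for _ in range(N_T)] for _ in range(N_R)]

s = [rng.gauss(0.0, 1.0) for _ in range(D)]
x = channel_encoder(semantic_encoder(s, theta1), theta2, N_T)      # Eq. (1)
y = mimo_channel(H, x, sigma2=0.1, rng=rng)                        # Eq. (2)
s_hat = semantic_decoder(channel_decoder(y, Theta2), Theta1)       # Eq. (3)

# With untrained weights the error is large; training would minimize a loss like this.
mse = sum((a - b) ** 2 for a, b in zip(s, s_hat)) / D
print(len(x), len(y), len(s_hat), mse)
```

In an actual system, the four parameter sets would be trained jointly by back-propagating a task loss (the MSE above is only one possible choice) through the channel layer, which is differentiable in **x**.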

#### **2.2 Semantic importance-guided design for MIMO transceivers**

In a conventional physical-layer transceiver, the modules are often optimized independently. In particular, the modulation, beamforming, and signal detection modules are designed to minimize the bit error rate (BER), while the channel feedback and estimation modules aim to minimize the mean-square error (MSE). In semantic communications, the MIMO transceiver can be designed by revising the modules of the traditional transceiver. Next, we discuss some advances in MIMO transceiver design for semantic communications.


by allocating subchannels with high SNRs to features with high importance levels, as these features would play a more important role in the target task. For instance, the semantic MIMO system designed in ref. [39] significantly outperforms traditional MIMO systems by jointly considering the CSI and entropy distribution of the semantic features, where the entropy can be regarded as a measure of semantic importance.
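The pairing of high-SNR subchannels with high-importance features can be illustrated with a simple sort-and-match rule. The sketch below is not the method of ref. [39]; it assumes that an importance score per feature (for example, an entropy estimate) and an SNR per subchannel (for example, for the eigenchannels obtained from an SVD of **H**) are already available, and that there are as many features as subchannels. The function name `allocate_features` and all numbers are hypothetical.

```python
def allocate_features(importance, subchannel_snr_db):
    """Map each feature index to a subchannel index so that the most
    important feature gets the highest-SNR subchannel, and so on."""
    if len(importance) != len(subchannel_snr_db):
        raise ValueError("this sketch assumes one subchannel per feature")
    # Feature indices, most important first.
    features_by_importance = sorted(
        range(len(importance)), key=lambda i: importance[i], reverse=True
    )
    # Subchannel indices, highest SNR first.
    channels_by_snr = sorted(
        range(len(subchannel_snr_db)), key=lambda j: subchannel_snr_db[j], reverse=True
    )
    return dict(zip(features_by_importance, channels_by_snr))

# Illustrative values only: importance scores and subchannel SNRs in dB.
importance = [0.2, 1.5, 0.7, 0.1]
snr_db = [3.0, 12.0, -1.0, 8.0]

allocation = allocate_features(importance, snr_db)
print(allocation)  # {1: 1, 2: 3, 0: 0, 3: 2}
```

Here feature 1, the most important, is sent over subchannel 1 (12 dB), while feature 3, the least important, is left with subchannel 2 (-1 dB). A learned system could fold this rule into the channel encoder rather than applying it as a separate step.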

