**2.4 Viewpoint-on-demand – simplifying spatially faithful solutions in low-latency networks**

5G seems to provide enough bitrate with low latency for future 3D telepresence services. According to Ronan McLaughlin (Ericsson, Ltd.), the 5G system design parameters specify a system capable of delivering an enhanced mobile broadband (eMBB) experience, in which users should experience a minimum of 50–100 Mbps everywhere, and see peak speeds greater than 10 Gbps, with a service latency of less than 1 ms, while moving at more than 300 miles/h! (https://broadbandlibrary.com/?s=5G+Low+Latency+Requirements).

Spatial faithfulness requires a shared geometry between meeting participants. To form and maintain such a geometry, user positions need to be detected, tracked, and delivered at each moment. In addition, 3D data from several remote sites, as defined by the geometry, needs to be streamed to each local viewer and compiled into a unified 3D representation. If each of the 3D captures is a full reconstruction of the corresponding 3D space, the overall bitrate requirement for rendering each view becomes huge. This may be too much even for the emerging 5G network (or at least costly). In addition to high bitrate, a very important potential of the 5G network is its low latency (cf. the above figures by Ronan McLaughlin).
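To put the bitrate concern in perspective, the following back-of-envelope sketch (with purely illustrative point counts, sample sizes, and frame rates, not measured figures) estimates the raw rate of one dense dynamic point-cloud reconstruction against the 50–100 Mbps eMBB figure quoted above:

```python
# Back-of-envelope estimate (all parameters illustrative) of the raw,
# uncompressed bitrate of a dense dynamic point-cloud reconstruction.

def point_cloud_mbps(points=1_000_000, bytes_per_point=9, fps=30):
    """Raw bitrate of an uncompressed dynamic point cloud.
    bytes_per_point: e.g. 3 x 16-bit coordinates + 3 x 8-bit colour = 9 B."""
    return points * bytes_per_point * 8 * fps / 1e6

raw = point_cloud_mbps()   # 2160 Mbit/s for a single capture
sites = 4                  # assumed number of remote sites per viewer
total = raw * sites        # 8640 Mbit/s before any compression
```

Even generous compression leaves such multi-site totals well above the sustained eMBB rates quoted above, which motivates streaming only what each viewer actually sees.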

Most of the existing approaches for 3D telepresence aim to capture, encode, and stream visual data of at least partial 3D volumes, which can then be seen from various viewpoints at the receiver. However, a person is able to see from only one (binocular) viewpoint at a time, which means that at each moment only one projection of a 3D volume needs to be delivered. Assuming that a viewer's motions are moderate, and that a low-latency network like 5G is available for data streaming, the complexity of a 3D telepresence system can be reduced considerably by streaming only video-plus-depth (V + D) projections from tracked viewpoints. Valli and Siltanen have made several telepresence inventions using this so-called viewpoint-on-demand (VoD) approach [8–10]. In particular, applying augmented reality to 3D telepresence is described in inventions [13, 14]. Note that an example of our recent 3D streaming implementation is given later in Chapter 3.4, and using the solution for supporting XR functionalities is described in more detail in Chapter 4.
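The viewpoint-on-demand flow described above can be sketched as a simple request–response exchange: the client streams its tracked head pose, and the server answers with a V + D projection rendered for that pose. The message and function names below are illustrative assumptions, not taken from the cited inventions:

```python
# Hypothetical sketch of a viewpoint-on-demand (VoD) exchange: the client
# sends its tracked pose; the server projects the full 3D reconstruction
# once, from that pose, instead of streaming the whole volume.
from dataclasses import dataclass

@dataclass
class ViewpointRequest:
    user_id: str
    position: tuple      # (x, y, z) in the shared meeting geometry
    orientation: tuple   # quaternion (w, x, y, z)
    timestamp_ms: int    # lets frames be matched back to poses

@dataclass
class VDFrame:
    request_ts_ms: int   # echoes ViewpointRequest.timestamp_ms
    color: bytes         # encoded colour image for the requested viewpoint
    depth: bytes         # encoded depth map for the same projection

def serve_viewpoint(req: ViewpointRequest, reconstruction) -> VDFrame:
    """Server side: render one V+D projection for the requested pose."""
    color, depth = reconstruction.render(req.position, req.orientation)
    return VDFrame(req.timestamp_ms, color, depth)
```

The echoed timestamp is the key design point: with a two-way network delay, the client must know which pose a received frame corresponds to before compensating for any motion that happened in the meantime.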

Note that synthesizing viewpoints, e.g. for supporting motion parallax or for correcting camera offset for eye contact, may be possible without requesting new data and incurring the corresponding two-way delay. An obvious way of reducing the need to deliver data for a new viewpoint is to use multiple-viewpoint video coding instead of video-plus-depth (V + D) data [15]. This allows more freedom for viewpoint changes within the received stream. For examples of reducing viewpoint requests, see also several inventions by Valli and Siltanen on synthesizing stereoscopic or accommodative (MFP) content for small viewpoint changes [16, 17].
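Local viewpoint synthesis from a single V + D frame is commonly done by depth-image-based rendering (DIBR): each pixel is shifted by its disparity to approximate a nearby viewpoint, so small head motions need no round trip to the server. The toy one-scanline sketch below (with illustrative focal length and baseline values) shows the idea, including the disocclusion holes that limit how far the viewpoint can move:

```python
# Minimal depth-image-based rendering (DIBR) sketch for one scanline.
# Focal length and baseline values are illustrative assumptions.

def disparity_px(depth_m, focal_px=1000.0, baseline_m=0.03):
    """Horizontal pixel shift for a sideways viewpoint move of baseline_m."""
    return focal_px * baseline_m / depth_m

def warp_row(colors, depths, focal_px=1000.0, baseline_m=0.03):
    """Forward-warp one scanline; nearer pixels overwrite farther ones."""
    out = [None] * len(colors)
    order = sorted(range(len(colors)), key=lambda i: -depths[i])  # far first
    for i in order:
        j = i + round(disparity_px(depths[i], focal_px, baseline_m))
        if 0 <= j < len(out):
            out[j] = colors[i]
    return out  # None entries are disocclusion holes to be inpainted
```

Note how near pixels shift more than far ones (producing motion parallax), and how pixels shifted out of range, or uncovered background, leave holes; this is why the approach suits only small viewpoint changes.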

#### **2.5 Photorealistic vs. virtual world (VW) approaches**

Note that serving arbitrary viewpoints may be easier in VW approaches, where virtual camera views into a shared virtual meeting space (VW) are formed using the knowledge of each viewer's pose (tracked by VR glasses, or defined by a participant e.g. with a mouse). However, virtual environments with animated avatars are less natural, and may even alienate a participant by causing the so-called uncanny valley effect. On the other hand, using modeled avatars for participants provides the possibility of anonymity or role-play, which is an obvious benefit in some use cases and services.
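Forming a virtual camera view from a tracked pose amounts to inverting the viewer's pose into a view (camera-from-world) matrix, which the renderer then applies to the shared meeting space. A minimal pure-Python sketch, assuming a row-major 3×3 rotation and a position vector as input:

```python
# Sketch: turning a tracked viewer pose into a virtual camera view matrix.
# The view matrix is the inverse of the pose: rotation transposed,
# translation rotated and negated.

def view_matrix(rotation, position):
    """rotation: 3x3 row-major world-from-camera rotation;
    position: camera position in the shared meeting space.
    Returns a 4x4 camera-from-world matrix (list of rows)."""
    r_t = [[rotation[j][i] for j in range(3)] for i in range(3)]  # transpose
    t = [-sum(r_t[i][k] * position[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [t[0]],
            r_t[1] + [t[1]],
            r_t[2] + [t[2]],
            [0, 0, 0, 1]]
```

In practice a graphics or VR runtime supplies this matrix directly from the headset tracker; the sketch only shows why per-frame pose data is all that is needed to serve an arbitrary viewpoint in a VW approach.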

#### *Advances in Spatially Faithful (3D) Telepresence DOI: http://dx.doi.org/10.5772/intechopen.99271*

Using a virtual world approach is a viable option used by several service vendors (see e.g. references at https://en.wikipedia.org/wiki/Virtual\_world). In VW approaches, meeting spaces are typically modeled in advance and, as far as possible, also delivered to the participants in advance. Coding and delivering the corresponding 3D information may be based e.g. on hierarchical volume coding methods such as OctoMap by Hornung et al. [18]. Despite this partly in-advance delivery of the meeting space, a lot of accurate motion and animation parameters remain to be delivered, and graphical processing to be done for local renderings (e.g. for forming viewing frustums). As a result, although possibly lighter than photorealistic approaches, VW approaches are far from simple either.
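The idea behind hierarchical volume coding can be illustrated with a toy octree addressing scheme in the spirit of OctoMap [18] (this is not the actual OctoMap API): each point in a cubic volume is coded as a path of child indices from the root, and nearby points share path prefixes, which is what makes uniform regions collapse into compact coarse nodes.

```python
# Toy sketch of octree addressing for hierarchical volume coding.
# Illustrative only - real systems (e.g. OctoMap) add probabilistic
# occupancy and node pruning on top of this addressing.

def octree_path(x, y, z, size=8.0, depth=4):
    """Return the child indices (0-7) from root to leaf for a point in
    the cubic volume [0, size)^3 - the address coding its occupancy."""
    path, half = [], size / 2.0
    cx = cy = cz = 0.0  # min corner of the current node
    for _ in range(depth):
        ix, iy, iz = x >= cx + half, y >= cy + half, z >= cz + half
        path.append((ix << 2) | (iy << 1) | iz)
        cx += half * ix
        cy += half * iy
        cz += half * iz
        half /= 2.0
    return path
```

Because neighboring points share long prefixes, an entire uniform subvolume can be transmitted as a single short code, which suits the in-advance delivery of modeled meeting spaces described above.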

In a photorealistic approach, capturing, forming, and delivering reconstructions of 3D volumes and human participants is more challenging, although the meeting spaces themselves can likely be modeled in advance. Once formed, 3D reconstructions can be viewed as in VW approaches, using NEDs or HMDs. Naturally, hybrid solutions combining photorealistic and VW approaches are also possible. For example, 3D modeled environments may be used instead of captured meeting spaces, and XR functionalities can be used for augmenting avatars.

**Figure 4** (cropped screenshots of YouTube videos, courtesy of Oliver Kreylos) gives examples of hybrid (XR) approaches, showing real-time captured participants in 3D modeled meeting spaces.

As seen in **Figure 4**, a particular challenge in this approach is that a glasses-type display covers a person's face, which prevents a viewer from seeing that face and perceiving eye contact. Solutions for this, based on real-time manipulation of the facial area, have however since been described in the literature [19].

As a short summary, the main approaches to 3D telepresence can be classified into the following four classes (**Table 1**).

Note that the quadrants of the table correspond to the classical reality–virtuality continuum by Milgram and Kishino [20]. Current videoconferencing and virtual world approaches correspond to the real and virtual ends of this continuum, and hybrid approaches to the intermediate positions labeled augmented reality (AR) and augmented virtuality (AV). Note that in parallel to the commonly used term "Mixed Reality" (MR), the term "Hybrid Reality" (HR) was also discussed in [20]. Recently, the term "Extended Reality" (XR) has gained popularity with much the same meaning.

In an augmented reality (AR) approach, a virtual avatar represents each remote participant in a local environment. This requires capturing a remote participant's facial and body gestures and animating the avatar correspondingly. Respectively, in an augmented virtuality (AV) approach, photorealistic 3D captures of participants are made and delivered in real time to a virtual meeting space (VW).

#### **Figure 4.**

*Examples of XR approaches: a) person capture in a virtual space (2014), b) interaction in virtual space (2016), c) XR collaboration in virtual space (2012). (see https://www.youtube.com/channel/UCj\_UmpoD8Ph\_EcyN\_xEXrUQ).*

#### **Table 1.** *Main approaches for 3D telepresence.*

In our case, hybrid approaches are particularly interesting. Compared to local (traditional) XR visualizations, combining real and virtual elements over distances (i.e. remote XR) causes particular challenges. These are discussed in more detail in Chapters 2.7 and 4.1.
