**5. Multi-view multi-robot sensor networks**

As mentioned in the introduction section, applications of the MRSN become more advanced when the robot nodes are equipped with multiple cameras, for much the same reason that more eyes would benefit a human. From the application point of view, a multi-view MRSN can be used in a security system that does not miss a single corner. In medical applications, a multi-view MRSN can accomplish complex, long-duration operations while achieving more accurate surgery with smaller incisions; in addition, it can react quickly to the varying vital signs and other monitored parameters of the patient.

**5.1. Introduction of multi-view video and open problem**

Developments in camera and display technologies make it possible to record a single scene as multiple video sequences. These multi-view video sequences are taken by closely spaced cameras from different angles, and each sequence presents a unique viewpoint of the scene. Therefore, the user can switch viewpoints by playing different video sequences. When a robot is equipped with multiple cameras, the user who controls the robot gains a broad perspective and can switch viewpoints in the same way. However, since the multi-view video consists of the video sequences captured by multiple cameras, its traffic is several times larger than that of conventional multimedia, which brings a dramatic increase in the bandwidth requirement. On the other hand, because the multi-view video is taken from the same scene, it contains a large amount of inter-view correlation. Therefore, compression and transmission technologies are especially important for multi-view video streaming.

The state of the art in multi-view representations includes Multi-View Video Plus Depth (Merkle et al., 2007), Ray-Space (Smolic et al., 2006) and Multi-view Video Coding (MVC) (Vetro et al., 2008; Mueller et al., 2006). However, the research on Multi-View Video Plus Depth sequences (Merkle et al., 2007) suggests that with the addition of depth maps and other auxiliary information, the bandwidth requirements could increase further. MVC was issued as an amendment to H.264/MPEG-4 AVC (Vetro et al., 2008; Mueller et al., 2006), and it was reported to achieve significantly higher compression gains than simulcast coding, in which each view is compressed independently. However, even with MVC, transmission bitrates for multi-view video remain high: about 5 Mbps for 8-camera sequences at 704 × 480 and 30 fps with MVC encoding (Kurutepe et al., 2007).

90 Wireless Sensor Networks – Technology and Protocols

**5.2. User dependent multi-view video transmission**

### *5.2.1. Switching models*

In order to reduce the traffic of multi-view video transmission, we have analyzed which frames should be displayed when the viewpoint is switched. Our work mainly focuses on the successive motion model (Pan et al., 2011). In the successive motion model, shown in Fig. 13, the user is only able to switch to a neighboring view. In other words, if the multi-view video contains the views (1, 2, …, M), the user can only switch from any view j to a view j' where max(1, j−1) ≤ j' ≤ min(j+1, M). This kind of switching model is used in applications such as free viewpoint TV and remote surgery systems, in which the user's head is tracked to decide which views should be displayed.

**Figure 13.** Switching models
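The successive switching constraint above can be sketched as a small helper (the function name is illustrative, not from the text):

```python
def reachable_views(j: int, M: int) -> list[int]:
    """Views reachable in one switching step from view j, for views 1..M:
    max(1, j - 1) <= j' <= min(j + 1, M)."""
    return list(range(max(1, j - 1), min(j + 1, M) + 1))

# With M = 5: an interior view has three options, a border view only two.
print(reachable_views(3, 5))  # [2, 3, 4]
print(reachable_views(1, 5))  # [1, 2]
```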

### *5.2.2. User dependent multi-view video transmission (UDMVT)*

In (Tanimoto et al., 2011), two types of user interface were developed for Free Viewpoint TV. One shows a single view according to the viewpoint given by the user. With this type of interface, the user's viewpoint can be switched by an eye/head-tracking system, by moving the mouse of a PC, or by sliding a finger on the touch panel of a mobile player. In a real-time interactive multi-view video system (Lou et al., 2005), users can switch viewpoints by dragging a scroll bar to a different position. In the user interfaces of (Tanimoto et al., 2011) and (Lou et al., 2005), the change of the user's position, the movement of the mouse, the sliding of the finger and the dragging of the scroll bar are all successive motions. Since the switching models of these interfaces are all successive motion models, it takes some time to switch from the current view to a neighboring view. For instance, in the head-tracking system, the user needs some time to move from the current position to the next position for the new viewpoint. We call the speed with which the user switches from one view to the next the "switching speed." The switching speed differs across users and user interfaces, and even the same user may switch at a different speed each time.

In the successive motion model, which frames should be displayed when the user starts to switch to the next view is decided by both the frame rate f (frame/s) of the multi-view video and the switching speed s (view/s) of the user. Let k be the floor of the frame rate divided by the switching speed: $k = \left\lfloor f / s \right\rfloor$. Fig. 14 presents the display of frames when k is 3, 2 and 1.

**Figure 14.** Multi-view video displays with different values of k: (a) f = 3s, k = 3; (b) f = 2s, k = 2; (c) f = s, k = 1.
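The relation $k = \left\lfloor f / s \right\rfloor$ for the three cases of Fig. 14 can be checked directly (the 30 frame/s rate is assumed only for illustration):

```python
import math

f = 30.0                      # frame rate (frame/s), assumed for illustration
for s in (f / 3, f / 2, f):   # switching speeds giving the cases of Fig. 14
    k = math.floor(f / s)     # frames shown per view while switching
    print(k)                  # 3, then 2, then 1
```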

Assuming the frame rate is the same, different frames should be displayed for these three different switching speeds, even though the user is switching in the same direction. If the switching slows down, more frames of the current view should be displayed before the display changes to the next view; otherwise, fewer frames of the current view should be displayed. Therefore, k denotes the number of frames that should be displayed in the current view after the user starts to switch and before the user reaches the position where the display should change to the next view. In practice, the frame rate is about 25~30 (frame/s). The value of the switching speed depends on the density of the views and the speed of the user interface; however, the switching speed is usually much slower than the frame rate. When the switching speed is about 2~5 (view/s), k is about 5~15 (frame/view). For simplicity, k = 1 and k = 2 are selected as the examples in this paper. Let Fi,j denote the frame of view j at time instant i. From the triple N(p, f, s), it is possible to predict a triangular area of frames that may be displayed during a subsequent period of time, where p is the current position Fi0,j0. When the number of views is M, R(t) is the set of frames that can be displayed at time instant t starting from Fi0,j0, i.e., Fi,j' ∈ R(t), in which:

$$i = i\_0 + \left\lfloor f \times t \right\rfloor \tag{27}$$

$$j' \in \left[ \max\left(1, j\_0 - \left\lfloor s \times t \right\rfloor\right), \min\left(j\_0 + \left\lfloor s \times t \right\rfloor, M\right) \right] \tag{28}$$

As the video continues to play, the frame at time instant i in (27) should be displayed, starting from Fi0,j0. During the period t, the user can switch to view $j\_0 - \left\lfloor s \times t \right\rfloor$ or $j\_0 + \left\lfloor s \times t \right\rfloor$, unless already at a border view (view 1 or view M). The user may also stop switching at any view before reaching those views. Therefore, it is possible to display frames in any view j' given by (28). The triangles of frames are shown in Fig. 15 for k = 1 and k = 2, respectively. The frames in the triangle are called potential frames (PFs); they can be switched to and displayed, so they should be encoded and transmitted. The frames outside the triangle are called redundant frames (RFs): no matter how the user switches the viewpoint starting from the current position, the RFs cannot be displayed. UDMVT reduces the transmission bitrate for multi-view video by transmitting only the PFs, without the RFs.
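A minimal sketch of the triangle prediction follows, enumerating the PFs from Eqs. (27) and (28); the function name and the sampling at the video's frame instants are illustrative assumptions, not from the text:

```python
import math

def potential_frames(i0, j0, f, s, M, T):
    """Enumerate potential frames (PFs) reachable from frame F(i0, j0)
    within T seconds, following Eqs. (27) and (28):
        i  = i0 + floor(f * t)
        j' in [max(1, j0 - floor(s * t)), min(j0 + floor(s * t), M)]
    sampled at the playback instants t = n / f of the video frames."""
    pfs = set()
    for n in range(int(f * T) + 1):
        i = i0 + n                      # Eq. (27) with t = n / f
        reach = math.floor(s * n / f)   # how far the user may have switched
        lo, hi = max(1, j0 - reach), min(j0 + reach, M)
        for j in range(lo, hi + 1):     # Eq. (28): every reachable view
            pfs.add((i, j))
    return pfs

# Example: 5 views, current frame F(0, 3), f = 2 frame/s, s = 1 view/s.
triangle = potential_frames(i0=0, j0=3, f=2.0, s=1.0, M=5, T=2.0)
print(len(triangle))  # 13 potential frames in the triangle
```

Everything outside the returned set is a redundant frame (RF) for this prediction window and, under UDMVT, need not be transmitted.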

**Figure 15.** The triangles of frames for k = 1 (a) and k = 2 (b). The number of views M is 5. The dotted line represents a possible display path.

When the length of the triangle is L, the number of RFs in view j of the triangle is:

$$RFs\left(j\right) = \min\left(L, I\left(j\right)\right)$$

I(j) is:


$$I\left(j\right) = \left|j - j\_0\right| \times k = \left|j - j\_0\right| \times \left\lfloor f / s \right\rfloor$$

In I(j), | j - j0 | is the distance between view j and the current view j0. The number of RFs in each triangle is:

$$\sum\_{j=1}^{M} RFs\left(j\right) = \sum\_{j=1}^{M} \min\left(L, I(j)\right).$$

So the ratio of PFs to RFs is:

$$\frac{M \times L - \sum\_{j=1}^{M} RFs\left(j\right)}{\sum\_{j=1}^{M} RFs\left(j\right)} \tag{29}$$

From these expressions, it can be seen that as the length L increases, the ratio of PFs to RFs increases, which means that more frames must be encoded and transmitted. In other words, the triangle is enlarged until finally all the frames at the same time instant are included in the triangle, as also shown in Fig. 15.
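The growth of this ratio with L can be illustrated numerically; the parameter values below are arbitrary examples, not from the text:

```python
def pf_rf_ratio(M, L, k, j0):
    """Ratio of PFs to RFs for one triangle of length L over M views,
    using RFs(j) = min(L, I(j)) with I(j) = |j - j0| * k, as in Eq. (29)."""
    rfs = sum(min(L, abs(j - j0) * k) for j in range(1, M + 1))
    return (M * L - rfs) / rfs

# Once L exceeds every I(j), the RF count saturates while the PF count keeps
# growing linearly in L, so the ratio increases with L.
for L in (10, 50, 200):
    print(L, round(pf_rf_ratio(M=8, L=L, k=3, j0=4), 2))
```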

In order to overcome this problem, N(p, f, s) should be fed back periodically, which divides a large triangle into many smaller triangles, as shown in Fig. 16. In UDMVT, N(p, f, s) is fed back at the end of each triangle, and the feedback from the end of the previous triangle is used to predict the next triangle. Therefore, only potential frames are transmitted each time and the transmission bitrate is reduced. N(p, f, s) is detected at the client and fed back periodically; at the server, it is used to divide the frames into PFs and RFs. The transmission bitrate can be reduced by transmitting only the PFs and ignoring the RFs. Although the transmission of RFs is unnecessary, encoding and transmitting some RFs can serve as a kind of insurance against special situations, such as switching-detection errors.
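The effect of periodic feedback can be sketched as follows: splitting one long triangle into shorter ones, each restarted from the fed-back position, lowers the total number of PFs that must be transmitted. All parameter values here are illustrative assumptions:

```python
def pfs_in_triangle(M, L, k, j0):
    """Number of potential frames in one triangle of length L, i.e.
    M * L minus the RFs given by RFs(j) = min(L, |j - j0| * k)."""
    rfs = sum(min(L, abs(j - j0) * k) for j in range(1, M + 1))
    return M * L - rfs

M, k, j0 = 8, 3, 4
total_len = 300                      # total frames to cover
for seg in (300, 100, 30):           # triangle length per feedback period
    transmitted = (total_len // seg) * pfs_in_triangle(M, seg, k, j0)
    print(seg, transmitted)          # shorter triangles -> fewer PFs overall
```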

**Figure 16.** The triangles of the potential frames. The dotted line represents a possible display path, while the solid line represents the actual display.
