**4. Measuring of accuracy of RGB-D sensors**

In this section, we describe the methods and basic measurements leading to the selection of an RGB-D sensor suitable for capturing objects in a parallel multi-view system. A parallel multi-sensor (multi-view) system is required to reduce the scanning time, because the object of interest is a pediatric patient whose potential motion can cause artifacts in the resulting 3D model. Parallel means that all sensors in a given topology capture the object at the same time. The partial views (depth maps or point clouds produced by single sensors) must then be registered (aligned) and merged into the final 3D model.

In our previous study [25], we described the interference artifacts that occur when scanning with ToF sensors. Interference is also present when several SLS sensors are used: their projected patterns overlap on the surface of the object. A passive or active stereo camera pair is the technology that, in principle, does not suffer from interference in parallel multi-view systems. The depth-scanning precision of sensors has been compared in several recent works [26]. Known methodologies for sensor error estimation often use a precise object and its digital model as ground truth, which is difficult to obtain. The main benefit of the versatile methods described in this section is a comprehensive comparison of all sensor technologies; the measurement is based on capturing test patterns on surfaces at small distances. According to recent works [27, 28], ToF sensors appear to be more accurate than passive or active stereo pairs. A further practical contribution of this research is the evaluation of the differences between these technologies in terms of accuracy. The results should show whether stereo pairs can achieve depth-sensing accuracy similar to that of ToF or structured light sensors at small distances.

### **4.1 The noise measurement**

The noise measurement is a simple method based on evaluating the temporal variability of individual points in the depth map. The depth is measured against a flat surface at several given distances. The depth variability (standard deviation of the depth) at given points represents the noise of a depth sensor. Naturally, the noise increases with the sensor-to-object distance. All sensors were placed at distances of 0.5 m, 0.7 m, and 1 m from the flat surface, and the scene was captured for 10 seconds.
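The per-pixel temporal statistic described above can be sketched as follows; this is a minimal illustration in NumPy, not the authors' implementation, and the function name `depth_noise` and the synthetic test data are our own assumptions:

```python
import numpy as np

def depth_noise(frames):
    """Per-pixel standard deviation of depth over time.

    frames: (T, H, W) array of depth maps (e.g. in millimetres) captured
    against a flat surface; zeros would mark invalid (unknown) depth.
    Returns the mean temporal std over valid pixels as the noise estimate.
    """
    frames = np.asarray(frames, dtype=np.float64)
    valid = (frames > 0).all(axis=0)      # pixel must be valid in every frame
    per_pixel_std = frames.std(axis=0)    # temporal std, one value per pixel
    return per_pixel_std[valid].mean()

# Synthetic check: a flat wall at 500 mm with 1 mm Gaussian sensor noise
rng = np.random.default_rng(0)
frames = 500 + rng.normal(0.0, 1.0, size=(300, 48, 64))
noise = depth_noise(frames)  # should be close to the injected 1 mm
```

Averaging the per-pixel standard deviations over the whole depth map makes the estimate robust to single noisy pixels; masking invalid (zero-depth) pixels matters in practice, since several of the tested sensors produce regions of unknown depth.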

### **4.2 Ideal cloud fitting**

Another metric for depth error estimation is a simplified technique based on the methodology of study [29]. Study [30] presents another technique that can be used as a generalized method for depth error estimation for any device.

The depth error estimation is based on comparing two point clouds. The first point cloud is created from the captured depth map and the second is an ideal software model, as shown in **Figure 2**. The results of the following measurements are taken from our study [31]. The testing pattern is a chessboard of 9 × 7 squares (square side length 36 mm). The ideal reference point cloud was generated with the same dimensions.

*Usage of RGB-D Multi-Sensor Imaging System for Medical Applications DOI: http://dx.doi.org/10.5772/intechopen.106567*

**Figure 2.** *Real point cloud captured by the RealSense D415 sensor compared with the ideal one.*

The corners of the chessboard in the RGB image were detected using the OpenCV algorithm. Based on the equations of the pinhole camera model for projection from the image coordinate system to the world coordinate system (*X*, *Y*, *Z*), we obtain the real point cloud. The *Z*-coordinate is read from the depth map at pixel position (*u*, *v*). Following Eqs. (1) and (2), the intrinsic parameters *Cx*, *Cy*, *fx*, and *fy* of the camera are needed to obtain the world coordinates *X* and *Y*:

$$X = \frac{u - C_x}{f_x} Z,\tag{1}$$

$$Y = \frac{v - C_y}{f_y} Z.\tag{2}$$
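The back-projection of Eqs. (1) and (2) can be sketched as below. The corner pixel coordinates would come from OpenCV's `cv2.findChessboardCorners`; the intrinsic values used in the example are purely illustrative, not those of any of the tested sensors:

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection, Eqs. (1)-(2): pixel (u, v) with depth z
    maps to world coordinates (X, Y, Z)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Illustrative intrinsics (hypothetical values, mm depth units assumed)
fx = fy = 600.0
cx, cy = 320.0, 240.0
p = backproject(920.0, 240.0, 500.0, fx, fy, cx, cy)
```

Applying this to every detected chessboard corner, with *Z* sampled from the depth map, yields the real point cloud that is subsequently fitted to the ideal one.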

The captured (real) point cloud is fitted to the ideal one by estimating a translation and rotation. Coherent Point Drift (CPD) was used as the global registration technique, and Iterative Closest Point (ICP) provided the final precise registration. The Root Mean Square Error (RMSE) of the Euclidean distance was used as the metric for sensor accuracy assessment:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( p_i - p_i' \right)^2},\tag{3}$$

where *pi* and *pi'* are the coordinates of the corresponding real and ideal points, respectively, and *N* is the number of points.
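A minimal sketch of the rigid fitting and Eq. (3) is shown below. The actual pipeline used CPD for global registration and ICP for refinement; here, as a simplification, correspondences are assumed known (which holds for matched chessboard corners), so a closed-form least-squares rigid alignment (the Kabsch algorithm) suffices. The function name is our own:

```python
import numpy as np

def rigid_fit_rmse(real, ideal):
    """Align `real` onto `ideal` with a least-squares rigid transform
    (Kabsch algorithm, known point correspondences), then return the
    RMSE of Euclidean distances, Eq. (3)."""
    real, ideal = np.asarray(real, float), np.asarray(ideal, float)
    mr, mi = real.mean(axis=0), ideal.mean(axis=0)
    H = (real - mr).T @ (ideal - mi)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    aligned = (real - mr) @ R.T + mi
    return np.sqrt(np.mean(np.sum((aligned - ideal) ** 2, axis=1)))

# Sanity check: a rotated + translated copy should fit with ~zero error
rng = np.random.default_rng(1)
ideal = rng.normal(size=(60, 3))
t = 0.3
R0 = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0,        0.0,       1.0]])
err = rigid_fit_rmse(ideal @ R0.T + np.array([10.0, -5.0, 2.0]), ideal)
```

With real sensor data the residual after alignment does not vanish; that residual RMSE is precisely the depth error estimate reported below.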

Since we were not able to construct a precise 3D object with chessboard patterns and its ideal software model, we decided to use a flat surface. To simulate a 3D scene, we captured the flat chessboard from three different views (**Figure 3**). The resulting error is the mean of the errors obtained from views A, B, and C.

### **4.3 Ideal plane fitting**

As described in the study [23], another way of estimating the depth error is to fit the captured real point cloud to an ideal surface. We used a plane without the chessboard pattern, captured from different positions as in the previous method. The mean Euclidean distance between the ideal plane and the real point cloud represents the estimated error. The fitting of the real and ideal point clouds is shown in **Figure 4** [31].

#### **Figure 3.**

*The possible approaches to capturing the test chessboard pattern in 3D: (a) the precise 3D construction used in studies [29, 30]; (b) our approach: capturing the test chessboard from several views.*

**Figure 4.** *Real point cloud captured by the RealSense D415 and fitted to the ideal plane.*
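The plane-fitting error can be sketched as below: a total-least-squares plane is fitted through the captured points via PCA (the plane normal is the direction of smallest variance), and the mean point-to-plane distance is returned. This is an illustrative reconstruction under our own assumptions, not the code used in the study:

```python
import numpy as np

def plane_fit_error(points):
    """Fit a least-squares plane through `points` (N x 3) and return the
    mean Euclidean point-to-plane distance (the estimated depth error)."""
    pts = np.asarray(points, float)
    c = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - c)   # PCA: last row = plane normal
    n = Vt[-1]
    return np.abs((pts - c) @ n).mean()

# Sanity check: points lying exactly on a tilted plane give ~zero error
rng = np.random.default_rng(2)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
z = 0.5 * xy[:, 0] - 0.25 * xy[:, 1] + 3.0
err = plane_fit_error(np.column_stack([xy, z]))
```

Unlike the ideal cloud fitting of Section 4.2, this metric needs no corner detection, which is why the two methods can yield different error values for the same sensor.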

### **4.4 Comparison of selected RGB-D sensors**

For the sensor accuracy measurements described above, we used four sensors based on different principles. We also assessed the suitability of these sensors for parallel multi-sensor configurations. The ToF sensor in the KinectV2 and the structured light sensor in the RealSense SR300 use infrared light, so their use in a parallel multi-view system is complicated by mutual interference. Good candidates for the desired imaging system are the ZED MINI and the RealSense D415, which represent the passive and active stereo pair technologies, respectively. The main depth sensor parameters of each camera are summarized in **Table 1**; they are available on the product websites [32, 33] and in the comparison table in [23].



#### **Table 1.**

*Depth sensors parameters comparison.*


#### **Table 2.**

*Standard deviation of depth error for different distances.*

**Table 1** compares the key parameters of each sensor, such as resolution, diagonal field of view (DFOV), frame rate, and the optimal sensor-to-object distance range.

#### *4.4.1 Comparison based on noise measurement*

The noise measurement methodology is described above; the results are taken from our study [31]. The standard deviation of the depth error at different distances represents the amount of sensor noise. The comparison of the sensors is given in **Table 2**.

**Figure 5** shows the statistical comparison for a distance of 0.5 m. The amount of noise increases with the distance between the sensor and the surface. In this comparison, the offset of the sensor is ignored and only the variable part of the signal is taken into account.

As seen in **Table 2**, SLS technology (RealSense SR300) achieved the best results. As expected, there is an evident difference between the ZED MINI and the RealSense D415: the active stereo sensor is more accurate than the passive one.

#### *4.4.2 Comparison based on ideal cloud fitting*

In this experiment, the same sensor parameters were used as in the previous measurement. For the ZED MINI and RealSense D415, the chessboard pattern was captured from distances of 0.5 m and 1 m. The results, including **Table 3**, are taken from our study [31].

In this experiment, results are shown for only two sensors, for the following reasons. Because the RGB resolutions of the ZED MINI and RealSense D415 sensors are the same, we expect the same corner detection error; for this reason, we consider the comparison of the passive and active stereo sensors by this method to be fair. Problems occurred when capturing the chessboard pattern with the SR300 and KinectV2 sensors. The SR300 produces a depth map that contains "empty areas" of unknown depth, which appear as black holes in the depth images. Consequently, the depth at a chessboard corner point cannot be computed and the real point cloud cannot be constructed. Similarly, when the KinectV2 is used at a distance greater than 0.7 m, the depth map contains black regions of unknown depth. The depth deviation is caused by different object surface reflections; this phenomenon, associated with ToF camera calibration, is described in the study [34]. To avoid it, a color version of the chessboard can be used instead of a black-and-white one; a precisely constructed cube covered with the chessboard pattern is another option, as described in [30]. In addition, the low resolution of the KinectV2 RGB image does not allow the chessboard corners to be detected correctly. For these reasons, we consider a comparison of all four technologies by this method inadequate.

#### **Figure 5.**

*Noise of depth sensors compared at a distance of 0.5 m.*

#### **Table 3.**

*RMS error for multiple distances and positions using ideal cloud fitting.*

#### *4.4.3 Comparison based on ideal plane fitting*

**Table 4**, taken from Ref. [31], shows the comparison of all tested sensor technologies.

The estimated depth error is independent of the corner detection error, so the corresponding results in **Tables 3** and **4** differ. The difference in RMSE between the structured light and active stereo pair at a distance of 0.5 m is only 0.3 mm.

#### **Table 4.**

*RMS error for different distances and positions using ideal plane fitting.*
