## **3.1 Time-of-flight RGB-D sensor**

Time-of-flight (ToF) RGB-D sensors are optical sensors that measure the depth of a scene using an active light source. The light source emits an amplitude-modulated signal, which can be either continuous or pulsed. Most ToF cameras illuminate the scene with near-infrared (NIR) light in the form of an amplitude-modulated continuous wave (AMCW) [22]. The depth of the scene is obtained by measuring the phase shift between the transmitted and the received modulated signal. The depth of each pixel is computed by synchronous demodulation of the received modulated light in the detector, which is performed by correlating the received signal with the original modulated reference signal.
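To make the phase-shift principle concrete, the Python sketch below recovers depth with the common four-sample ("four-bucket") demodulation scheme. It is a minimal sketch, assuming ideal sinusoidal correlation samples Q_k = B + A·cos(φ − kπ/2) taken at reference phase offsets of 0°, 90°, 180° and 270°. The sample model, the function names and the 20 MHz modulation frequency are illustrative assumptions, not the processing pipeline of any particular camera. Because the light travels to the object and back, a phase shift φ corresponds to a depth d = c·φ/(4π·f_mod), and depth is unambiguous only up to c/(2·f_mod).

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)


def simulate_samples(depth_m, f_mod_hz, amplitude=1.0, offset=2.0):
    """Ideal correlation samples Q_k = B + A*cos(phi - k*pi/2), k = 0..3 (assumed model)."""
    # Round trip: the light covers 2*depth, so phi = 2*pi*f*(2*depth)/c.
    phi = 4.0 * math.pi * f_mod_hz * depth_m / C
    return [offset + amplitude * math.cos(phi - k * math.pi / 2) for k in range(4)]


def depth_from_samples(samples, f_mod_hz):
    """Four-bucket demodulation: returns (depth in m, signal amplitude)."""
    q0, q1, q2, q3 = samples
    # q1 - q3 = 2A*sin(phi) and q0 - q2 = 2A*cos(phi); the offset B cancels.
    phi = math.atan2(q1 - q3, q0 - q2) % (2.0 * math.pi)  # wrap to [0, 2*pi)
    amplitude = 0.5 * math.hypot(q1 - q3, q0 - q2)
    depth = C * phi / (4.0 * math.pi * f_mod_hz)
    return depth, amplitude


def unambiguous_range(f_mod_hz):
    """Largest depth before the measured phase wraps around 2*pi."""
    return C / (2.0 * f_mod_hz)


if __name__ == "__main__":
    f_mod = 20e6  # hypothetical 20 MHz modulation frequency
    depth, amp = depth_from_samples(simulate_samples(2.0, f_mod), f_mod)
    print(f"depth = {depth:.3f} m, amplitude = {amp:.3f}")
    print(f"unambiguous range = {unambiguous_range(f_mod):.3f} m")
```

With a 20 MHz modulation frequency the unambiguous range is about 7.5 m, so a target at 8 m would be reported at roughly 0.5 m; this wrap-around is one reason ToF cameras commonly use more than one modulation frequency.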

## **3.2 Stereo RGB-D sensor**

## *3.2.1 Passive stereo sensor*

Passive stereo RGB-D sensors reconstruct the depth of a scene in the same way as human binocular vision. The scene must be captured from two different viewpoints. This is done with two RGB sensors (corresponding to the human eyes) horizontally separated by a known distance, called the baseline. For example, the ZED Mini depth sensor used in our experiments has two RGB sensors separated by a 6.3 cm baseline. The depth of the scene is then computed from the disparity, i.e., the horizontal offset between the positions of corresponding points in the two views. Solving this correspondence problem means taking a point in one image and finding the same point in the other image [23]. Stereo sensors therefore rely on computationally intensive algorithms to search for point matches and to compute depth. These sensors are suited to well-lit environments, including outdoor scenes.
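For a rectified stereo pair, similar triangles give the depth of a point as Z = f·B/d, where f is the focal length in pixels, B is the baseline and d is the disparity in pixels. The Python sketch below applies this relation and also estimates, to first order, how a disparity error δd translates into a depth error δZ ≈ Z²·δd/(f·B), which shows why depth precision degrades quadratically with distance and improves with a wider baseline. The focal length and baseline values are hypothetical and do not describe the calibration of any specific sensor.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px=1.0):
    """First-order depth uncertainty |dZ/dd| * delta_d = Z^2 * delta_d / (f * B)."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)


if __name__ == "__main__":
    focal_px = 700.0  # hypothetical focal length in pixels
    baseline_m = 0.1  # hypothetical baseline in metres
    for d in (70.0, 35.0, 7.0):
        z = depth_from_disparity(d, focal_px, baseline_m)
        err = depth_error(z, focal_px, baseline_m)
        print(f"d = {d:5.1f} px -> Z = {z:5.2f} m, +/- {err:.3f} m per pixel of disparity error")
```

Halving the disparity doubles the depth but quadruples the depth uncertainty, which is why matching accuracy matters most for distant points.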

## *3.2.2 Active stereo sensor*

If the scene contains little color and intensity variation, or if the lighting is poor, a passive stereo system becomes less effective and less accurate. A typical example of such an environment is a texture-less surface, such as a dimly lit white indoor wall. Active stereo vision adds an optical projector that overlays the observed scene with a semi-random texture, which makes correspondences easier to find. The current generation of RealSense D4xx sensors also works in bright environments, where it captures the texture of objects in fine detail, so these sensors are applicable outdoors as well. When dynamic objects are scanned with a multisensor system, there is no limit on how many sensors can be used in a given physical layout: the quality of the scanning process does not decrease when several sensors project their light patterns onto the same part of the scene. On the contrary, each additional projector improves the overall performance by adding more light and more texture [24].
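The effect of projected texture on the correspondence problem can be illustrated with a toy one-dimensional example. The Python sketch below runs a sum-of-absolute-differences (SAD) block-matching search along a single rectified scanline; SAD is a common but simple matching cost chosen here only for illustration, not the algorithm used by RealSense sensors. The "wall" is a hypothetical uniform scanline, and the projector is modelled, as an assumption, by adding a seeded pseudo-random pattern with distinct values that shifts between the two views by the same disparity as the surface it lands on. All function names are hypothetical.

```python
import random


def sad_cost(left, right, x, d, half_win):
    """Sum of absolute differences between a window at x (left) and at x - d (right)."""
    return sum(abs(left[x + i] - right[x - d + i]) for i in range(-half_win, half_win + 1))


def best_disparities(left, right, x, max_disp, half_win):
    """Return every candidate disparity that attains the minimum SAD cost at pixel x."""
    costs = {
        d: sad_cost(left, right, x, d, half_win)
        for d in range(max_disp + 1)
        if x - d - half_win >= 0  # keep the right-image window inside the row
    }
    best = min(costs.values())
    return [d for d, c in costs.items() if c == best]


def make_views(scene, true_disp):
    """Rectified pair with left[x] = right[x - true_disp]; left edge padded with scene[0]."""
    right = list(scene)
    left = [scene[0]] * true_disp + right[: len(right) - true_disp]
    return left, right


if __name__ == "__main__":
    width, true_disp, x, max_disp, half_win = 64, 5, 32, 16, 3

    # Texture-less, dimly lit wall: constant intensity across the scanline.
    wall = [60] * width
    left, right = make_views(wall, true_disp)
    print("uniform wall:", best_disparities(left, right, x, max_disp, half_win))

    # Hypothetical projector: a seeded semi-random pattern with distinct values.
    pattern = random.Random(42).sample(range(128), width)
    textured = [v + p for v, p in zip(wall, pattern)]
    left, right = make_views(textured, true_disp)
    print("with projected texture:", best_disparities(left, right, x, max_disp, half_win))
```

On the uniform wall every candidate disparity has zero cost, so the match is completely ambiguous; with the projected pattern only the true disparity of 5 pixels reaches zero cost.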
