**3.3 Video sensors on robot's surface**

This is a specific version of the previous case: an inverse motion capture configuration. The cameras are placed on the robot and a fixed set of markers is observed by them. The robot environment is used for the robot's state estimation (Fig. 5).

Fig. 5. Cameras placed on the robot surface

The cameras are integrated into the robot's flexible body. The range of work is unlimited and not restricted to a unique area (volume). Different camera configurations may be proposed in real–time and tested for optimal object manipulation or movement. The most challenging task is the multiple or single camera calibration [Daniilidis & Eklundh (2008); Lei et al. (2005); Mazurek (2010; 2009; 2007)]. The estimation of the external parameters is especially important for such robots.

It is possible to use the cameras for navigation and manipulation purposes, and for estimation of the robot's own state also.

One of the most important factors in such a case is power consumption. The motion capture configuration using passive markers on the robot does not need additional power on the robot. The inverse motion capture system needs a power supply for the cameras and acquisition devices. Image processing for the inverse case performed by an external computer is an important technique for reducing the additional electrical power needs. The weight of the robot is also reduced if the computational part is outside of the robot.

**3.4 Cooperative robot swarm with multiple cameras**

Another possibility arises when multiple robots (rigid or non–rigid) are equipped with cameras for navigation, manipulation, and self measurements. Swarm members are separate robots from the physical point–of–view, but from the logical point–of–view they form a single robot if the cooperation between the members of the swarm is very close. The self measurement task (estimation of a robot's own parameters) is very interesting, because the state of a particular member of the swarm can be estimated by the other members.

**4. Vision based estimation of position and orientation**

The images acquired by the camera set give information about the 3D world using multiple 2D views. Relations between image objects, or additional knowledge about an object, may be used for estimation of the positions of objects and cameras. Without additional knowledge, only relative spatial relations are obtained.

**4.1 Features and model based approaches**

The vision techniques use feature point or model fitting approaches. Both of them are important for establishing relations between the real and the virtual (computer modeled) world. In the case of motion capture systems, markers (feature points) are placed on the robot, or a deformable model of the robot is used (model fitting).

Feature points are either existing features of surrounding objects in the environment (e.g. corners, edges) or intentionally added ones (e.g. ball shaped markers or painted chessboard patterns). Estimation of the position (for point like features) and optionally the orientation (for edges or patterns) gives the ability to estimate the camera position relative to the object.
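As a minimal sketch of both kinds of feature extraction, assuming OpenCV; the synthetic frame and the chessboard grid size are illustrative stand-ins, not the chapter's actual setup:

```python
import cv2
import numpy as np

# Synthetic grayscale frame with one bright square, standing in for an
# acquired camera image (illustrative only).
frame = np.zeros((240, 320), np.uint8)
cv2.rectangle(frame, (100, 80), (220, 160), 255, -1)

# Natural features: corner-like points of objects in the environment.
corners = cv2.goodFeaturesToTrack(frame, maxCorners=20,
                                  qualityLevel=0.01, minDistance=10)

# Intentionally added patterns (e.g. a painted chessboard) have dedicated
# detectors; the inner-corner grid size (7, 6) is an assumption here.
found, pts = cv2.findChessboardCorners(frame, (7, 6))
print(len(corners), found)  # corners of the square found; no chessboard
```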

The model fitting approach is based on a 3D model of the robot. The camera measurements are related to the estimation of the assignment of pixels to the background or to the robot body. The aim of the fitting is to find the configuration of the model that gives the same image (for a single camera system) or images (if a multiple camera system is used). The corresponding real and virtual (rendered) images fit if the configurations of the real robot and its model are identical.
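A conceptual sketch of this fitting loop is given below; `render_model` is a hypothetical placeholder for a renderer of the robot model, not a specific library API, and the images are assumed to be boolean robot/background masks:

```python
import numpy as np

def fitting_error(configuration, real_silhouettes, cameras, render_model):
    """Sum of per-camera mismatches between acquired and rendered images."""
    error = 0
    for cam, real in zip(cameras, real_silhouettes):
        # Render the model in the candidate configuration for this camera;
        # both images are masks: pixel assigned to robot body or background.
        rendered = render_model(configuration, cam)
        error += np.count_nonzero(rendered != real)  # mismatched pixels
    return error
```

A search over `configuration` that drives this error toward zero yields the model pose matching the real robot.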

**4.2 Correspondence by the calibration object**

The simplest technique used for establishing relations between the virtual and the real camera is based on a calibration object. This technique uses a physical object with known physical dimensions (*M*) and a mathematical model of this object (*V*). The calibration object and its model form the bridge between the real and the virtual world (Fig. 6).

Fig. 6. Correspondences between real and virtual world using 3D calibration object

Assuming that the world coordinates (*O*, *X*, *Y*, *Z*) are defined by a fixed relation to the real and virtual calibration object, full correspondence between objects, projections, and cameras is possible (degenerate cases are not considered here). It means that all particular positions and orientations have exact values. The projections are the images of the markers from the cameras. The image acquired from the real camera is processed for estimating the markers' positions with subpixel accuracy (e.g. a center of mass algorithm may be used). The projection of the virtual markers (*V*) onto the virtual camera projection plane is possible using computer graphics formulas with high, usually floating point, accuracy.
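A minimal sketch of the center of mass computation mentioned above, assuming `patch` is a small grayscale region containing one bright marker:

```python
import numpy as np

def center_of_mass(patch):
    """Intensity-weighted centroid (x, y) of a marker blob, with
    subpixel accuracy."""
    ys, xs = np.indices(patch.shape)   # pixel coordinate grids
    total = patch.sum()
    return (xs * patch).sum() / total, (ys * patch).sum() / total
```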

During the estimation process of the external parameters of the camera, the correspondence is obtained with some error. The markers' projections are not identical and the cameras' parameters are not equal, especially in the beginning steps. The error (Fig. 7) between the projections (*m*, *v*) of the markers (*M*, *V*) can be calculated. Comparison of the 2D positions on the projection planes using the *l*<sup>2</sup> value (Euclidean distance) is typically used. Iterative calculations subject to minimization of this error are used to establish a reliable correspondence.

Fig. 7. Comparison of the markers' positions (real and virtual) and local distance errors

The accumulated *l*<sup>2</sup> error is computed using the following formula:

$$d_2 = \sqrt{\sum_i d_i^2} = \sqrt{\sum_i \left(m_i - v_i\right)^2} \tag{1}$$
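Equation (1) translates directly into NumPy; here *m* and *v* are assumed to be (N, 2) arrays holding the 2D projections of the real and virtual markers, with row *i* corresponding to marker *i*:

```python
import numpy as np

def accumulated_l2(m, v):
    """Accumulated l2 error d_2 of Eq. (1) between projection sets m, v."""
    d_i = np.linalg.norm(m - v, axis=1)   # per-marker Euclidean distances
    return np.sqrt(np.sum(d_i ** 2))
```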


The minimization of the *l*<sup>2</sup> value by movement and rotation of the virtual camera is possible using gradient and non–gradient search algorithms. The differences *d<sub>i</sub>* between the positions of the markers' projections are reduced to zero only in the ideal case. The estimated positions of the real markers are obtained with some inaccuracy due to acquisition errors (image blur, finite resolution of the imaging sensor, camera noise, design of the imaging sensor, and the position estimation algorithm).
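A sketch of such a non–gradient search, assuming a simple pinhole camera model and SciPy's derivative-free Nelder–Mead method; the marker coordinates, focal length, and poses below are illustrative synthetic values, not the chapter's actual setup:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def project(points, pose, f=1000.0):
    """Pinhole projection of 3D points for pose = (tx, ty, tz, rx, ry, rz)."""
    R = Rotation.from_rotvec(pose[3:]).as_matrix()
    p = points @ R.T + pose[:3]            # transform into camera coordinates
    return f * p[:, :2] / p[:, 2:3]        # perspective division

def pose_error(pose, M, m):
    v = project(M, pose)                   # virtual projections v_i
    return np.sqrt(np.sum((m - v) ** 2))   # accumulated l2 error, Eq. (1)

# Synthetic example: recover a known camera pose from marker projections.
M = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 1.]])
true_pose = np.array([0.1, -0.2, 5.0, 0.02, 0.01, 0.0])
m = project(M, true_pose)                  # stands in for the real camera
fit = minimize(pose_error, np.array([0., 0., 4., 0., 0., 0.]),
               args=(M, m), method="Nelder-Mead")
print(fit.x)                               # should approach true_pose
```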

Estimation of the 3D position and orientation using 2D images is possible using projective geometry, but the application of Euclidean geometry is also possible. Euclidean geometry is a subset of projective geometry and preserves angles. It can be used with a long focal length camera, for a high ratio of the camera distance to the robot work area.
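A small numeric illustration of this condition, with illustrative values: when the camera distance is large compared to the depth of the work area, the perspective term *f/Z* is nearly constant, so the projection acts as a uniform scaling, as Euclidean geometry assumes:

```python
import numpy as np

f, distance = 500.0, 100.0                     # long lens, distant camera
depths = distance + np.array([0.0, 0.5, 1.0])  # shallow robot work area
print(f * 1.0 / depths)  # [5.0, 4.9751, 4.9505]: almost one common scale
```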


The Euclidean approach is simpler and cheaper in some cases, especially if the robot is very small. The large distance required between the camera and the object in real scenarios is the main drawback (Fig. 8). Estimation of the 3D position requires a few cameras.

Fig. 8. Example configuration of three cameras for Euclidean geometry based 3D estimation system

The restricted areas and the required large distance between the camera and the work area are drawbacks of the Euclidean projections. The perspective projection (Fig. 9) is more applicable for a general case of cameras and different work area configurations.

Fig. 9. Perspective projection

Perspective projection uses a camera with the focal point at position *C* and the projection plane located at distance *f* (the focal length). Depending on the distance between the focal point and a point *X* in 3D space, the projection *x* falls at a different position on the projection plane. The projection formulas for a point *X* onto the projection plane of a camera located at an arbitrary position in 3D space are available in many computer graphics books [Hartley & Zisserman (2003); Heyden & Pollefeys (2004)].
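For the canonical configuration, with the focal point *C* at the origin of the camera coordinate system and the optical axis along *Z*, these formulas reduce to the well-known pinhole relations:

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$

The general case of an arbitrarily placed camera first applies the camera rotation and translation to *X* and then performs this perspective division.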

The perspective projection adds a very important factor: scale, for non–point objects, especially markers. Estimation of the scale, and hence of the distance from the camera, is possible using a single camera, depending on the assumed marker estimation technique. Commercial motion capture systems use very small markers and wide angle cameras (short focal length). In such a configuration the distance is not measured well, especially under variable light conditions. The scale of a larger marker may differ in some directions, so, for example, an ellipse is observed instead of a circle. This gives the ability to estimate the full 6 DoF (Degrees–of–Freedom) for every large marker.
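A sketch of the scale-based distance estimation, assuming the physical marker diameter is known; the names and values are illustrative:

```python
def distance_from_scale(f_pixels, marker_diameter_m, observed_diameter_px):
    """Distance of a marker of known size, from the pinhole relation
    observed_size = f * real_size / distance."""
    return f_pixels * marker_diameter_m / observed_diameter_px

# Example: f = 800 px, a 0.05 m marker imaged as 20 px -> 2.0 m away.
print(distance_from_scale(800.0, 0.05, 20.0))
```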

The correspondence between the real and the virtual world is used during the calibration of the cameras. Calibrated cameras are used in marker systems or in the model fitting approach. The model fitting approach is similar to the calibration process, but instead of the calibration object there are two sets of calibrated cameras (real and virtual) and a deformable model of the real robot.
