### **3.1 A brief overview of the classical IBVS architecture**

The control of the visual system with an eye-in-hand camera configuration [21] is discussed in this section. The general visual servoing problem can be divided into two categories: PBVS and IBVS [5]. This work focuses on the IBVS structure. **Figure 1** shows the control block diagram of a classical IBVS architecture. In **Figure 1**, $s = [u, v]^T$ is the image feature position vector, $s^* = [u^*, v^*]^T$ is the target image feature position vector, and their difference $e = s^* - s$ is the error vector. The IBVS structure is a cascaded control loop with an outer controller and an inner joint controller. The outer controller takes the feature error $e$ as input and generates the targets for the joints, denoted as $\dot{q}$ in **Figure 1**. The inner controller

**Figure 1.** *The block diagram of the classical IBVS control architecture.*


stabilizes the joints to the targets generated by the outer controller. $L_e$ is the so-called interaction matrix [5], a 2-by-6 matrix that relates the time derivative of the image feature $s$ to the spatial velocity of the camera $V_c$, a six-element column vector, as follows:

$$
\dot{s} = L_e V_c \tag{1}
$$

We can design a proportional controller that forces the error to converge exponentially to zero, i.e.:

$$
\dot{e} = -ke, \quad k > 0 \tag{2}
$$

Suppose the target image feature is constant, that is, $\dot{s}^* = 0$; then from Eq. (1) and the definition of the error, we can derive:

$$
\dot{e} = \dot{s}^* - \dot{s} = -L_e V_c \tag{3}
$$

Combining Eqs. (2) and (3), one obtains:

$$V_c = k L_e^{+} e \tag{4}$$

where $L_e^{+}$ is the pseudoinverse of $L_e$. The derivation of the interaction matrix for a monocular camera is explained next. We assume a point with three-dimensional (3D) coordinates in the camera frame given as $P^C = \left[X^C, Y^C, Z^C\right]^T$. We further assume a zero skew coefficient, $s_c = 0$, and a zero baseline distance, $b = 0$, which reduces the stereo camera model of Eqs. (10) and (11) (Section 3.2.1) to a monocular one. The image feature coordinates $s = [u, v]^T$ can then be expressed as:

$$
\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \dfrac{f_u X^C}{Z^C} + u_0 \\[2mm] \dfrac{f_v Y^C}{Z^C} + v_0 \end{bmatrix} \tag{5}
$$

Taking the time derivative of Eq. (5), we obtain:

$$
\begin{bmatrix} \dot{u} \\ \dot{v} \end{bmatrix} = \begin{bmatrix} \dfrac{f_u \left( \dot{X}^C Z^C - \dot{Z}^C X^C \right)}{\left( Z^C \right)^2} \\[3mm] \dfrac{f_v \left( \dot{Y}^C Z^C - \dot{Z}^C Y^C \right)}{\left( Z^C \right)^2} \end{bmatrix} \tag{6}
$$

The rigid body motion of a 3D point in the camera frame can be derived as:

$$\dot{P}^C = v^C + \omega^C \times P^C \Longleftrightarrow \begin{cases} \dot{X}^C = v_x^C + \omega_y^C Z^C - \omega_z^C Y^C \\ \dot{Y}^C = v_y^C + \omega_z^C X^C - \omega_x^C Z^C \\ \dot{Z}^C = v_z^C + \omega_x^C Y^C - \omega_y^C X^C \end{cases} \tag{7}$$

where $v^C = \left[v_x^C, v_y^C, v_z^C\right]^T$ and $\omega^C = \left[\omega_x^C, \omega_y^C, \omega_z^C\right]^T$ are the linear and angular velocity components of the relative motion expressed in the camera frame.

Substituting (7) into (6), and rearranging the terms, we obtain:

$$
\begin{bmatrix} \dot{u} \\ \dot{v} \end{bmatrix} =
\begin{bmatrix}
\dfrac{f_u}{Z^C} & 0 & -\dfrac{u - u_0}{Z^C} & -\dfrac{(u - u_0)(v - v_0)}{f_v} & \dfrac{f_u^2 + (u - u_0)^2}{f_u} & -\dfrac{f_u (v - v_0)}{f_v} \\[3mm]
0 & \dfrac{f_v}{Z^C} & -\dfrac{v - v_0}{Z^C} & -\dfrac{f_v^2 + (v - v_0)^2}{f_v} & \dfrac{(u - u_0)(v - v_0)}{f_u} & \dfrac{f_v (u - u_0)}{f_u}
\end{bmatrix}
\begin{bmatrix} v_x^C \\ v_y^C \\ v_z^C \\ \omega_x^C \\ \omega_y^C \\ \omega_z^C \end{bmatrix} \tag{8}
$$

Eq. (8) can be written compactly as:

$$
\dot{s} = L_e V_c = L_e \begin{bmatrix} v^C \\ \omega^C \end{bmatrix} \tag{9}
$$
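To make Eqs. (4) and (8) concrete, the following sketch builds the 2-by-6 interaction matrix for a single feature point and evaluates the proportional velocity command. It is a minimal illustration only: the intrinsics ($f_u$, $f_v$, $u_0$, $v_0$), the depth estimate $Z^C$, and the gain $k$ are placeholder values, not quantities from this chapter.

```python
import numpy as np

def interaction_matrix(u, v, Z, fu, fv, u0, v0):
    """2x6 interaction matrix L_e of Eq. (8) for one image point (u, v) at depth Z."""
    x, y = u - u0, v - v0  # pixel coordinates relative to the principal point
    return np.array([
        [fu / Z, 0.0, -x / Z, -x * y / fv, (fu**2 + x**2) / fu, -fu * y / fv],
        [0.0, fv / Z, -y / Z, -(fv**2 + y**2) / fv, x * y / fu, fv * x / fu],
    ])

# Placeholder values (not from the chapter):
fu, fv, u0, v0 = 800.0, 800.0, 320.0, 240.0  # assumed camera intrinsics, pixels
s = np.array([350.0, 260.0])       # current feature position [u, v]
s_star = np.array([320.0, 240.0])  # target feature position [u*, v*]
Z = 1.5                            # assumed depth estimate, meters
k = 0.5                            # proportional gain of Eq. (2)

L_e = interaction_matrix(s[0], s[1], Z, fu, fv, u0, v0)
e = s_star - s                     # error vector e = s* - s
# Eq. (4); with a single point the 2x6 system is underdetermined,
# and pinv returns the minimum-norm camera velocity.
V_c = k * np.linalg.pinv(L_e) @ e
print(V_c)                         # [vx, vy, vz, wx, wy, wz] in the camera frame
```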

Some drawbacks of the classical IBVS are summarized next. To compute the interaction matrix $L_e$ in Eq. (8), the depth $Z^C$ must be estimated. It is usually approximated as the depth at the initial position, the depth at the target position, or their average [5]. A careless depth estimate may lead to system instability. In addition, the design of the proportional controller is based on Eq. (1), the camera kinematic relationship, so no dynamics are considered in this model. The kinematic model is sufficient for a very slowly responding system; however, for faster responses, one has to take into account the manipulator dynamics along with the camera model.

In this work, we propose a new control algorithm, similar to the classical IBVS structure, in which the controller is designed with the complete dynamic and kinematic models of the robot manipulator and the camera. Furthermore, this algorithm does not require any depth estimation, so the interaction matrix is not needed. The development of this new algorithm is presented in Sections 6 and 7 of this chapter.

### **3.2 A brief overview of sources of uncertainties and approaches for reduction**

Uncertainties in automated manufacturing can originate from different sources. We divide them into two categories: sensor measurement noise, and dynamic and kinematic modeling errors from both the measurement system and the robot manipulators. This section briefly reviews each uncertainty source, including the proposed methods for reducing these uncertainties.

### *3.2.1 A brief overview of a stereo camera model and its calibration*

A camera model (i.e., the pin-hole model [22]) has been adopted in visual servoing techniques to generate an interaction matrix [5]. The object depth, the distance between a point on the object and the camera center as illustrated in **Figure 2**, needs to be either estimated or approximated to construct the interaction matrix [5]. One method is to measure the depth directly with a stereo (binocular) camera using two image planes [23].


**Figure 2.** *The projection of a scene object on the stereo camera's image planes.*

As shown in **Figure 2**, two identical cameras are separated by a baseline distance $b$. An object point $P^C = \left[X^C, Y^C, Z^C\right]^T$, measured in the camera frame located at the baseline center, is projected onto two parallel virtual image planes; each plane lies between its optical center ($C_L$ or $C_R$) and the object point $P^C$. Note that the $v$ coordinate on each image plane is not shown in the figure; it is measured along the axis perpendicular to and pointing out of the page. The intrinsic camera parameters relate the coordinates of the object point in the camera frame to its corresponding image coordinates $p = (u, v)$ on each image plane through an exact mathematical relationship, given by:

$$
\begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} = \frac{1}{Z^C} \begin{bmatrix} f_u & s_c & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X^C \\ Y^C \\ Z^C \end{bmatrix} - \frac{b}{2 Z^C} \begin{bmatrix} f_u \\ 0 \\ 0 \end{bmatrix} \tag{10}
$$

$$
\begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} = \frac{1}{Z^C} \begin{bmatrix} f_u & s_c & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X^C \\ Y^C \\ Z^C \end{bmatrix} + \frac{b}{2 Z^C} \begin{bmatrix} f_u \\ 0 \\ 0 \end{bmatrix} \tag{11}
$$

where $f_u$ and $f_v$ are the horizontal and vertical focal lengths (in pixel units) and $s_c$ is the skew coefficient, which is nonzero when the image horizontal and vertical axes are not perpendicular. In most cases, $f_u$ and $f_v$ differ because the pixels are not perfectly square. In order not to have negative pixel coordinates, the origin of the image plane is usually chosen at the upper left corner instead of the center; $u_0$ and $v_0$ describe the resulting coordinate offsets. The camera model uncertainties arise from the estimation of these intrinsic parameter values; camera calibration can be used to estimate them precisely.
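As a numerical check of Eqs. (10) and (11), the sketch below projects a 3D point onto both image planes and then recovers its depth from the horizontal disparity, using the relation $u_r - u_l = f_u b / Z^C$ that follows from subtracting the two equations. All parameter values are assumed placeholders.

```python
import numpy as np

# Assumed intrinsics and baseline (placeholders, not calibrated values)
fu, fv, sc, u0, v0, b = 700.0, 700.0, 0.0, 320.0, 240.0, 0.12

K = np.array([[fu, sc, u0],
              [0., fv, v0],
              [0., 0., 1.]])

P = np.array([0.3, -0.1, 2.0])  # object point [X, Y, Z] in the camera frame
Z = P[2]

offset = (b / (2 * Z)) * np.array([fu, 0.0, 0.0])
pl = K @ P / Z - offset  # Eq. (10), left image:  [u_l, v_l, 1]
pr = K @ P / Z + offset  # Eq. (11), right image: [u_r, v_r, 1]

# Depth recovered directly from the two horizontal coordinates:
disparity = pr[0] - pl[0]        # u_r - u_l = fu * b / Z
Z_est = fu * b / disparity
print(Z_est)                     # 2.0, matching the true depth
```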

Stereo camera calibration has been well studied [24–27]. As summarized in [24], calibration methods fall into two broad categories: photogrammetric calibration and self-calibration. In photogrammetric calibration [25], the camera is calibrated by observing a calibration object whose 3D geometry is known with high precision. These methods are very accurate but require expensive apparatus and elaborate setups [24]. Self-calibration [24, 26, 27] is performed by finding correspondences between captured images of a static scene taken from different perspectives. Although cheap and flexible, these methods are not always reliable [24]. The author of [24] proposed a new self-calibration technique that observes a planar pattern at different orientations and showed improved results.
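For illustration, a planar-pattern calibration in the spirit of [24] can be performed with off-the-shelf tooling. The sketch below is a minimal OpenCV pipeline; the checkerboard dimensions, square size, and image file pattern are assumptions, not values from this chapter.

```python
import glob
import cv2
import numpy as np

board = (9, 6)   # inner corners of the checkerboard (assumed)
square = 0.025   # square size in meters (assumed)

# 3D coordinates of the board corners in the board's own plane (Z = 0)
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

obj_pts, img_pts = [], []
for path in glob.glob("calib_*.png"):  # hypothetical image files
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# Estimate the intrinsic matrix (f_u, f_v, u_0, v_0) and lens distortion
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print(rms, K)
```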

### *3.2.2 A brief overview of the robot manipulator model and its calibration*

In this work, we consider elbow manipulators [28] with spherical wrists in the multi-robot system, which move an end effector freely in six degrees of freedom (DOFs). This robot model has six links: three for the arm and the other three for the wrist. The robot arm moves the end effector to any position in the reachable space with 3 DOFs, while the spherical wrist allows the end effector to be oriented in any direction with the other 3 DOFs. In the elbow manipulator, a joint connects each pair of adjacent links, for a total of six revolute joints. A specific industrial robot of this type is the ABB IRB 4600 [29].

A commonly used convention for selecting and generating reference frames in robotic applications is the Denavit-Hartenberg (D-H) convention [30]. Suppose each link $i$ is attached to a Cartesian coordinate frame $O_i X_i Y_i Z_i$. In this convention, each homogeneous transformation matrix $A_i$ (from frame $i-1$ to frame $i$) can be represented as a product of four basic transformations:

$$A_i = \mathrm{Rot}_{z, q_i} \, \mathrm{Trans}_{z, d_i} \, \mathrm{Trans}_{x, a_i} \, \mathrm{Rot}_{x, \alpha_i} = \begin{bmatrix} c_{q_i} & -s_{q_i} c_{\alpha_i} & s_{q_i} s_{\alpha_i} & a_i c_{q_i} \\ s_{q_i} & c_{q_i} c_{\alpha_i} & -c_{q_i} s_{\alpha_i} & a_i s_{q_i} \\ 0 & s_{\alpha_i} & c_{\alpha_i} & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{12}$$

**Note:** $c_{q_i} \equiv \cos(q_i)$, $c_{\alpha_i} \equiv \cos(\alpha_i)$, $s_{q_i} \equiv \sin(q_i)$, $s_{\alpha_i} \equiv \sin(\alpha_i)$.

where $q_i$, $a_i$, $\alpha_i$, and $d_i$ are the parameters of link $i$ and joint $i$: $a_i$ is the link length, $q_i$ is the rotational (joint) angle, $\alpha_i$ is the twist angle, and $d_i$ is the offset length between the $(i-1)$th and the $i$th robot links. The value of each parameter in Eq. (12) is determined following the steps in [28].
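The sketch below is a direct transcription of Eq. (12) into a small helper function; the argument names follow the link quantities $q_i$, $d_i$, $a_i$, and $\alpha_i$.

```python
import numpy as np

def dh_transform(q, d, a, alpha):
    """Homogeneous transform A_i of Eq. (12): Rot(z,q) Trans(z,d) Trans(x,a) Rot(x,alpha)."""
    cq, sq = np.cos(q), np.sin(q)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [cq, -sq * ca,  sq * sa, a * cq],
        [sq,  cq * ca, -cq * sa, a * sq],
        [0.,       sa,       ca,      d],
        [0.,       0.,       0.,     1.],
    ])
```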

We can then generate the transformation matrix from the base frame $O_0 X_0 Y_0 Z_0$ to the end-effector frame $O_6 X_6 Y_6 Z_6$:

$$T_6^0 = A_1^0 A_2^1 A_3^2 A_4^3 A_5^4 A_6^5 \tag{13}$$

If the coordinates of a point with respect to the end-effector frame, $P^6$, are known, we can calculate its coordinates with respect to the base frame, $P^0$, as:

$$P^0 = T_6^0 P^6 \tag{14}$$

In addition, the transformation from the base frame to the end-effector frame can be derived as:


$$T_0^6 = \left(T_6^0\right)^{-1} \tag{15}$$

which is used to generate the image coordinates of a point captured by a camera whose center is attached to the end effector, from the 3D coordinates of that point in the base frame.

Eq. (13) shows that the position of the end effector $P^{\text{end}}$ (the origin of the end-effector frame) is a function of all the joint angles $q = \left[q_i \mid i \in \{1, \dots, 6\}\right]$ and the parameters $Pa = \left[a_i, \alpha_i, d_i \mid i \in \{1, \dots, 6\}\right]$:

$$P^{\text{end}} = \mathcal{F}(q, Pa) \tag{16}$$

Eq. (16) describes the forward kinematic model of the robot manipulator, which can be used to calculate the position of the end effector from the joint angles and the robot parameters. The inverse process, in which the joint angles are computed from the end-effector position and the parameters, is called inverse kinematics. The accuracy of the manipulator's kinematic model is determined by how well the robot parameters $Pa$ are estimated.
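Chaining six such transforms implements the forward kinematic map of Eqs. (13) and (16). The sketch below reuses the `dh_transform` helper from the previous sketch; the D-H table values are illustrative placeholders and are not the ABB IRB 4600 datasheet parameters.

```python
import numpy as np

def forward_kinematics(q, dh_table):
    """T_6^0 of Eq. (13): the product A_1^0 A_2^1 ... A_6^5 for joint angles q."""
    T = np.eye(4)
    for qi, (d, a, alpha) in zip(q, dh_table):
        T = T @ dh_transform(qi, d, a, alpha)  # dh_transform from the sketch above
    return T

# Illustrative six-link D-H table [(d_i, a_i, alpha_i)]; placeholder values only
dh_table = [
    (0.495, 0.175, -np.pi / 2),
    (0.000, 0.900,  0.0),
    (0.000, 0.175, -np.pi / 2),
    (0.960, 0.000,  np.pi / 2),
    (0.000, 0.000, -np.pi / 2),
    (0.135, 0.000,  0.0),
]

q = np.zeros(6)                    # all joint angles at zero
T06 = forward_kinematics(q, dh_table)
P_end = T06[:3, 3]                 # end-effector position, Eq. (16)
T60 = np.linalg.inv(T06)           # transform of Eq. (15)
print(P_end)
```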

The paper [31] provides a good summary of current robot calibration methods. The author of [32] states that over 90% of the position errors are due to errors in the robot zero position (the kinematic parameter errors). As a result, most researchers focus on kinematic robot calibration (or level 2 calibration [31]) to enhance the robot's absolute positioning accuracy [33–36]. Generally, kinematic model-based calibration involves four sequential steps: modeling, measurement, identification, and correction. Modeling is the development of a mathematical model of the robot's geometry and motion; the most popular is the D-H convention [30], and alternatives include the S-model [37] and the zero-reference model [38]. In the measurement step, the absolute position of the end effector is measured with sensors, e.g., acoustic sensors [37] or visual sensors [34]. In the identification step, the parameter errors of the robot are identified by minimizing the residual position errors with different techniques [39, 40]. The final step is to implement the new model with the corrected parameters. A sketch of the identification step follows.
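As an illustration of the identification step, the sketch below fits corrections to the D-H parameters by minimizing the residual end-effector position errors with SciPy's least-squares solver. It reuses `forward_kinematics` and the placeholder `dh_table` from the previous sketches, and the "measurements" are synthetic; a real calibration would use many more poses and a careful error model.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
nominal = np.array(dh_table)            # nominal (d_i, a_i, alpha_i), sketch above
true = nominal + rng.normal(0.0, 1e-3, nominal.shape)  # hidden "real" robot

# Step 2, Measurement: absolute end-effector positions at sampled poses
# (synthetic here; a real system would use, e.g., a visual sensor [34])
qs = rng.uniform(-1.0, 1.0, (30, 6))
measured = [forward_kinematics(q, true)[:3, 3] for q in qs]

# Step 3, Identification: minimize the residual position errors
def residuals(dpa):
    dh = nominal + dpa.reshape(nominal.shape)
    return np.concatenate(
        [forward_kinematics(q, dh)[:3, 3] - p for q, p in zip(qs, measured)])

sol = least_squares(residuals, x0=np.zeros(nominal.size))

# Step 4, Correction: adopt the identified parameters
corrected = nominal + sol.x.reshape(nominal.shape)
print(np.abs(corrected - true).max())   # small if identification succeeded
```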

On the other hand, non-kinematic calibration modeling (level 3 calibration [31]) [39, 41], which includes dynamic factors such as joint and link flexibility, increases the accuracy of the robot calibration but complicates the mathematical functions that govern the parameter relationships.

### *3.2.3 The image averaging techniques for denoising*

Image noise is inevitably introduced during image acquisition and processing. Several image denoising techniques have been proposed. A good noise removal algorithm ought to remove as much noise as possible while safeguarding the edges. Gaussian white noise has been dealt with using spatial filters, e.g., the Gaussian filter, the mean filter, and the Wiener filter [42]. Noise reduction using wavelet methods [43, 44] has the benefit of keeping more useful detail, but at the expense of computational complexity. However, depending on the selected wavelet method, filters that operate in the wavelet domain may still filter out (or blur) some important high-frequency information of the original image, even though more edges are preserved than with the spatial filter approaches. A minimal sketch applying these spatial filters is shown below.
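For reference, the spatial filters named above are available off the shelf. The sketch below applies them to a synthetic noisy image; the kernel sizes and noise level are arbitrary choices.

```python
import numpy as np
import cv2
from scipy.signal import wiener

rng = np.random.default_rng(0)
img = np.tile(np.linspace(0.0, 255.0, 256, dtype=np.float32), (256, 1))
noisy = img + rng.normal(0.0, 10.0, img.shape).astype(np.float32)

gauss = cv2.GaussianBlur(noisy, (5, 5), sigmaX=1.0)  # Gaussian filter
mean = cv2.blur(noisy, (5, 5))                       # mean filter
wien = wiener(noisy, (5, 5))                         # Wiener filter [42]

for name, out in (("gaussian", gauss), ("mean", mean), ("wiener", wien)):
    print(name, float(np.std(out - img)))            # residual noise level
```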

All the aforementioned methods reduce noise starting from a single noisy image. We can instead approach the problem with multiple noisy images taken from the same perspective. The same perspective ensures the same environmental conditions (illumination, temperature, etc.) that affect the image noise level. Given the same conditions, the noise level of an image taken at one time should be very similar to that of an image taken at a different time. This redundancy can be used to improve the precision of image estimation in the presence of noise. The method that exploits it is called signal averaging (or image averaging in image processing applications) [45]. Image averaging has the natural advantage of retaining all the image details while reducing the unwanted noise, given that all the images to be averaged are taken from the same perspective. The robot's rigid end effector holding the camera minimizes shaking and drift while the pictures are taken. Furthermore, in the denoising process, precise estimation requires that the original image details be retained. Considering these points, we chose image averaging over the other denoising techniques in this work.

The image averaging technique is illustrated in **Figure 3**. Assume the noise signal is random and unbiased, and, in addition, completely uncorrelated with the image signal itself. As noisy images are averaged, the original true image is kept the same while the magnitude of the noise signal shrinks, thus improving the signal-to-noise ratio. In **Figure 3**, we generated two random signals with the same standard deviation, represented by the blue and red lines. The black line is the average of the two signals; its magnitude is significantly decreased compared with each original signal. In general, we can derive a mathematical relationship between the noise level reduction and the sample size used for averaging. Assume we have $N$ Gaussian white noise samples with standard deviation $\sigma$, each denoted $z_i$, where $i$ indexes the $i$th sample signal. Therefore:

$$
\operatorname{var}(z_i) = E\left(z_i^2\right) = \sigma^2 \tag{17}
$$

where $E(\cdot)$ is the expectation operator and $\sigma$ is the standard deviation of the noise signal. By averaging the $N$ Gaussian white noise signals, we can write:

$$
\operatorname{var}\left(z_{\text{avg}}\right) = \operatorname{var}\left(\frac{1}{N} \sum_{i=1}^{N} z_i\right) = \frac{1}{N^2} N \sigma^2 = \frac{1}{N} \sigma^2 = \left(\frac{\sigma}{\sqrt{N}}\right)^2 \tag{18}
$$

**Figure 3.** *An example of a noise level reduction by image averaging.*
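The $1/\sqrt{N}$ noise reduction predicted by Eq. (18) is easy to verify numerically. The sketch below averages $N$ synthetic noisy copies of the same image and compares the measured residual noise with $\sigma/\sqrt{N}$; all values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = np.tile(np.linspace(0.0, 255.0, 128), (128, 1))  # stand-in "true" image
sigma, N = 12.0, 16                                       # noise level and sample count

# N images from the same perspective: identical signal, independent noise
frames = truth + rng.normal(0.0, sigma, (N, *truth.shape))
avg = frames.mean(axis=0)

print(float(np.std(frames[0] - truth)))  # ~ sigma
print(float(np.std(avg - truth)))        # ~ sigma / sqrt(N), per Eq. (18)
print(sigma / np.sqrt(N))                # 3.0, the predicted residual noise
```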
