**2. Methods**

### **2.1 Point cloud preprocessing**

### *2.1.1 Denoising algorithm*

Optical measurement is susceptible to noise and outliers caused by the external environment and by the material and reflectivity of the object surface; such noise points and outliers must be eliminated before point cloud registration. In this study, Statistical Outlier Removal (SOR) [9] was applied to the neighborhood of each point in the point cloud. First, the average distance from each point to its *k* nearest neighbors is calculated. The mean *d<sub>mean</sub>* and standard deviation *σ* of these average distances are then used to set the threshold for identifying noise points and outliers. The distance threshold can be expressed as:

$$d\_{thresh} = d\_{mean} + K \ast \sigma \tag{1}$$

where K is a scale factor, usually set to 1.0. **Figure 3** illustrates outlier determination and removal. First, the distances from a point *P<sub>i</sub>* to its *k* nearest neighbors *P<sub>j</sub>* (*j* = 1, 2, … , *k*) are calculated, and their average is *d<sub>i</sub>*. If *d<sub>i</sub>* is greater than the distance threshold *d<sub>thresh</sub>*, *P<sub>i</sub>* is regarded as an outlier and is removed from the point set P.
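As a concrete illustration, here is a minimal Python sketch of the SOR filter of Eq. (1) built on NumPy and SciPy's k-d tree; the function name `sor_filter` and its default parameters are illustrative rather than taken from the original implementation:

```python
# Minimal sketch of the SOR filter of Eq. (1) with NumPy and SciPy's k-d tree.
import numpy as np
from scipy.spatial import cKDTree

def sor_filter(points: np.ndarray, k: int = 20, scale: float = 1.0) -> np.ndarray:
    """Drop points whose mean k-NN distance exceeds d_mean + scale * sigma."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)  # k+1: the nearest hit is the point itself
    d_i = dists[:, 1:].mean(axis=1)         # mean distance of each point to its k neighbors
    d_mean, sigma = d_i.mean(), d_i.std()
    d_thresh = d_mean + scale * sigma       # Eq. (1)
    return points[d_i <= d_thresh]
```

Open3D users can obtain equivalent behavior from the built-in `remove_statistical_outlier(nb_neighbors, std_ratio)` method of its point cloud class.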

**Figure 4a** shows the original point cloud, while **Figure 4b** shows the point cloud after outlier removal with a scale factor of 1.0 applied to the neighboring-point search.

**Figure 4.** *(a) Original point cloud and (b) point cloud after outlier removal.*

### *2.1.2 Downsampling strategy*

The structured-light measurement probe uses a high-resolution sensor of up to two megapixels; hence, the reconstructed point cloud is large and dense, often reaching hundreds of thousands of points. It is thus necessary to effectively reduce the number of data points while strictly preserving the morphological characteristics of the object for evaluating pose variation.

Two widely used point-cloud downsampling algorithms are voxel grid filtering and Delaunay triangulation. Just as the pixel is the basic unit of an image, the voxel is a cubic grid, the unit cube that constitutes a 3D space. For data reduction, the voxel grid filter replaces all the points in a voxel with their mass center, so the *n* points belonging to a unit voxel are represented by a single point, as shown in **Figure 5**.

**Figure 5.** *Illustration of voxel-grid filtering: (a) unfiltered voxel; (b) unfiltered unit voxel; and (c) the mass center of the unit voxel.*
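Under these definitions, voxel grid filtering can be sketched in a few lines of NumPy; `voxel_grid_filter` and `voxel_size` are illustrative names:

```python
# Minimal sketch of voxel grid filtering: every occupied voxel is replaced
# by the mass center of the points inside it.
import numpy as np

def voxel_grid_filter(points: np.ndarray, voxel_size: float) -> np.ndarray:
    # Integer voxel coordinates of each point.
    idx = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    # Group points by voxel and average each group to get the mass centers.
    _, inverse, counts = np.unique(idx, axis=0, return_inverse=True, return_counts=True)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse.ravel(), points)
    return sums / counts[:, None]
```

Open3D provides the same operation directly as `voxel_down_sample(voxel_size)`.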

Delaunay triangulation [10] connects all input points into a convex polygon (convex hull) [11], as shown in **Figure 6**, and every triangle within the convex polygon satisfies the property of maximizing the minimum interior angle, with the circumcircle of each triangle containing no other input point. When Delaunay triangulation is applied in 3D space, four adjacent points form a tetrahedron with a circumscribed sphere, the so-called alpha-ball, as shown in **Figure 7**. The red circle denotes the circumscribed sphere of the tetrahedron, and excess points can be removed using the alpha-ball algorithm [12], thus achieving data reduction while preserving the object's surface characteristics.

**Figure 6.** *Illustration of the convex hull formed by Delaunay triangulation.*
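The sketch below illustrates the alpha-ball idea on top of SciPy's 3D Delaunay tetrahedralization: tetrahedra whose circumsphere radius is below a chosen `alpha` form the alpha complex, and only points lying on its boundary facets are kept. This is a simplified reading of the procedure described above, not the reference implementation, and all names are illustrative:

```python
# Hedged sketch of alpha-ball decimation on a 3-D Delaunay tetrahedralization.
import numpy as np
from scipy.spatial import Delaunay

def alpha_ball_points(points: np.ndarray, alpha: float) -> np.ndarray:
    tets = Delaunay(points).simplices                     # (T, 4) vertex indices
    # Circumcenter of each tetrahedron: solve 2*(v_i - v_0) . c = |v_i|^2 - |v_0|^2.
    v0 = points[tets[:, 0]]
    a = 2.0 * (points[tets[:, 1:]] - v0[:, None, :])      # (T, 3, 3)
    b = (points[tets[:, 1:]] ** 2).sum(axis=2) - (v0 ** 2).sum(axis=1)[:, None]
    centers = np.linalg.solve(a, b[..., None])[..., 0]
    radii = np.linalg.norm(centers - v0, axis=1)          # circumsphere radii
    kept = tets[radii < alpha]                            # tetrahedra in the alpha complex
    # A boundary facet belongs to exactly one kept tetrahedron.
    faces = np.sort(np.concatenate([kept[:, [0, 1, 2]], kept[:, [0, 1, 3]],
                                    kept[:, [0, 2, 3]], kept[:, [1, 2, 3]]]), axis=1)
    uniq, counts = np.unique(faces, axis=0, return_counts=True)
    return points[np.unique(uniq[counts == 1])]
```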

As shown in **Figure 8a**, a precision ceramic gauge block with sharp edges was used as the test object to compare the two point-cloud downsampling methods. The reconstructed result after outlier removal is shown in **Figure 8b**; the number of points is enormous, totaling 300,495.

**Figure 8.** *(a) Original and (b) reconstructed gauge block after outlier removal.*

Such a large amount of data typically requires a long computation time. Therefore, voxel grid filtering and Delaunay triangulation are applied in the proposed method to reduce the amount of 3D reconstructed point cloud data, and their performances are compared in terms of how completely object features are retained. **Figure 9a** shows the original point cloud after voxel grid filtering; the total number of points becomes 9239, a reduction of 96.9%. **Figure 9b** shows the corresponding registration result, with an average registration error of 40.8 μm between the original and downsampled point clouds and a maximum error of 520 μm at the object's corner edges. **Figure 10a** shows the original point cloud after Delaunay triangulation; the total number of points becomes 9253, also a reduction of 96.9%. The registration result, shown in **Figure 10b**, has an average registration error of 10 μm and a maximum error of 90 μm.

Comparing the two point-cloud downsampling methods reveals that Delaunay triangulation achieves a much smaller registration error than voxel grid filtering, especially in the edge areas of the measured object. Voxel grid filtering uses averaging, which smooths out the edge features of the object, whereas Delaunay triangulation removes only unnecessary points and thus preserves the edge features. Hence, using Delaunay triangulation to reduce the number of data points fed to the subsequent algorithms yields better accuracy when calculating pose variation and avoids errors introduced by point cloud preprocessing.

**Figure 9.** *(a) Original point cloud after voxel grid filtering and (b) registration result.*

**Figure 10.** *(a) Original point cloud after Delaunay triangulation and (b) registration result.*

### **2.2 Coarse alignment between scanned data and model object**

This study used the regional surface area descriptor (RSAD) proposed by Chen et al. [6] to determine the 6DOF geometric relation between two sets of point clouds measured on an object at two different locations and orientations. The method assumes that the surface area distribution defined within the oriented bounding box (OBB) [13] of a point cloud can be employed to detect the 6DOF pose of the object in 3D space. Thus, to find the best-matched pose, the RSAD of the measured point cloud is iteratively compared with the feature descriptors generated by a 3D virtual camera from the model point cloud, which represents the target model or the original pose of the object. **Figure 11** illustrates the matching method using the RSAD.

### *2.2.1 Database generation*

First, a coordinate system is established at the center of the model point cloud, and a virtual camera is located on the z-axis of this coordinate system, as shown in **Figure 12**. Template point clouds are generated by rotating about the x-axis and y-axis of the model point cloud coordinate system in fixed angular increments. As shown in **Figure 13a–f**, the template point clouds recorded by the virtual camera are rotated about the y-axis by 0, 60, 120, 180, 240, and 270 degrees, respectively.

After the virtual camera records the template point clouds at different viewing angles, a surface area feature descriptor is calculated for each template point cloud. First, the OBB of the template point cloud is computed; it is the smallest 3D rectangular bounding box that encloses the point cloud, as shown in **Figure 14**. The OBB thus obtained is divided into *k*<sub>1</sub> × *k*<sub>2</sub> × *k*<sub>3</sub> sub-oriented bounding boxes (sub-OBBs) along its three main directions, as shown in **Figure 15**. The total number of sub-OBBs is *v* = *k*<sub>1</sub> × *k*<sub>2</sub> × *k*<sub>3</sub>, and the number of object points covered by a single sub-OBB is *n<sub>v</sub><sup>p</sup>*. If *n* is the total number of points in the whole template point cloud, then the fraction of points in a sub-OBB, *f<sub>v</sub>*, can be expressed as:

**Figure 11.** *Matching using regional surface area descriptor (RSAD).*

**Figure 12.** *Location of the virtual camera in model point cloud coordinate system.*

**Figure 13.** *(a)–(h) Model point clouds captured by the virtual camera at different viewing angles.*

$$f\_v = \frac{n\_v^p}{n} \tag{2}$$

where *n<sub>v</sub><sup>p</sup>* is the number of points in the *v*th sub-OBB and *n* is the total number of points in the template point cloud.

According to Eq. (2), the fraction of points in each sub-OBB can be calculated to obtain the regional surface area distribution of the whole point cloud, which is called the surface area feature descriptor *FV* and is expressed as:

**Figure 14.** *OBB of an object point cloud.*

**Figure 15.** *Sub-OBB of an object point cloud.*

$$FV = \{f\_0, f\_1, \dots, f\_v\} \tag{3}$$

The surface area distribution of the object point cloud in **Figure 14** is shown in **Figure 16**.
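A compact sketch of the descriptor computation of Eqs. (2) and (3) is given below; it approximates the OBB frame with PCA axes (a common construction, though not necessarily the one used in [13]) and collects the per-sub-OBB point fractions with a 3D histogram. The function name and bin counts are illustrative:

```python
# Hedged sketch of the RSAD of Eqs. (2)-(3) using a PCA-based box frame.
import numpy as np

def rsad(points: np.ndarray, k=(4, 4, 4)) -> np.ndarray:
    centered = points - points.mean(axis=0)
    _, _, axes = np.linalg.svd(centered, full_matrices=False)  # principal directions
    local = centered @ axes.T                  # coordinates in the box frame
    counts, _ = np.histogramdd(local, bins=k)  # points per sub-OBB: n_v^p
    return (counts / len(points)).ravel()      # fractions f_v, flattened into FV
```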

### *2.2.2 Feature matching*

Feature matching involves two steps that together aim to obtain the closest pose between the model point cloud and the object point cloud. First, the OBB parameters of the object point cloud and the model point cloud are compared. If this OBB size check is passed, the surface area feature descriptors of the object point cloud and the model point cloud are then compared. The feature matching process is summarized in **Figure 17**.

Each OBB is represented by three principal vectors and a corner. The matching process involves finding a template point cloud whose OBB size is similar to that of the object point cloud; the two OBBs should satisfy the following equation:

**Figure 16.** *Regional surface area distribution histogram of the object point cloud.*

**Figure 17.** *Flowchart of the feature matching process.*

$$d\_{corr} = \frac{1}{3} \left\{ \frac{||\mathbf{CC\_{M1}} - \mathbf{CC\_{O1}}||}{||\mathbf{CC\_{M1}}||} + \frac{||\mathbf{CC\_{M2}} - \mathbf{CC\_{O2}}||}{||\mathbf{CC\_{M2}}||} + \frac{||\mathbf{CC\_{M3}} - \mathbf{CC\_{O3}}||}{||\mathbf{CC\_{M3}}||} \right\} < d\_{thresh} \tag{4}$$

where *d<sub>thresh</sub>* is a given threshold, and **CC**<sub>M*i*</sub> and **CC**<sub>O*i*</sub> (*i* = 1, 2, 3) are the three principal vectors of the model OBB and the object OBB, respectively.

If Eq. (4) is satisfied, the matching succeeds. Conversely, if *d<sub>corr</sub>* > *d<sub>thresh</sub>*, the matching fails, indicating a significant difference in OBB size between the object point cloud and the model point cloud. Assume that **Figure 18a** and **b** depict the OBBs of the model and object point clouds, respectively. According to their three principal vectors listed in **Table 1**, the OBB matching coefficient *d<sub>corr</sub>* between the two point clouds is 0.593. Hence, if the matching-coefficient threshold is set to 0.2, the matching between **Figure 18a** and **b** fails because *d<sub>corr</sub>* exceeds the given threshold.

**Figure 18.** *(a) OBB of the model point cloud and (b) OBB of the object point cloud.*

**Table 1.** *OBB parameters of model and object point clouds.*
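A one-function sketch of the size check of Eq. (4) might look as follows, assuming the three principal vectors of each OBB are stacked as the rows of a 3 × 3 array (all names illustrative):

```python
# Minimal sketch of the OBB size check of Eq. (4).
import numpy as np

def obb_match(cc_m: np.ndarray, cc_o: np.ndarray, d_thresh: float = 0.2) -> bool:
    # Mean relative difference between corresponding principal vectors.
    d_corr = np.mean(np.linalg.norm(cc_m - cc_o, axis=1)
                     / np.linalg.norm(cc_m, axis=1))
    return d_corr < d_thresh  # True when the two OBB sizes are similar enough
```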

Following this, to determine the best match between the model point cloud and the object point cloud measured by the 3D scanner, the feature descriptor of the object point cloud is matched against those of the template point clouds generated at different viewing angles using conventional normalized cross-correlation (NCC) [14, 15].

If the OBB matching satisfies Eq. (4), the regional surface area descriptors of the two point clouds are compared. The regional surface area descriptor represents the surface area distribution of the point cloud within its OBB, and the similarity between the descriptors of the two point clouds can be determined using the NCC method [14, 15]. Assume that the regional surface area descriptor of the object point cloud is *F<sub>O</sub>* = {*f<sub>v</sub>*, *v* = 0, … , *n<sub>v</sub>*} and that of the model point cloud is *F<sub>M</sub>* = {*f<sub>Mv</sub>*, *v* = 0, … , *n<sub>v</sub>*}; then the NCC coefficient between the two descriptors *F<sub>O</sub>* and *F<sub>M</sub>* can be expressed as:

$$\mathcal{C}(F\_O, F\_M) = \frac{\sum\_{v=0}^{n\_v} \left(f\_v - \overline{f}\right) \left(f\_{Mv} - \overline{f\_M}\right)}{\sqrt{\sum\_{v=0}^{n\_v} \left(f\_v - \overline{f}\right)^2 \bullet \sum\_{v=0}^{n\_v} \left(f\_{Mv} - \overline{f\_M}\right)^2}} \tag{5}$$

where

$$\overline{f} = \frac{1}{n\_v + 1} \sum\_{v=0}^{n\_v} f\_v \tag{6}$$

$$\overline{f\_M} = \frac{1}{n\_v + 1} \sum\_{v=0}^{n\_v} f\_{Mv} \tag{7}$$
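Because Eqs. (5)–(7) amount to the Pearson correlation of the two descriptor vectors, the NCC coefficient can be sketched in a few lines (names illustrative):

```python
# Minimal sketch of the NCC coefficient of Eqs. (5)-(7).
import numpy as np

def ncc(f_o: np.ndarray, f_m: np.ndarray) -> float:
    do, dm = f_o - f_o.mean(), f_m - f_m.mean()   # mean-centered descriptors
    return float((do * dm).sum() / np.sqrt((do ** 2).sum() * (dm ** 2).sum()))
```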

If the NCC coefficient *C*(*F<sub>O</sub>*, *F<sub>M</sub>*) between the RSAD of the model point cloud and that of the object point cloud is greater than a preset threshold, the two point clouds have similar poses in 3D space. The relationship between the two point clouds can then be calculated from the OBB parameters. Assume that the initial transformation matrix *T<sub>initial</sub>* describes the relationship between the two point clouds, as shown in **Figure 19**. Their spatial conversion relationship can be expressed as:

$$\begin{bmatrix} x\_c & x\_c + v\_{11} & x\_c + v\_{21} & x\_c + v\_{31} \\ y\_c & y\_c + v\_{12} & y\_c + v\_{22} & y\_c + v\_{32} \\ z\_c & z\_c + v\_{13} & z\_c + v\_{23} & z\_c + v\_{33} \\ 1 & 1 & 1 & 1 \end{bmatrix} = T\_{initial} \begin{bmatrix} x\_{Mc} & x\_{Mc} + v\_{M11} & x\_{Mc} + v\_{M21} & x\_{Mc} + v\_{M31} \\ y\_{Mc} & y\_{Mc} + v\_{M12} & y\_{Mc} + v\_{M22} & y\_{Mc} + v\_{M32} \\ z\_{Mc} & z\_{Mc} + v\_{M13} & z\_{Mc} + v\_{M23} & z\_{Mc} + v\_{M33} \\ 1 & 1 & 1 & 1 \end{bmatrix} \tag{8}$$

**Figure 19.** *Schematic diagram of the relationship between object and model point cloud.*

where

$$T\_{initial} = \begin{bmatrix} r\_{11} & r\_{21} & r\_{31} & t\_x \\ r\_{12} & r\_{22} & r\_{32} & t\_y \\ r\_{13} & r\_{23} & r\_{33} & t\_z \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Here, (*x<sub>c</sub>*, *y<sub>c</sub>*, *z<sub>c</sub>*) is the corner of the object OBB, *v<sub>ij</sub>* is the *j*th component of its *i*th principal vector, and the *M*-subscripted quantities are the corresponding parameters of the model OBB.
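If the OBB principal directions are unit-orthogonal, Eq. (8) can be solved for *T<sub>initial</sub>* in closed form; the sketch below assumes `v` and `v_m` hold the three principal directions as rows and `c`, `c_m` are the OBB corners of the object and model clouds (all names illustrative):

```python
# Hedged sketch: closed-form solution of Eq. (8) for unit-orthogonal OBB axes.
import numpy as np

def initial_transform(c, v, c_m, v_m) -> np.ndarray:
    r = v.T @ v_m            # rotation mapping the model axes onto the object axes
    t = np.eye(4)
    t[:3, :3] = r
    t[:3, 3] = c - r @ c_m   # translation between the OBB corners
    return t                 # homogeneous T_initial
```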

### **2.3 Fine alignment**

The iterative closest point (ICP) algorithm proposed by Besl and McKay [7] is a reliable and widely used point-cloud fine alignment algorithm. The ICP algorithm starts with two point clouds and an initial guess of their relative rigid-body transform, and iteratively refines the transform by repeatedly generating pairs of corresponding points on the two clouds and minimizing an error metric. The flowchart of the ICP algorithm is shown in **Figure 20**.

**Figure 20.** *Flowchart of the ICP algorithm.*

To further improve the computational efficiency of the ICP algorithm and optimize the alignment of 3D point clouds, many variants of the basic ICP concept have been developed, such as NICP [16], GICP [17], and point-to-plane ICP [18]. Rusinkiewicz [19] classified these variants according to the six stages of the ICP algorithm, illustrated in **Figure 21**, and indicated that the stages most critical to the convergence speed and accuracy of the ICP algorithm are correspondence calculation and error minimization.

**Figure 21.** *Six stages of the ICP algorithm.*

In this work, the classic ICP algorithm is improved in two of these stages: in correspondence calculation, the normal shooting method replaces closest-point search, and in error-metric selection, the point-to-plane distance replaces the point-to-point distance. The concepts of normal shooting and the point-to-plane distance are shown in **Figures 22** and **23**, respectively.
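For readers who want a starting point, the sketch below runs point-to-plane ICP with Open3D's built-in registration. Note that Open3D pairs points by closest-point search, so the normal-shooting correspondence proposed above is not reproduced here; file names, the normal-estimation radius, and the distance threshold are illustrative:

```python
# Hedged sketch: fine alignment with Open3D's built-in point-to-plane ICP.
import numpy as np
import open3d as o3d

source = o3d.io.read_point_cloud("test_scan.ply")    # measured end-effector scan
target = o3d.io.read_point_cloud("target_scan.ply")  # transformed target data

# The point-to-plane error metric needs normals on the target cloud.
target.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=2.0, max_nn=30))

t_initial = np.eye(4)  # replace with the coarse-alignment result of Section 2.2

result = o3d.pipelines.registration.registration_icp(
    source, target, 1.0,  # max correspondence distance, in the cloud's units
    init=t_initial,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
print(result.transformation)
print(result.inlier_rmse)
```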

In the proposed approach, the test data are the point cloud obtained by measuring a robotic arm's end effector with the developed structured-light 3D scanner, while the target data are the same point cloud transformed by a known transformation matrix, as shown in **Figure 24a**. The white point cloud is the original point cloud, and the green point cloud is the target data transformed by the known transformation matrix. An example of the end-effector alignment result is shown in **Figure 24b**. It is important to note that no calibration artifact is deployed in the detection and alignment steps of the method. This differs significantly from conventional methods, which generally utilize a known or predefined artifact for alignment and calibration; the alignment object used in the method is only the robot arm itself.

**Figure 22.** *Correspondence calculation by (a) the conventional method of finding the closest point and (b) the proposed method using normal shooting.*

**Figure 23.** *Error-metric selection by (a) the conventional method using the point-to-point distance and (b) the proposed method using the point-to-plane distance.*

**Figure 24.** *Fine point-cloud registration: (a) test data (white) and target data (green); (b) registration result after fine alignment.*

**Figure 25** compares the performance of the classic ICP algorithm and the proposed method. As can be seen, the proposed method takes only 0.116 seconds for the mean squared error (MSE) of the point-to-point distance to converge to less than 1 μm, whereas the classic ICP takes 0.659 seconds to converge and retains a larger deviation of 230 μm. These results indicate that the proposed method has a faster convergence speed and higher accuracy than the classic ICP algorithm.

**Figure 25.** *Comparison of performance between conventional ICP and the proposed method.*
