**3. Geometric relations**

Firstly, SfM has to relate each point in one image, in homogeneous coordinates, to a ray in Euclidean three-dimensional space. Secondly, SfM has to estimate the projection matrices **P** for the cameras, representing a map from 3D to 2D:

$$\mathbf{x} = \mathbf{P}\mathbf{X}, \tag{1}$$

where **x** is a 2D point on the image and **X** the corresponding point in 3D space. **K** is intrinsic to each camera, while **P** is extrinsic and embeds the 3D translation and rotation of the camera's image plane. The estimated translation is valid up to scale.
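As a minimal numeric sketch of Eq. (1), the following numpy snippet projects a homogeneous 3D point with $P = K[\mathbf{R}\,|\,\mathbf{t}]$; the intrinsics and pose values are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Sketch of Eq. (1): project a homogeneous 3D point X into pixel coordinates
# with P = K [R | t]. All numeric values here are illustrative.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                    # assumed intrinsics
R, t = np.eye(3), np.zeros((3, 1))                  # extrinsics: world = camera
P = K @ np.hstack([R, t])                           # 3x4 projection matrix

X = np.array([0.5, -0.2, 4.0, 1.0])                 # homogeneous 3D point
x = P @ X                                           # homogeneous 2D point
print(x[:2] / x[2])                                 # dehomogenized pixel coordinates
```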

**Figure 1.**
*Visual representation from [2]. It depicts the challenge of converting a set of 4 pictures into a 3D sketch featuring the line segments and the camera axes. The 4 cameras are represented as three-axis reference frames in red.*

A common space can be built to host the cameras and the spatial lines. In this new common space, the camera that took the first processed picture takes the place of the origin, and for the rest of the cameras **P** can be estimated from the lines matched between the captured images. Alternatively, camera poses can also be retrieved from a feature-point based SfM pipeline, and these cameras employed for the estimation of spatial lines. For instance, the feature-point descriptor SIFT [5] can be used to match points in images with a low ratio of outliers. These feature-point relations are obtained both in the foreground and the background. A set of relations between points or lines in two images allows estimating the epipolar constraints between both views by applying the 5-point algorithm [12] to the points or the segment endpoints. A purge of outliers can be performed employing RANSAC [13] for robust estimation. A set of stereo 3D projections is therefore obtained by combining the available images pairwise, with each stereo system featuring both camera poses and a point cloud. The objective is to obtain a unique 3D point-cloud sketch embedding all cameras and point matches. Hence, camera poses are sequentially stacked, relative to each other, in the new spatial reference frame, and the 3D estimations for the feature points in the new 3D space can be computed as the center of gravity of their positions relative to the common camera in both stereo systems. Finally, a sparse bundle adjustment [14] is used to minimize the pixel distance between each back-projected 3D point and the original observation of that point on each image, in homogeneous coordinates. These reprojection errors on the camera planes are minimized employing the Levenberg-Marquardt algorithm. The resulting keypoint-based 3D reconstruction contains the optimized 3D estimations for the cameras and the point cloud.
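The pairwise stage of such a pipeline can be sketched with OpenCV as follows; the file names, intrinsics and thresholds are assumptions, and Lowe's ratio test is one common way to keep matches with a low ratio of outliers.

```python
import cv2
import numpy as np

# Hedged sketch of the pairwise stage: SIFT matches, 5-point essential
# matrix with RANSAC, and relative pose recovery. Values are illustrative.
img1 = cv2.imread("view_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test discards ambiguous correspondences.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 5-point algorithm [12] inside a RANSAC loop [13] purges the outliers.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
# recoverPose applies the cheirality check internally.
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print(R, t)  # t is valid up to scale
```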

Several straight segment matching methods are based on texture descriptors [15, 16], coloring [17] or on keypoint-line projective invariants [18, 19]. Under these conditions, matching results will be influenced by the level of texture in the images. In the case that only a low number of detected segments can be distinguished by their surrounding texture, these descriptors lose reliability.


A *3D abstraction method* estimates the position of 3D line segments Γ = {Γ<sub>1</sub>, Γ<sub>2</sub>, Γ<sub>3</sub>, ..., Γ<sub>*N*</sub>} from an unordered sequence of images, taken by cameras with planes ϒ = {ϒ<sup>1</sup>, ϒ<sup>2</sup>, ϒ<sup>3</sup>, ..., ϒ<sup>*M*</sup>}. Straight lines are detected in the original images, put in correspondence among them, forward projected into space, and rewritten in homogeneous coordinates.

The 3D line-based sketch {ϒ, Γ} is built from the knowledge of correspondences among line projections *l* on the camera planes, and from the intrinsics of all the cameras. The following paragraphs explain the linear triangulation of these observations, as performed from scale-space images. This allows discriminating and down-weighting lines that have been detected on two or more scales with a different slope. The practical consequence is that, prior to any 3D extrapolation of the observed lines, matching inliers with inconsistent endpoint locations among scales on both images can be avoided, as these lines might introduce uncertainty into the estimation of the camera pose.
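This scale-consistency test can be sketched in a few lines; the 5-degree tolerance is an assumed value, not taken from the chapter.

```python
import numpy as np

# Sketch of the scale-consistency filter: a 2D segment is kept only if
# its slope agrees across two scale-space detections.
def consistent_across_scales(seg_fine, seg_coarse, tol_deg=5.0):
    """Each segment is ((x1, y1), (x2, y2)) in image coordinates."""
    def angle(seg):
        (x1, y1), (x2, y2) = seg
        return np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180.0
    diff = abs(angle(seg_fine) - angle(seg_coarse))
    return min(diff, 180.0 - diff) < tol_deg   # handle angle wrap-around
```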

The camera poses **P** are estimated from the endpoint correspondences of *l*. The Essential matrix **E** is computed from the camera pairs by using the Five-Point Algorithm [12], with RANSAC [13] for hypothesis generation. Having **E** and *l*, the relative camera rotation and translation between the first pair of cameras, $P_j = [\mathbf{R} \,|\, \mathbf{t}]$, are estimated using the cheirality check, discarding the triangulated endpoints that are not in front of the cameras. The left camera is chosen to have the pose $P_1 = [\mathbf{I} \,|\, \mathbf{0}]$, and the newly added cameras are stacked from this position in the unique reference frame.
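The cheirality check itself reduces to a depth-sign test: among the four $(\mathbf{R}, \mathbf{t})$ decompositions of **E**, only one places the triangulated endpoints in front of both cameras. A minimal sketch, assuming normalized 3×4 camera matrices:

```python
import numpy as np

# Illustrative cheirality check: a triangulated endpoint must have
# positive depth in both cameras.
def in_front_of_both(X_h, P1, P2):
    """X_h: homogeneous 4-vector; P1, P2: 3x4 normalized camera matrices."""
    X = X_h / X_h[3]                        # dehomogenize
    return (P1 @ X)[2] > 0 and (P2 @ X)[2] > 0
```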

The forward projection of lines in 3-space is described on page 196 of Hartley and Zisserman's book [20]. The 3D forward projection $\mathbf{\Gamma}_i$ of a line, bundled in the same reference frame, can be obtained using the *DLT* method on the set of stereo 3D camera back-projections. This is performed in homogeneous coordinates because it allows considering line endpoints at infinity. Therefore, from now on, whenever a 2D point is mentioned it will be assumed to be in homogeneous coordinates. There exists a 3×3 matrix **E**, known as the essential matrix, such that if $u$ and $u'$ are a pair of matched points, then $u'^T \mathbf{E} u = 0$. If a sufficient number of matched points are known, the matrix **E** may be computed as the solution of an overdetermined set of linear equations. For the present problem, the internal calibration of the cameras is known, therefore it is possible to determine from **E** the relative placement of the cameras and hence the relative locations of the 3D points corresponding to the matched points. A linear triangulation method is projective-invariant because only camera and line distances are minimized.

The above described DLT method for lines starts with the segments on the pair of cameras ϒ<sub>*a*</sub>, ϒ<sub>*b*</sub> with the highest inlier ratio. Based on this first triangulation, the other cameras are appended to the 3D abstraction: the next camera ϒ<sub>*c*</sub> is chosen according to the highest inlier ratio of line matching with ϒ<sub>*a*</sub> and ϒ<sub>*b*</sub>. Analogously, each following camera ϒ<sub>*n*</sub> is picked among the ones with the highest inlier ratio of line matching with the previously selected cameras. The detection of 2D lines *l* in the original images carries an uncertainty in the position of these observations. This uncertainty implies that no 3D point $X$ will satisfy that its projections on cameras ϒ<sup>1</sup> and ϒ<sup>2</sup> are $x_1 = P_1 X$ and $x_2 = P_2 X$ respectively. Moreover, the image points do not satisfy the epipolar constraint $x_2^T F x_1 = 0$. Therefore, a method that only minimizes the distances on the image from the estimations to the observations is required: a projective-invariant triangulation method. A linear triangulation [20] method does not depend on the projective frame in which $X$ is defined.
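The view-ordering heuristic can be sketched as follows, assuming a symmetric matrix of pairwise line-matching inlier ratios with a zero diagonal (an illustrative data layout, not from the chapter):

```python
import numpy as np

# Sketch of the camera ordering: seed with the pair of highest
# line-matching inlier ratio, then greedily append the camera best
# connected to the set already selected.
def order_cameras(ratios):
    """ratios: symmetric MxM array of pairwise inlier ratios, zero diagonal."""
    M = ratios.shape[0]
    a, b = np.unravel_index(np.argmax(ratios), ratios.shape)
    selected = [a, b]
    while len(selected) < M:
        remaining = [c for c in range(M) if c not in selected]
        # Pick the camera with the highest total inlier ratio to the set.
        nxt = max(remaining, key=lambda c: ratios[c, selected].sum())
        selected.append(nxt)
    return selected
```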

The forward projection from a normalized 2D line observed on the camera image plane $m$, denoted by $l_i^m$, is the plane $P_m^T l_i^m$, so the condition for a point $X_a$ to be in this plane is:

$$\left(l_i^m\right)^T \mathbf{P}_m \mathbf{X}_a = 0. \tag{2}$$
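A small numeric sketch of Eq. (2), with an illustrative camera $[\mathbf{I}\,|\,\mathbf{0}]$: the back-projection of an image line is a plane, and any 3D point projecting onto the line lies on that plane.

```python
import numpy as np

# Sketch of Eq. (2): the back-projection of image line l is the plane
# pi = P^T l, and a 3D point X_a lies on it iff l^T (P X_a) = 0.
P = np.hstack([np.eye(3), np.zeros((3, 1))])   # illustrative camera [I | 0]
l = np.array([1.0, -1.0, 0.0])                 # image line x - y = 0
pi = P.T @ l                                   # plane of the forward projection
X_a = np.array([2.0, 2.0, 5.0, 1.0])           # projects onto the line
print(np.isclose(pi @ X_a, 0.0))               # True: X_a lies on the plane
```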


Each point $X_a$ returns a linear equation in the entries of $P_m$. Denoting by $x_{i,E}^m$ and $x_{i,F}^m$ the forward projections of the endpoints of $l_i^m$, named $X_{i,E}$ and $X_{i,F}$, under $P_m$, then any other 3D point on the line, $X_i(\mu) = X_{i,E} + \mu X_{i,F}$, projects to a point:

$$x_i^m(\mu) = \mathbf{P}_m\left(\mathbf{X}_{i,E} + \mu\,\mathbf{X}_{i,F}\right) = x_{i,E}^m + \mu\, x_{i,F}^m, \tag{3}$$

which is on the line segment $l_i^m$.

In the described method, a unique reference frame is built. The world reference system is fixed onto the first camera, hence its camera matrix $P_E$ is computed with $R_E = I$ and $T_E = \mathbf{0}$. The extrinsics for the partner camera $P_m$ on the baseline are obtained from the essential matrix by using RANSAC. Before the subsequent DLT triangulations with a new camera, its extrinsics are also estimated by RANSAC from the 2D-3D results of the already computed DLT. From here, new cameras are added incrementally, just one per DLT iteration, in order to avoid DLTs between two uninitialized camera projection matrices.
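The chapter only specifies RANSAC on the 2D-3D results; a PnP solver is one standard way to realize that step. A hedged OpenCV sketch, with an assumed 2-pixel reprojection threshold:

```python
import cv2
import numpy as np

# Hedged sketch of registering a new camera from 2D-3D correspondences
# with a RANSAC PnP solver (an assumed choice, not named in the chapter).
def register_camera(pts3d, pts2d, K):
    """pts3d: Nx3 triangulated endpoints; pts2d: Nx2 image observations."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        reprojectionError=2.0)                   # pixel threshold (assumed)
    R, _ = cv2.Rodrigues(rvec)                   # rotation vector -> matrix
    return K @ np.hstack([R, tvec]), inliers     # 3x4 projection matrix
```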

DLT requires a set of observed line correspondences, $l_j^m$ to $l_j^n$, matched among images. The projection on the image plane of camera $m$ of an endpoint $X_{j,E}$ of the spatial line $\mathbf{\Gamma}_j$ is denoted as $x_{j,E}^m = P_m X_{j,E}$. This point on the $m$-th camera plane is matched to its counterpart on the $n$-th camera, $x_{j,E}^n = P_n X_{j,E}$. Both equations can be combined into $A X_{j,E} = 0$, where $\mathbf{A}$ is the matrix of equation coefficients. It is built from the matrix rows $A_r$ contributed by each correspondence, which capture the movement of each line between both views. $X_{j,E}$ contains the unknowns for the endpoint position.

By using the cross product on the $m$-th camera, $l_j^m \times P_m X_{j,E} = 0$:


$$x_m\left(p_m^{3T}\mathbf{X}_{j,E}\right) - \left(p_m^{1T}\mathbf{X}_{j,E}\right) = 0, \tag{4}$$

$$y_m\left(p_m^{3T}\mathbf{X}_{j,E}\right) - \left(p_m^{2T}\mathbf{X}_{j,E}\right) = 0, \tag{5}$$

$$x_m\left(p_m^{2T}\mathbf{X}_{j,E}\right) - y_m\left(p_m^{1T}\mathbf{X}_{j,E}\right) = 0, \tag{6}$$

where $(x_m, y_m)$ and $(x_n, y_n)$ are the coordinates of $x_{j,E}^m$ and $x_{j,E}^n$ respectively, and $p_m^{rT}$ is the $r$-th row of $P_m$. The same decomposition for $P_n$ composes the equation of the form $A X_{j,E} = 0$. Solving:

$$\mathbf{A} = \begin{bmatrix} x_m\, p_m^{3T} - p_m^{1T} \\ y_m\, p_m^{3T} - p_m^{2T} \\ x_n\, p_n^{3T} - p_n^{1T} \\ y_n\, p_n^{3T} - p_n^{2T} \end{bmatrix}. \tag{7}$$

The solution of the four equations of the over-determined problem (four equations for four homogeneous variables) is only valid up to scale. The set of points in space mapping to a 3D line $\mathbf{\Gamma}_j$ via $P_m$ is the plane $P_m \mathbf{\Gamma}_j$.
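A minimal numpy sketch of this triangulation: stack the four rows of Eq. (7) and take the right singular vector associated with the smallest singular value as the up-to-scale solution $X_{j,E}$.

```python
import numpy as np

# Sketch of Eq. (7): DLT triangulation of one endpoint from two views.
def triangulate_endpoint(xm, xn, Pm, Pn):
    """xm, xn: (x, y) observations; Pm, Pn: 3x4 camera matrices."""
    A = np.vstack([
        xm[0] * Pm[2] - Pm[0],
        xm[1] * Pm[2] - Pm[1],
        xn[0] * Pn[2] - Pn[0],
        xn[1] * Pn[2] - Pn[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # least-squares null vector of A
    X = Vt[-1]                       # solution, valid up to scale
    return X[:3] / X[3]              # Cartesian coordinates
```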

The result of the linear triangulation process is $\mathbf{\Gamma}_i$ and $u_j$, represented in Cartesian coordinates.

Every 3D segment Γ<sub>*i*</sub> is estimated as the center of gravity of the estimations obtained for the same line from each pair of images. The set of line projections observed in ϒ is represented as $l = \{l_1^1, l_2^1, \ldots, l_N^1, \ldots, l_N^M\}$. A Line Feature is defined as a subgroup of projections from $l$ of the same 3D line Γ<sub>*i*</sub>. The Line Features are noted as *L* = {*L*<sub>1</sub>, *L*<sub>2</sub>, ..., *L*<sub>*N*</sub>}. The 3D lines Γ are obtained by forward projecting the endpoints of $l$ from pairs of camera planes of ϒ, using linear triangulation, analogously to the Direct Linear Transformation (*DLT*) [20]. The cameras ϒ are sequentially bundled in the same reference frame. The new ones are stacked according to the *L*-to-Γ correspondences computed in the previous stereo pair of cameras. The merged estimations for the 3D lines {Γ<sub>*i*</sub>} are computed as the center of gravity of the spatial lines.
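A minimal sketch of this merging step, assuming each per-pair estimate is stored as a pair of 3D endpoints:

```python
import numpy as np

# Sketch of the merge: the unique 3D segment is the center of gravity of
# its per-pair estimates, averaged endpoint by endpoint.
def merge_segment_estimates(estimates):
    """estimates: list of (E, F) endpoint pairs, each a 3-vector."""
    E = np.mean([e for e, _ in estimates], axis=0)
    F = np.mean([f for _, f in estimates], axis=0)
    return E, F
```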

The 3D sketch {ϒ, Γ} generated by linear triangulation is used as input for an optimization algorithm. The least-squares optimization named Sparse Bundle Adjustment (SBA) [14] is based on the Levenberg–Marquardt algorithm, and uses as input the estimated camera extrinsics ϒ and the set Γ, which now contains a unique estimation for each 3D line [21].
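The cost function can be sketched with SciPy's Levenberg-Marquardt solver; a real SBA [14] additionally exploits the sparse structure of the Jacobian, which this dense sketch omits. The parameter packing and the `unpack` helper below are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

# Minimal sketch of the refinement: minimize endpoint reprojection errors
# with Levenberg-Marquardt. The parameter layout is an assumption.
def residuals(params, K, observations):
    """params packs camera poses and 3D endpoints; `observations` is a list
    of (cam_index, point_index, observed_xy) tuples."""
    cams, pts = unpack(params)                 # hypothetical unpacking helper
    res = []
    for c, p, xy in observations:
        x = K @ (cams[c][:, :3] @ pts[p] + cams[c][:, 3])
        res.extend(x[:2] / x[2] - xy)          # pixel reprojection error
    return res

# result = least_squares(residuals, x0, method="lm", args=(K, observations))
```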

The 3D estimations for lines and cameras are drawn in the same spatial sketch, together with the cameras. Next, these spatial line segments Γ are fit to different planes P. Γ is therefore segmented into groups according to the planes P, and so are their projections *L*. The group of Line Features fitted to the plane P<sub>*t*</sub> is noted as *F*<sub>*t*</sub>. The intersections of the coplanar lines *F*<sub>*t*</sub> on the camera plane ϒ<sup>*j*</sup> are the spatial points $\mathrm{T}_t^j$. Therefore, the algorithm can go back to the original images, now knowing which line segments are coplanar. The intersection of two such coplanar lines on an image is described similarly to a feature point; following this analogy, the descriptor for this feature point is the pair of coplanar lines drawing it. Since the correspondences of the straight lines across images are known, these correspondences can be extrapolated to their intersections for the cases where the lines are coplanar. Secondly, given the correspondences between these intersections, they can be triangulated analogously to the first routine with the endpoints of $l$. The correspondences in $\mathrm{T}_t^j$ are then fed into the linear triangulation algorithm, in order to create initial estimates for the 3D intersections by forward projecting $\mathrm{T}_t^j$.
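In homogeneous coordinates both constructions reduce to cross products, as the following sketch shows: the line through two points, and the intersection of two lines, are each a cross product.

```python
import numpy as np

# Sketch of the intersection feature: in homogeneous coordinates the
# intersection of two image lines is their cross product.
def line_through(p, q):
    return np.cross(p, q)            # p, q: homogeneous 2D points

def intersection(l1, l2):
    x = np.cross(l1, l2)             # homogeneous intersection point
    return x / x[2]                  # Cartesian pixel coordinates

l1 = line_through([0, 0, 1], [4, 4, 1])      # the line y = x
l2 = line_through([0, 4, 1], [4, 0, 1])      # the line y = 4 - x
print(intersection(l1, l2)[:2])              # -> [2. 2.]
```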

The set of estimations for the 3D points representing the intersections is a sparse cloud, denoted as R. Finally, and the same as with the endpoints, the 3D intersections R enter the least-squares optimization. The SBA returns the new optimized estimations for ϒ, and the optimal 3D intersections R. The spatial line and camera pose estimations are corrected by forward projecting them from the newly estimated camera planes ϒ. This returns the final sketch {ϒ, Γ}. The high-level diagram in **Figure 2** shows the process described in this section.

An endpoint-based Bundle Adjustment is not always adequate to improve the 3D sketch: recurrent segment mismatches, fragmentation or the inaccurate placement of counterparts may prevent the convergence of the optimization. It is possible to perform a line-based Bundle Adjustment by converting the primitives into Plücker coordinates [20, 21] within the cost function of the optimisation process. This allows a reduction in the number of parameters and the computational cost.
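A minimal sketch of the Plücker parametrization, mapping a segment's two endpoints to a 6-vector of direction and moment:

```python
import numpy as np

# Sketch of Plücker coordinates [20, 21]: a 3D line through points A and B
# becomes the 6-vector (direction, moment), removing the overparametrization
# of storing two free endpoints.
def pluecker_from_points(A, B):
    d = B - A                        # direction of the line
    m = np.cross(A, B)               # moment; satisfies d . m = 0
    return np.hstack([d, m])

L = pluecker_from_points(np.array([0.0, 0.0, 1.0]),
                         np.array([1.0, 1.0, 1.0]))
print(L)                             # direction and moment of the line
```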



**3.2 How to compare the results with Ground Truth meshes**

In order to prove the validity of a 3D abstraction method, it has to be benchmarked against a Ground Truth dataset for SfM, which includes both intrinsic and extrinsic parameters for the cameras. These datasets are built with synthetic images from 3D models [22], or with real pictures [23] teamed with 3D model data including the pose of the cameras and the measurements from 3D scanning or Lidar. Both synthetic and real Ground Truth datasets include a 3D model. The resulting point cloud is aligned with the Ground Truth mesh, and the normal distance between the surface of the mesh and the points is computed. In order to assess how the generated sketch fits the Ground Truth model, the Mean Square Error of the distance between both spatial shapes is computed, because it acts as the natural loss function of a Gaussian distribution. In the case of a 3D line sketch, the 3D straight segments must be discretized into points before the sketch can be compared with the Ground Truth mesh. To measure the difference in proportions between the generated 3D sketch and the Ground Truth mesh, the normal distance between the surface of the mesh and the discretized points on the lines is computed. Using the obtained distance errors, the discretized points on the lines are coloured to show how far they are from the surface of the mesh. There are several variables that condition the resulting 3D sketch: firstly, the number of images showing common elements of the scene; secondly, the number of segments that can be matched between images; and thirdly, the transformation between both images, which conditions the matching inlier ratio and hence the number of segments correctly projected into space.
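A hedged sketch of this evaluation, assuming the `trimesh` library and an already aligned mesh; the file name, the sampling density, and the `segments` input are illustrative assumptions.

```python
import numpy as np
import trimesh

# Sketch of the evaluation: discretize each 3D segment into points, measure
# their distance to the aligned Ground Truth mesh, and report the MSE.
mesh = trimesh.load("ground_truth.ply")       # illustrative file name

def segment_points(E, F, n=50):
    """Discretize segment EF into n points (density is an assumption)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1.0 - t) * E + t * F

# `segments` is assumed to be a list of (E, F) 3D endpoint pairs.
pts = np.vstack([segment_points(E, F) for E, F in segments])
_, dists, _ = trimesh.proximity.closest_point(mesh, pts)
mse = np.mean(dists ** 2)                     # natural loss of a Gaussian
print(mse)   # the distances can also color the points for visualization
```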

For 3D line sketching methods, the length of the final 3D lines will depend on the fragmentation of the detected lines, and their number is closely related to the number of line correspondences between the images. Therefore, the results of 3D reconstructions will unavoidably depend on the performance of the method in the stages before the spatial projection. Quantitative measurements for 3D abstraction are performed on Ground Truth datasets. The proportions of the generated sketch are measured based on the distance between the segments and the Ground Truth mesh.

Employing a feature-point based abstraction method is profitable for datasets with a sufficient number of pictures featuring textured surfaces, so that a dense 3D point cloud can be created. For these 3D abstractions, cameras are located accurately thanks to the precision of the point rotation and translation invariants. This is the case of the results obtained by abstraction methods working together with SIFT pipelines [1, 22], but requiring dozens of high-definition pictures with textured surfaces for SIFT to be able to accurately estimate the camera extrinsics.

There are real-world applications of Computer Vision that do not always permit obtaining high-definition pictures of textured environments without blurring and digital noise. For these applications it can be advantageous to estimate the camera extrinsics independently of any feature-point 3D reconstruction [2].

**Figure 3** shows a quantitative comparison of the methods [2] and [1] with just 6 and 8 images chosen from the dataset. **Figure 4** increases the number of images to 10 and 12. The test cases are labeled as *S*6, comprising image numbers
