**2. Estimate 3D straight line segments**

A 3D line can be thought of as a multi-view entity that relates a perceivable line segment in the real world to its counterparts in the images, provided that these have been correctly detected and matched. The process of generating a 3D representation from different pictures of the scene is illustrated in **Figure 1**.

For the SfM problem, the poses of the cameras that took the pictures are not provided, and it is up to the SfM algorithm to estimate the camera poses and the primitives simultaneously. In the present case, SfM has to estimate the pose of the lines in space relative to the cameras. The first requirement for the method is the calibration matrix *K* of each camera, which provides the transformation between

**Figure 1.**

*Visual representation from [2]. It depicts the challenge of converting a set of 4 pictures into a 3D sketch featuring the line segments and the camera axes. The 4 cameras are represented as three-axis reference frames in red.*

each point in one image, in homogeneous coordinates, and a ray in Euclidean three-dimensional space. Secondly, SfM has to estimate the projection matrices *P* of the cameras, each representing a map from 3D to 2D:

$$\mathbf{x} = \mathbf{P} \mathbf{X},\tag{1}$$

where **x** is a 2D point on the image and **X** its pre-image in 3D space. **K** is intrinsic to each camera, while **P** is extrinsic and embeds the 3D translation and rotation of the camera's image plane. The estimated translation is valid only up to scale.

A common space can be built to host the cameras and the spatial lines. In this common space, the camera that took the first processed picture is placed at the origin, and **P** for the remaining cameras can be estimated from the lines matched between the captured images. Alternatively, the camera poses can be retrieved from a feature-point based SfM pipeline and those cameras employed for the estimation of the spatial lines. For instance, the feature-point descriptor SIFT [5] can be used to match points between images with a low ratio of outliers. These feature-point relations are obtained both in the foreground and in the background. A set of relations between points or lines in two images allows the epipolar geometry between both views to be estimated by applying the 5-point algorithm [12] to the points or to the segment endpoints. A purge of outliers can be performed with RANSAC [13] for robust estimation. A set of stereo 3D projections is thus obtained by combining the available images pairwise, each stereo system featuring both camera poses and a point cloud. The objective is to obtain a unique 3D point cloud sketch embedding all the cameras and point matches. Hence, the camera poses are sequentially stacked, relative to each other, in the new spatial reference frame, and the 3D estimate of each feature point in the new space can be computed as the center of gravity of its positions relative to the common camera in both stereo systems. Finally, a sparse bundle adjustment [14] is used to minimize the pixel distance between each back-projected 3D point and the original observation of that point on each image, in homogeneous coordinates. These reprojection errors on the camera planes are minimized with the Levenberg-Marquardt algorithm. The resulting keypoint-based 3D reconstruction contains the optimized 3D estimates for the cameras and the point cloud.

Several straight-segment matching methods are based on texture descriptors [15, 16], on coloring [17], or on keypoint-line projective invariants [18, 19]. Under these conditions, the matching results will be influenced by the level of texture in the images. If only a low number of detected segments can be distinguished by an image-texture based descriptor, or if a low number of feature points is identified throughout the set of images, the resulting set of matched lines will not be satisfactory. On the other hand, if line matching is rooted in weak epipolar constraints [1], it will be highly dependent on the accuracy of the camera poses.

Extrinsic camera parameters are needed to project the matched lines into space. Provided the same segment is completely detected, without fragmentation, in both views under a viewpoint change, the endpoints are the only points of a segment with a known exact counterpart in the other image. Unfortunately, segment detection is not accurate in the location of the endpoints. Therefore, the most accurate abstractions will be the ones built on camera extrinsics obtained from a dense feature-point based SfM. As written above, once the projection matrices *P* of two cameras are known, a point on an image back-projects as a 3D ray in Euclidean space, and this 3D ray projects as an infinite 2D line on any image plane other than the one that contains the point. Therefore, each 3D point *X<sub>p</sub>* will have its image on an epipolar line *e<sub>p</sub>* contained in the other image. As the unknown point is constrained to a line in the other image plane, a segment will analogously be constrained between the two epipolar lines corresponding to the segment endpoints. This weak epipolar constraint can be employed for matching segments between images [1].

*Build 3D Abstractions with Wireframes DOI: http://dx.doi.org/10.5772/intechopen.96141*

**3. Geometric relations**

A *3D abstraction method* estimates the positions of the 3D line segments Γ = {Γ<sup>1</sup>, Γ<sup>2</sup>, Γ<sup>3</sup>, ..., Γ<sup>*N*</sup>} from an unordered sequence of images, taken by cameras with planes ϒ = {ϒ<sup>1</sup>, ϒ<sup>2</sup>, ϒ<sup>3</sup>, ..., ϒ<sup>*M*</sup>}. Straight lines are detected in the original images, put in correspondence among them, forward projected into space, and rewritten in homogeneous coordinates.

The 3D line based sketch {ϒ, Γ} is built from the knowledge of the correspondences among the line projections *l* on the camera planes, and from the intrinsics of all the cameras. The following paragraphs explain the linear triangulation of these observations, as performed from scale-space images. This allows lines that have been detected at two or more scales with a different slope to be discriminated and weighted down. The practical consequence is that, prior to any 3D extrapolation of the observed lines, matching inliers with inconsistent endpoint locations across scales in both images can be avoided, as such lines might introduce uncertainty into the estimation of the pose of the camera.

The camera poses **P** are estimated from the endpoint correspondences of *l*. The essential matrix **E** is computed for each camera pair using the Five-Point Algorithm [12], with RANSAC [13] for hypothesis generation. Having **E** and *l*, the relative rotation and translation between the first pair of cameras, **P**<sup>*j*</sup> = [**R**|**t**], are estimated using the cheirality check, discarding the triangulated endpoints that do not lie in front of the cameras. The left camera is chosen to have the pose **P**<sup>1</sup> = [**I**|**0**], and the newly added cameras are stacked from this position in the unique reference frame.

The forward projection of lines in 3-space is described on page 196 of Hartley and Zisserman's book [20]. The 3D forward projection **Γ**<sup>*i*</sup> of a line, bundled in the same reference frame, can be obtained by applying the *DLT* method to the set of stereo 3D camera back-projections. This is performed in homogeneous coordinates because they make it possible to represent line endpoints at infinity. Therefore, from now on, whenever a 2D point is mentioned, homogeneous coordinates are assumed. There exists a 3 × 3 matrix **E**, known as the essential matrix, such that if *u* and *u*′ are a pair of corresponding points in normalized coordinates, then *u*′<sup>T</sup>**E***u* = 0.
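To make the epipolar relation concrete: for a relative pose (**R**, **t**) between two cameras, the essential matrix can be composed as **E** = [**t**]<sub>×</sub>**R**, and corresponding observations in normalized coordinates then satisfy the epipolar constraint *u*′<sup>T</sup>**E***u* = 0. The following sketch checks this numerically; the pose and the 3D point are hypothetical values chosen only for illustration:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x @ v == cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose: small rotation about z, translation t.
angle = 0.1
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.2, 0.0])

E = skew(t) @ R  # essential matrix for camera pair P1 = [I|0], P2 = [R|t]

# One 3D point, observed in normalized (calibrated) coordinates by both cameras.
X = np.array([0.5, -0.3, 4.0])
u = X / X[2]                 # observation in camera 1
Xc2 = R @ X + t              # the same point in camera-2 coordinates
u2 = Xc2 / Xc2[2]            # observation in camera 2

print(abs(u2 @ E @ u))       # ~0: the epipolar constraint holds
```

Because the constraint is homogeneous, the depth normalizations of `u` and `u2` do not affect it; this is the residual that weak-epipolar line matching evaluates at the segment endpoints.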
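The linear triangulation used above can be illustrated point-wise: each segment endpoint seen in two views is lifted to 3D with the DLT, solving a homogeneous linear system by SVD. A minimal sketch under synthetic, noiseless data (the cameras and the 3D point are made up for illustration):

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """DLT triangulation of one point from two views.
    x1, x2 are homogeneous 2D observations; returns a homogeneous 4-vector."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # solution = right singular vector of smallest sigma
    X = Vt[-1]
    return X / X[3]

# Two synthetic normalized cameras: P1 = [I|0], P2 translated by baseline 1 along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.3, -0.2, 5.0, 1.0])
x1 = P1 @ X_true; x1 = x1 / x1[2]
x2 = P2 @ X_true; x2 = x2 / x2[2]

X_est = triangulate_dlt(P1, P2, x1, x2)
print(np.allclose(X_est[:3], X_true[:3]))  # True
```

For a line segment, the same routine is applied to both endpoints; with more than two views, additional row pairs are simply stacked into `A` before the SVD.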
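Looking back at the projection model of Eq. (1), the decomposition **P** = **K**[**R**|**t**] can also be exercised numerically. All values below are hypothetical and chosen only to keep the arithmetic easy to follow:

```python
import numpy as np

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical extrinsics: identity rotation, camera center at (0.5, 0, 0).
R = np.eye(3)
t = np.array([[-0.5], [0.0], [0.0]])

P = K @ np.hstack([R, t])   # 3x4 projection matrix of Eq. (1)

X = np.array([1.0, 0.0, 4.0, 1.0])  # homogeneous 3D point
x = P @ X                            # homogeneous 2D projection
x = x / x[2]                         # normalize to pixel coordinates
print(x[:2])                         # [420. 240.]
```

Dropping **K** from the product leaves the normalized coordinates on which the essential matrix operates; replacing it restores pixel coordinates on the camera plane.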
