**3.3 Data collection principles and quality requirements**

When capturing images for augmented reality, we use a large part of the image sensor: an area of 3840 × 2880 pixels on the iPhone 13 Pro. We then apply a process called binning [88, 89]. It works as follows: binning takes a region of 2 × 2 pixels, averages the pixel values, and writes back a single pixel. This has two significant advantages. First, the image dimensions are reduced by a factor of two, in this case downscaling to 1920 × 1440 pixels. As a result, each frame consumes far less memory and processing power. This allows the device to run the camera at up to 60 frames per second and frees up resources for rendering. Secondly, binning offers an advantage in low-light environments, where the averaging of pixel values reduces the effects of sensor noise.
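The 2 × 2 binning step described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the sensor's on-chip binning; the function name and frame sizes merely mirror the iPhone 13 Pro figures quoted in the text.

```python
import numpy as np

def bin_2x2(frame: np.ndarray) -> np.ndarray:
    """Average each 2x2 pixel block into a single pixel (2x2 binning).

    Halves both dimensions, e.g. 3840x2880 -> 1920x1440.
    """
    h, w = frame.shape[:2]
    # Group pixels into 2x2 blocks, then average over the two block axes.
    blocks = frame[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

# A sensor-sized frame of synthetic noise: binning also lowers the noise
# standard deviation, which is why the technique helps in low light.
sensor = np.random.default_rng(0).normal(128, 10, size=(3840, 2880))
binned = bin_2x2(sensor)
print(binned.shape)  # (1920, 1440)
```

Averaging four independent noisy samples divides the noise variance by four, which is the low-light benefit the text refers to.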

Images captured by a camera are geometrically warped by small imperfections in the lens. To project from the 2D image plane back into the 3D world, the images must be distortion-corrected, i.e. made rectilinear. Lens distortion is modeled using a one-dimensional lookup table of 32-bit float values evenly distributed along the radius from the center of the distortion to a corner of the image, with each value representing a magnification of the radius. This model assumes symmetrical lens distortion [88].
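A minimal sketch of how such a radial lookup table can be applied to a single point follows. The function name, the toy table values, and the linear interpolation scheme are assumptions for illustration; they are not Apple's API, only the symmetric radial-magnification model the text describes.

```python
import numpy as np

def rectify_point(pt, center, lut, max_radius):
    """Correct one image point using a radial magnification lookup table.

    `lut` holds magnification factors sampled evenly from the distortion
    center (index 0) out to the image corner (last index). The point's
    radius selects a (linearly interpolated) magnification, which scales
    the point's offset from the center.
    """
    v = np.asarray(pt, dtype=float) - center
    r = np.hypot(*v)
    # Fractional index of this radius within the evenly spaced table.
    idx = (r / max_radius) * (len(lut) - 1)
    lo = int(np.floor(idx))
    hi = min(lo + 1, len(lut) - 1)
    mag = lut[lo] + (idx - lo) * (lut[hi] - lut[lo])  # linear interpolation
    return center + v * mag

# Toy table: no magnification at the center, 2% at the corner.
lut = np.linspace(1.0, 1.02, 32, dtype=np.float32)
center = np.array([960.0, 720.0])
corner_radius = np.hypot(960.0, 720.0)
print(rectify_point((1200.0, 900.0), center, lut, corner_radius))
```

In a full rectification pass the same lookup would be applied to every pixel coordinate of the image grid, typically followed by resampling.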

Capturing scenes with an iPhone is a computer vision technique that one can leverage to easily turn images of real-world objects into detailed 3D models. We begin by taking photos of the urban built environment from various angles with an iPhone. To photograph the whole area with the ability to match landmarks between images, we must move the camera around, taking photographs from different angles and at different heights.

**Figure 3.** *iPhone 13 Pro Max used as sensor for data acquisition.*

*Fuzzy Photogrammetric Algorithm for City Built Environment Capturing into Urban Augmented… DOI: http://dx.doi.org/10.5772/intechopen.110551*

To ensure landmark matching between overlapping photographs, camera settings must be kept as consistent as possible from shot to shot. **Figure 5** illustrates a sample of the captured data; the reading direction of the photos (start to end) is indicated there.

The number of pictures needed to create an accurate 3D representation varies with the quality of the pairs of photographs making up the sequences in the collection and with the complexity and size of the built environment. In addition, adjacent shots must overlap substantially, so we position sequential images to have 70% overlap or more (0.7 ≤ overlap ≤ 0.9), as illustrated in **Figure 4**. With less than 50% overlap between neighboring shots, the 3D reconstruction process may fail or result in a low-quality augmented reality model [15, 52]. Using an aperture narrow enough to maintain crisp focus is recommended [53, 58]. The spatial precision between the pairs of images and the chromatic density of the textures guarantee the quality of the images collected for the 3D reconstruction of built urban environments. Accordingly, the key factors ensuring good-quality input data [15, 52, 53, 58, 90] are summarized in **Table 1**.
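The overlap thresholds above can be captured in a small helper. This is an illustrative check of the stated constraints only; the function name and category labels are our own, and measuring the overlap ratio itself would be done separately (e.g. from matched landmarks).

```python
def classify_overlap(overlap: float) -> str:
    """Classify a measured overlap ratio between two sequential shots
    against the thresholds stated in the text."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be a ratio in [0, 1]")
    if 0.7 <= overlap <= 0.9:
        return "target"        # 0.7 <= overlap <= 0.9: ideal for matching
    if overlap >= 0.5:
        return "acceptable"    # matching may still work, quality degrades
    return "insufficient"      # below 50%: reconstruction may fail

print(classify_overlap(0.75))  # target
print(classify_overlap(0.55))  # acceptable
print(classify_overlap(0.40))  # insufficient
```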

Our photographic database comprises 800 photos taken in compliance with the overlap constraints to feed the model. The entire collection is organized into 799 sequential image pairs. A first step consists in sorting out the truly calibrated image pairs according to the constraints of stereovision image matching.

#### **3.4 Image matching in stereovision within FCM framework**

Image matching in stereovision [89, 91–94] is the process of identifying the corresponding points in two images that are cast by the same physical point in three-dimensional space. This can be carried out pixel by pixel or by identifying significant features in the images, such as edges, regions, or interest points.
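To make the pixel-by-pixel variant concrete, here is a toy correspondence search along one rectified epipolar scanline using normalized cross-correlation. This is a generic textbook technique for illustration, not the chapter's FCM method; all names are ours, and the synthetic "right" row is simply the left row shifted by a known disparity.

```python
import numpy as np

def match_along_scanline(left_row, right_row, x, half_win=3):
    """Find the column in `right_row` best matching the patch centered at
    column `x` of `left_row`, scored by normalized cross-correlation."""
    patch = left_row[x - half_win : x + half_win + 1].astype(float)
    patch = (patch - patch.mean()) / (patch.std() + 1e-9)
    best_x, best_score = -1, -np.inf
    for cx in range(half_win, len(right_row) - half_win):
        cand = right_row[cx - half_win : cx + half_win + 1].astype(float)
        cand = (cand - cand.mean()) / (cand.std() + 1e-9)
        score = float(np.dot(patch, cand))  # higher = more similar
        if score > best_score:
            best_x, best_score = cx, score
    return best_x

# Synthetic rectified pair: the right row is the left row shifted by 4 px,
# so the pixel at column 100 on the left should match column 96 on the right.
rng = np.random.default_rng(1)
left = rng.integers(0, 255, size=200)
right = np.roll(left, -4)  # disparity of 4 pixels
print(match_along_scanline(left, right, 100))  # expect 96
```

Feature-based matching (edges, regions, interest points) replaces the raw patch comparison with comparisons between extracted feature attributes, which is the route taken in the remainder of this section.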

Hence, the stereo correspondence problem can be defined in terms of finding pairs of true matches, namely, pairs of edge segments in two images that are generated by

**Figure 5.** *Sample of captured urban built environment dataset.*


**Table 1.** *Key factors affecting photogrammetric input images quality.*

the same physical edge segment in space. These true matches generally satisfy some constraints:

1. epipolar: given two segments, one in the left image and a second in the right one, if we slide one of them along a horizontal direction, i.e. parallel to the epipolar line, they would intersect (overlap) (**Figure 4**);

2. similarity: matched edge segments have similar properties or attributes;

3. smoothness: disparity varies smoothly almost everywhere across the image;

4. ordering: the order of segments along an epipolar line is preserved between the two images;

5. uniqueness: each segment in one image matches at most one segment in the other.


A large parallax factor value causes the background to move more slowly compared to the foreground. A small value makes the foreground and background move at a similar pace. The parallax effect becomes more apparent as the value of parallax factor increases.
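The role of the parallax factor can be sketched with a toy displacement model. The function name, the formula, and the depth parameter are illustrative assumptions; the sketch only demonstrates the qualitative behavior described above, where larger factors slow the background relative to the foreground.

```python
def layer_shift(camera_dx: float, parallax_factor: float, depth: float) -> float:
    """Apparent on-screen shift of a scene layer when the camera moves by
    `camera_dx`. Deeper layers move less; a larger `parallax_factor`
    strengthens that depth-dependent lag (illustrative model only)."""
    return camera_dx / (1.0 + parallax_factor * depth)

# Foreground (depth 0) follows the camera exactly.
print(layer_shift(10.0, 0.5, 0.0))  # 10.0
# A large factor makes the background (depth 4) lag far behind ...
print(layer_shift(10.0, 2.0, 4.0))  # ~1.1
# ... while a small factor keeps it moving at a similar pace.
print(layer_shift(10.0, 0.1, 4.0))  # ~7.1
```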

According to the FCM framework of causal concepts and their activation levels, the system receives as input a pair of stereo images, left $I_l$ and right $I_r$. This pair is processed to extract edge segments and their attributes; each pair of extracted feature vectors $(\vec{v}_{I_l}, \vec{v}_{I_r})$, where $\vec{v}_{I_l}$ and $\vec{v}_{I_r}$ come from $I_l$ and $I_r$ respectively, is to be matched. For each pair $(\vec{v}_{I_l}, \vec{v}_{I_r})$ the attribute difference vector $\vec{x}$ is computed. In this approach, a pair of edge attributes $(\vec{v}_{I_l}, \vec{v}_{I_r})$ defines a causal fuzzy concept $C_i$. Eq. (1) is applied, and the initial activation level at iteration $t = 0$ is derived from $\vec{x}$ as follows in Eq. (3):

$$A_i^0 = \frac{1}{1 + \|\vec{x}\|} \tag{3}$$

where $\|\vec{x}\|$ is the Euclidean norm. This implements Gestalt's similarity principle. Hence, our FCM structure is built with as many concepts as there are pairs of edge attributes from $I_l$ and $I_r$. The algorithm is synthesized in **Table 2**.
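Eq. (3) can be computed directly. The sketch below only evaluates the initial activation level; the attribute values are hypothetical placeholders, and the iterative FCM update of Eq. (1) is not reproduced here.

```python
import numpy as np

def initial_activation(attr_left, attr_right) -> float:
    """Initial activation A_i^0 of the fuzzy concept defined by a pair of
    edge-attribute vectors, per Eq. (3): A_i^0 = 1 / (1 + ||x||), where x
    is the attribute difference vector. Identical attributes give 1;
    dissimilar ones tend toward 0 (the similarity Gestalt principle)."""
    x = np.asarray(attr_left, dtype=float) - np.asarray(attr_right, dtype=float)
    return 1.0 / (1.0 + np.linalg.norm(x))

# Hypothetical edge attributes (e.g. orientation, length, contrast):
print(initial_activation([0.8, 30.0, 0.5], [0.8, 30.0, 0.5]))  # 1.0
print(initial_activation([0.8, 30.0, 0.5], [0.2, 10.0, 0.1]))  # ~0.047
```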

The correspondence results within the pairs, for each of the characteristics, are recorded in **Table 3** as the number of matched pairs per number of iterations:

We note that from iteration 20 onward the results remain stable. To test their consolidation, we pushed the number of iterations to 35 without any disruption of this stability.

In view of the results of this correspondence calculation phase, only the 744 pairs of photographs respecting the five constraints (epipolar, similarity, smoothness, ordering, and uniqueness) have been selected to feed the scene of the augmented urban reality model.
