Depth Extraction from a Single Image and Its Application

Shih-Shuo Tung and Wen-Liang Hwang

### Abstract

In this chapter, we present a method for generating a depth map from a single image. The proposed approach applies a sequence of blurring and deblurring operations to each point to determine its depth, and makes no assumptions about the properties of the scene when resolving depth ambiguity in complex images. Since applications involving depth map manipulation can be achieved by obtaining all-in-focus images through a deblurring operation and then blurring the obtained images, we also present methods to derive all-in-focus images from our depth maps. Furthermore, 2D to 3D conversion can be achieved from the estimated depth map. Demonstrations of the performance and applications of the estimated depth map are given at the end of the chapter.

Keywords: depth estimation, blur estimation, depth from defocus, all in focus, refocusing, defocus magnification, 2D to 3D

### 1. Introduction

Deriving depth information from 2D images is one of the most important problems in image processing and computer vision. Depth information can be applied in 2D to 3D conversion, image refocusing, scene interpretation, 3D scene reconstruction, and depth-based image editing. Several techniques derive depth information, such as depth from focus [1], stereo vision [2], and depth from motion [3]. Nevertheless, these techniques require multiple images, which makes them impractical when only one image is available or when correspondences between the images cannot be resolved reliably. To this end, a number of approaches have been proposed to acquire depth information from a single image, such as the computational photography approach [4], which modifies the shape of the aperture of a traditional lens, and the Kinect approach [5], which uses structured light to derive depth maps.

An image captured by a conventional camera contains a blurred version of the parts of a scene that are out of focus. The blurriness of a pixel is characterized by the "circle of confusion" (COC) and is usually modeled as a 2D Gaussian function. When a single image is taken by a conventional camera with a fixed focal length, aperture size, and lens-to-sensor distance, a pixel's COC is related only to the depth of the corresponding scene point. In such cases, depth estimation corresponds to blur estimation. Theoretically, if the depth map of an image can be accurately estimated, applications that manipulate the depths of objects can be implemented by first applying deblurring followed by blurring operations: the deblurring operation moves objects closer to the camera, and the blurring operation moves them farther away.

The blurring operation is more robust to depth map inaccuracy than the deblurring operation, and many applications have been successfully built on it. For example, defocus magnification [6] increases the out-of-focus area in an image by magnifying the existing blurriness, keeping the shape of sharp regions while modifying the depths of objects outside the focal plane so that they move farther away from it. The deblurring operation, on the other hand, can be very sensitive to the accuracy of a depth map. A deblurring operation usually sharpens the edge and texture points in an image; if their depths are overestimated, the operation can generate ringing artifacts that severely degrade the perceptual quality of the image.

Depth map estimation from a single image is a fundamentally ill-posed problem. For example, from a single image we cannot resolve the ambiguity between out-of-focus edges and originally smooth edges, we cannot determine whether a blurred point lies in front of or behind the focal plane, and we cannot estimate the depths of points in a smooth area. These problems cannot be resolved without assumptions relating local image features to the scene. In this context, a widely adopted assumption is that a blurred edge is obtained by smoothing a step edge with a Gaussian kernel [7]. Although this assumption has been adopted in camera autofocus systems, the goal of autofocus is to derive the depths of manually selected scene points rather than the depth map of an image. Approaches based on assumptions about scene points have also been used to estimate a depth map: edges in a scene are first modeled, and the depths of the blurred edge points are then derived from the degree of blurriness that has been applied to the scene to obtain them. However, because many types of singularities far beyond step edges can appear in an image, approaches based on scene modeling, in which only a few types of singularities can be modeled, are too restricted to derive precise depths for all points. As a result, depth precision derived from scene modeling is usually limited to images with two depth layers, foreground and background.

Figure 1. Applications from the depth map of a single image.

Depth Extraction from a Single Image and Its Application DOI: http://dx.doi.org/10.5772/intechopen.84247

In this chapter, we propose a blurring-deblurring method that does not require modeling of edge points. In the blurring process, a point is blurred by increasing its COC to the limit of the camera; in the deblurring process, a point is deblurred by reducing its COC to the limit at the other end. The results of these two processes are combined to derive the depths of edges. The approach therefore estimates the depths of edge points based on the COC-versus-depth characteristic curve of the camera. We demonstrate that the proposed approach can reliably derive depth maps of complex images and synthesize all-in-focus images. Furthermore, the depth maps can also be applied to synthesize stereo images for 3D visualization on mobile devices. Figure 1 shows a diagram of the applications enabled by the depth map of a single image.

The remainder of this chapter is organized as follows. The relationship between the depth of a point and the out-of-focus blurriness in images obtained by the thin-lens camera model is reviewed in Section 2. The proposed blurring-deblurring approach is presented in Section 3. The depth refinement approach and the image deblurring process are presented in Section 4. In Section 5, we demonstrate the depth map results and applications. Section 6 contains conclusions.

#### 2. Camera model and out-of-focus blurriness

The out-of-focus blurriness of an object that is not in the camera's focal plane is defined by the COC. However, it is impossible to determine whether an object is behind or in front of the focal plane based on its blurriness alone [8]. In the following, we consider each of the two cases.

In a thin-lens model,

$$\frac{1}{u} + \frac{1}{v} = \frac{1}{f},\tag{1}$$

where $u$ is the distance between the lens and the scene point, $v$ is the distance between the lens and the plane on which the point is imaged in focus, and $f$ is the focal length of the lens. If a light point is not in the focal plane but placed in front of the camera, the source's image will be a circular disk with diameter $D\_{\rm{COC}}$ instead of a point, as shown in Figure 2. Let $d$ be the distance between the lens and the image sensor. Then, the in-focus scene distance $u\_{\text{in-focus}}$ can be derived as follows:

$$\frac{1}{u\_{\text{in-focus}}} + \frac{1}{d} = \frac{1}{f}.\tag{2}$$

For a particular lens, the focal length $f$ and the aperture $A$ are constants; the F-number $N = f/A$ is therefore also a constant. Given the geometric relationship of $D\_{\rm{COC}}(u)$ shown in Figure 2 and the lens formula, the COC diameter of a scene point at distance $u$ from the lens depends on whether $u > u\_{\text{in-focus}}$ (the scene point is farther from the lens than the focal plane) or $u < u\_{\text{in-focus}}$ (the scene point is closer to the lens than the focal plane).

In the case where $u > u\_{\text{in-focus}}$, we can derive the following relationship from the similar triangles shown in Figure 2(a):

Figure 2.

The geometry of imaging: $u$ is the distance of a scene point from the lens, $u\_{\text{in-focus}}$ is the distance of the focal plane from the lens, $d$ is the distance between the lens and the image sensor, and the diameter of the lens' aperture is $A$. (a) $u > u\_{\text{in-focus}}$ and (b) $u < u\_{\text{in-focus}}$.

$$\frac{D\_{\rm{COC}}(u)}{A} = \frac{d-v}{v} \,. \tag{3}$$

Using Eq. (1) and $N = f/A$, we obtain

$$D\_{\rm{COC}}(u) = \left(-1 + \frac{d}{f} - \frac{d}{u}\right) \frac{f}{N}. \tag{4}$$

In the case where $u < u\_{\text{in-focus}}$, we can derive the following relationship from the similar triangles shown in Figure 2(b):

$$\frac{D\_{\rm{COC}}(u)}{A} = \frac{v-d}{v} \,. \tag{5}$$

Using Eq. (1) and $N = f/A$, we obtain

$$D\_{\rm{COC}}(u) = \left(1 - \frac{d}{f} + \frac{d}{u}\right) \frac{f}{N}. \tag{6}$$

From Eqs. (4) and (6), we can derive $D\_{\rm{COC}}$ of a scene point; however, the equations do not allow us to determine whether the scene point is in front of or behind the focal plane. To remove this ambiguity, we adopt the assumption that all scene points are behind the focal plane.

An image is usually modeled as the convolution of the scene and a camera-dependent point spread function (PSF). The pillbox function is an idealized PSF: a box function with support $\sigma$ and constant value $1/\sigma$. The Gaussian function is usually used as an approximation of the pillbox function, with the standard deviation of the Gaussian set to $\sigma/\sqrt{2}$. With this factor, the difference between their frequency-domain magnitudes is small, and the Gaussian is easier to analyze. In this chapter, we use the Gaussian function to characterize the PSF of a camera.
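The COC-versus-depth relationship of Eqs. (1)-(6) can be sketched numerically. The snippet below evaluates the COC diameter with the camera parameters quoted in the caption of Figure 3 ($f$ = 50 mm, $N$ = 5.6, $d$ = 52.6316 mm); the helper names are our own.

```python
# Numerical sketch of Eqs. (1)-(6): COC diameter as a function of scene depth.

def coc_diameter(u, f=50.0, N=5.6, d=52.6316):
    """COC diameter (mm) of a scene point at depth u (mm).

    Eq. (4) covers u > u_in-focus and Eq. (6) covers u < u_in-focus; their
    right-hand sides differ only in sign, so one absolute value covers both.
    """
    return abs(-1.0 + d / f - d / u) * f / N

# Thin-lens in-focus distance (sensor at d): 1/u_in-focus + 1/d = 1/f.
u_in_focus = 1.0 / (1.0 / 50.0 - 1.0 / 52.6316)     # ~1000 mm for these values

# D_COC vanishes at the focal plane, grows with depth behind it, and
# saturates at the camera limit (d/f - 1) * f/N as u goes to infinity.
d_star = (52.6316 / 50.0 - 1.0) * 50.0 / 5.6
```

The saturation value `d_star` is the camera-dependent limit that the chapter denotes $D\_{\rm COC}^\*$ (here expressed in millimeters rather than pixels).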

#### 3. Blurring and deblurring approach

Using the proposed approach, scene depths are determined from the blurriness estimated in a single image by a combination of blurring and deblurring processes. In this section, we explain the rationale for combining the two processes and formulate the combined approach. The depth of a scene point is defined as the distance between the camera lens and the point. In addition, as in [9], the proposed method assumes that all scene points of interest are behind the focal plane (i.e., the case $u > u\_{\text{in-focus}}$).

#### 3.1 Concept

The ($D\_{\rm{COC}}$ vs. $u$) curve of Eq. (4), illustrated in Figure 3(a), gives the relationship between the depth of a scene point and its $D\_{\rm{COC}}$ value for a camera. The latter increases with the depth of the scene point. When $D\_{\rm{COC}}$ reaches its limit ($D\_{\rm{COC}}^\*$), the point can be assumed to be at infinity.

Let $D\_{\rm{COC}}$ be the blurriness of a point. A blurring operator can be defined to add an increment of blurriness to the point to obtain a new blurriness $D\_{\rm{COC}} + \delta(D\_{\rm{COC}})$, with $\delta(D\_{\rm{COC}}) > 0$. This can be regarded as increasing the depth of the point by moving it along the ($D\_{\rm{COC}}$ vs. $u$) curve toward the right end of the figure. If the blurring operation is applied repeatedly, the blurriness reaches $D\_{\rm{COC}}^\*$ and the point is at infinity.

If the increment in blurriness needed for a point to reach $D\_{\rm{COC}}^\*$ can be determined, we can convert this increment into an increment in depth by referring to the ($D\_{\rm{COC}}$ vs. $u$) curve in Figure 3(a) and derive the true depth of the point. However, as shown by the curve of $\partial u / \partial D\_{\rm{COC}}$ in Figure 3(b), a small increment in blurriness close to $D\_{\rm{COC}}^\*$ yields a substantially large increment in depth. This means that depth determination close to $D\_{\rm{COC}}^\*$ is relatively unstable and inaccurate.

On the other hand, the deblurring operator is defined to reduce the blurriness of the point to obtain $D\_{\rm{COC}} - \delta(D\_{\rm{COC}})$. If a deblurring operation is repeatedly applied to a point, the point becomes sharper. The deblurring process gradually reduces the depth of the point by moving it along the ($D\_{\rm{COC}}$ vs. $u$) curve toward the left end, which corresponds to moving the point to the focal plane, i.e., into focus. If the decrement in blurriness that brings a point into focus can be determined, we can convert it into the decrement in depth of the point relative to the focal plane, and then refer to the ($D\_{\rm{COC}}$ vs. $u$) curve in Figure 3(a) to acquire the true depth of the point. However, as shown by the curve of $\partial u / \partial D\_{\rm{COC}}$ in Figure 3(b), a small decrement in depth close to the focal plane can yield a substantial decrement in $D\_{\rm{COC}}$, which means that $D\_{\rm{COC}}$ cannot be reliably and accurately obtained when the point is moved closer to the focal plane by a deblurring operation.

#### Figure 3.

The relation between the out-of-focus blurriness and the distance $u$ in Eq. (4) in the pixel domain, where $u\_{\text{in-focus}}$ = 1000 mm, $f$ = 50 mm, $N$ = 5.6, $d$ = 52.6316 mm, 0.0061 mm per pixel, and $D\_{\rm{COC}}^\*$ is 38.5184. (a) Plot of $D\_{\rm{COC}}$ versus $u$, the depth. The limit of $D\_{\rm{COC}}$ is denoted by $D\_{\rm{COC}}^\*$. In the blurring process, the dotted point is moved along curve A. In the deblurring process, the point is moved along curve B. (b) The derivative of curve (a). When $u$ approaches infinity, the blurring process fails to estimate the depth. When $u$ is close to the focal plane, $u\_{\text{in-focus}}$, the deblurring process fails in that area.

Since depth estimation at large $D\_{\rm{COC}}$ and $D\_{\rm{COC}}$ estimation for a point close to the focal plane are both unreliable, we propose a blurring and deblurring approach that combines differential blurring and deblurring operations to yield a more robust depth estimate of a scene point than either operation alone.

#### 3.2 Formulation

Let $u\_0$ be the true depth of the point at $\mathbf{x}$, and let $F\_b(\mathbf{x}, u - u\_0 + u\_\infty)$ and $F\_d(\mathbf{x}, u - u\_0 + u\_{\text{in-focus}})$ be the blurring measurement and the deblurring measurement, respectively. The blurring measurement measures whether a point is blurred to $D\_{\rm{COC}}^\*$, and the deblurring measurement measures whether the point is deblurred to be in focus. We define $F\_b(\mathbf{x}, u - u\_0 + u\_\infty)$ as a proper function with a (local) minimum near $u\_\infty$ and $F\_d(\mathbf{x}, u - u\_0 + u\_{\text{in-focus}})$ as a proper function with a (local) minimum near $u\_{\text{in-focus}}$. The following formula is used to determine the true depth $u\_0$ of the point $\mathbf{x}$:

$$\min\_{u} \lambda F\_b(\mathbf{x}, u - u\_0 + u\_\infty) + F\_d(\mathbf{x}, u - u\_0 + u\_{\text{in-focus}}),\tag{7}$$

with the constraint that

$$\Delta D\_{\rm COC}^{b}(u) + \Delta D\_{\rm COC}^{d}(u) = D\_{\rm COC}^{\*} - D\_{\rm COC}\left(u\_{\text{in-focus}}\right), \tag{8}$$

where $D\_{\rm COC}^{\*} - D\_{\rm COC}(u\_{\text{in-focus}})$ is a camera-dependent constant, $\lambda$ is the Lagrangian parameter that balances the blurring and deblurring measurements, and

$$\Delta D\_{\rm COC}^b(u) = D\_{\rm COC}^\* - D\_{\rm COC}(u) \tag{9}$$

is the increment of blurriness to $D\_{\rm COC}^\*$ and

$$\Delta D\_{\rm COC}^d(u) = D\_{\rm COC}(u) - D\_{\rm COC}\left(u\_{\text{in-focus}}\right) \tag{10}$$

denotes the decrement of $D\_{\rm COC}$ in bringing the point at depth $u$ to the focal plane. The constraint in Eq. (8) is necessary because it indicates that the sum of the blurriness added from the current guess $u$ up to $D\_{\rm COC}^\*$ and the blurriness removed from $u$ down to $D\_{\rm COC}(u\_{\text{in-focus}})$ is a constant, $D\_{\rm COC}^\* - D\_{\rm COC}(u\_{\text{in-focus}})$.

#### 3.3 Blurring and deblurring measurements

For simplicity of analysis, and without loss of generality, the following derivations are based on one-dimensional signals and neglect boundary conditions.

#### 3.3.1 Blurring measurement

The objective of the blurring process is to determine the amount of blurriness required for a point to reach $D\_{\rm COC}^\*$. When edged or textured patches are gradually placed at a farther distance, the details of the patches become faint and their variances decrease; at infinity, only their mean brightness can be derived. Thus, the variance of a patch can be used as the blurriness measurement. Specifically, when a patch is blurred to reach $D\_{\rm COC}^\*$, its variance can be assumed to be 0.

Let the true depth of the scene point $\mathbf{x}$ be $u\_0$, and let the image of the point be

$$f\_0(\mathbf{x}) = \mathbf{g}\left(\sigma(u\_0)^2\right) \* s(\mathbf{x}) \tag{11}$$

where $\mathbf{g}$ is the Gaussian function and $\sigma(u\_0)^2$ is the variance of $\mathbf{g}$ at depth $u\_0$. We define the blurring measurement as follows:

$$F\_b(\mathbf{x}, u - u\_0 + u\_\infty) = \left\| \mathbf{g}\left( \sigma(u\_\infty)^2 - \sigma(u)^2 \right) \* f\_0(\mathbf{x}) - E\left\{ f\_0(\mathbf{x}) \right\} \right\|^2,\tag{12}$$

where $E\left\{ f\_0(\mathbf{x}) \right\}$ is the mean over a neighborhood of $f\_0(\mathbf{x})$ and $\sigma(u\_\infty)$ is the Gaussian scale corresponding to $D\_{\rm COC}^\*$. Assuming that $\sigma\_b^2 = \sigma(u\_\infty)^2 - \sigma(u)^2$, we obtain from Eqs. (11) and (12)

$$\left\| \mathbf{g}\left( \sigma\_b^2 \right) \* f\_0(\mathbf{x}) - E\left\{ f\_0(\mathbf{x}) \right\} \right\|^2 = \left\| \mathbf{g}\left( \sigma\_b^2 + \sigma(u\_0)^2 \right) \* s(\mathbf{x}) - E\left\{ f\_0(\mathbf{x}) \right\} \right\|^2. \tag{13}$$

The above equation is derived by using the fact that the convolution of two Gaussians of variances $\sigma\_1^2$ and $\sigma\_2^2$ is a Gaussian of variance $\sigma\_1^2 + \sigma\_2^2$. If $u$ is equal to $u\_0$, then $\sigma\_b^2 + \sigma(u\_0)^2 = \sigma(u\_\infty)^2$ and Eq. (13) becomes

$$\left\| \mathbf{g}\left( \sigma(u\_\infty)^2 \right) \* s(\mathbf{x}) - E\left\{ f\_0(\mathbf{x}) \right\} \right\|^2 \approx 0,\tag{14}$$

since $\mathbf{g}\left(\sigma(u\_\infty)^2\right) \* s(\mathbf{x})$ can be approximated by the mean of $f\_0(\mathbf{x})$. Thus, $F\_b(\mathbf{x}, u - u\_0 + u\_\infty)$ reaches a local minimum when $u$ is equal to $u\_0$.
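The behavior of the blurring measurement can be sketched in one dimension. In the snippet below, a step-edge scene is observed at an assumed true blur `sigma0`, and `sigma_inf` stands for the blur scale at the camera limit; both scales and the whole-patch mean for $E\{f\_0\}$ are illustrative simplifications.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# 1D sketch of the blurring measurement F_b of Eq. (12) on a step edge.
sigma0, sigma_inf = 3.0, 8.0
s = np.zeros(256); s[128:] = 1.0                  # scene: a step edge
f0 = gaussian_filter1d(s, sigma0)                 # observed patch, Eq. (11)

def F_b(sigma_b, patch=f0):
    """|| g(sigma_b^2) * f0 - E{f0} ||^2, Eq. (12), for added blur sigma_b."""
    return float(np.sum((gaussian_filter1d(patch, sigma_b) - patch.mean()) ** 2))

# Cascaded Gaussian variances add (the fact behind Eq. (13)), so the added
# blur that carries the patch to the camera limit satisfies
# sigma_b^2 = sigma_inf^2 - sigma0^2; F_b keeps shrinking as that limit is
# approached, which is the near-zero value exploited in Eq. (14).
sigma_b_limit = float(np.sqrt(sigma_inf ** 2 - sigma0 ** 2))
```

Evaluating `F_b` at increasing `sigma_b` shows the patch variance around its mean decaying toward zero as the blur approaches the limit.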

#### 3.3.2 Deblurring measurement using blurring-deblurring operator

In contrast to blurring, deblurring is extremely unstable, and it usually assumes some prior knowledge of the scene so that the high-frequency (edge and texture) information can be recovered. Because of this prior assumption, when the depth is overestimated, ringing artifacts occur in the result of the deblurring process.

We denote $\mathbf{g}\left(\sigma(u)^2\right) \* \mathbf{g}^{-1}\left(\sigma(u)^2\right)$ as the blurring-deblurring operator, where $\mathbf{g}^{-1}\left(\sigma(u)^2\right)$ denotes the reduction in blurriness by a Gaussian kernel with variance $\sigma(u)^2$. An image is first deblurred and then blurred by the same Gaussian kernel with variance $\sigma(u)^2$. The deblurring process tends to over-enhance the high-frequency information in the image if $u$ is an overestimated depth; as shown by the subfigures in the second row of Figure 4(c), this causes severe artifacts. We therefore propose the following measurement of the error of the blurring-deblurring operator at a given point:

$$S(\mathbf{x}, u) = \left\| f\_0(\mathbf{x}) - \mathbf{g}\left(\sigma(u)^2\right) \* \mathbf{g}^{-1}\left(\sigma(u)^2\right) \* f\_0(\mathbf{x}) \right\|^2. \tag{15}$$

As shown in Figure 4(d), when the estimated blurring scale exceeds the true scale (here, 4), $S(\mathbf{x}, u)$ increases dramatically because of the artifacts in the neighborhood of the edge points.

Figure 4.

(a) A patch taken from the eye of the "Lena" image, (b) the blurred patch of (a) with blurring scale 4, (c) the candidate scene patches obtained by deblurring the patch in (b) with a TV-based method (Section 4.3) and different blurring scales whose standard deviations range from 1 to 8, and (d) the curve of $S(\mathbf{x}, u\_j)$ and (e) $K(\mathbf{x}, u\_j)$, with $j = 1, \cdots, 8$.

From Figure 4(d), $S(\mathbf{x}, u)$ is asymmetric with respect to over- and underestimation of $u\_0$, where $u\_0$ is the true blurring scale or true depth. To capture the transition point from small to large values of $S(\mathbf{x}, u)$, we calculate the curvature at $u\_j$ of the smoothed curve as follows:

$$K(\mathbf{x}, u\_j) = \frac{S(\mathbf{x}, u\_{j+1}) - 2S(\mathbf{x}, u\_j) + S(\mathbf{x}, u\_{j-1})}{\left(1 + \left(S(\mathbf{x}, u\_{j+1}) - S(\mathbf{x}, u\_j)\right)^2\right)^{1.5}},\tag{16}$$

as shown in Figure 4(e). The larger the value of the curvature, the higher the probability that the point is the pivot of the transition. Thus, we define the deblurring measurement as follows:

$$F\_d(\mathbf{x}, u - u\_0 + u\_{\text{in-focus}}) = -K(\mathbf{x}, u).\tag{17}$$

The measure $F\_d(\mathbf{x}, u - u\_0 + u\_{\text{in-focus}})$ has a local minimum at the transition of $S(\mathbf{x}, u)$. Thus, when $u$ is equal to $u\_0$, $F\_d(\mathbf{x}, u - u\_0 + u\_{\text{in-focus}})$ attains its minimum.
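The round-trip error of Eq. (15) and the curvature of Eq. (16) can be sketched in one dimension. The chapter deblurs with a TV-based method; in the sketch below we substitute a simple Wiener-regularized inverse filter (the `eps` constant is our own regularization choice), so the curve shapes are only illustrative of the over-/under-estimation asymmetry.

```python
import numpy as np

# 1D sketch of S(x, u) in Eq. (15) and K(x, u_j) in Eq. (16) on a step edge
# blurred with true scale 4, evaluated at candidate scales 1..8 as in Figure 4.
n, sigma_true, eps = 256, 4.0, 1e-3
freqs = np.fft.fftfreq(n)

def gaussian_tf(sigma):
    """Transfer function of a Gaussian blur with std sigma (continuous approx.)."""
    return np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)

s = np.zeros(n); s[n // 2:] = 1.0                       # scene: a step edge
F0 = np.fft.fft(s) * gaussian_tf(sigma_true)            # observed patch spectrum

def S(sigma):
    """|| f0 - g * g^{-1} * f0 ||^2 with a Wiener-regularized inverse g^{-1}."""
    G = gaussian_tf(sigma)
    round_trip = G * G / (G * G + eps)                  # blur applied after deblur
    return float(np.sum(np.abs(F0 * (1.0 - round_trip)) ** 2) / n)

sigmas = np.arange(1.0, 9.0)                            # candidate scales 1..8
S_vals = np.array([S(sg) for sg in sigmas])

# Curvature of the S curve, Eq. (16); a large value marks the transition point.
K = (S_vals[2:] - 2 * S_vals[1:-1] + S_vals[:-2]) \
    / (1.0 + (S_vals[2:] - S_vals[1:-1]) ** 2) ** 1.5
```

With this regularized inverse, the round-trip error grows as the candidate scale increases past the band preserved by the true blur, mirroring the behavior of Figure 4(d).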

#### 3.3.3 Depth estimation

The blurring and deblurring measurements, $F\_b$ and $F\_d$, defined in Eqs. (12) and (17), respectively, can be substituted into the objective function in Eq. (7) to obtain

$$H(u) = \lambda \left\| \mathbf{g}\left( \sigma(u\_\infty)^2 - \sigma(u)^2 \right) \* f\_0(\mathbf{x}) - E\left\{ f\_0(\mathbf{x}) \right\} \right\|^2 - K(\mathbf{x}, u). \tag{18}$$

The blurring and deblurring approach can now be used to derive the solution for

$$\min\_{u} H(u),\tag{19}$$

based on the constraint in Eq. (8). The complexity of the problem depends on how precisely the depth must be measured at each point. Although depth is an important cue, relative depths, such as which object is in the foreground and which is in the background, often matter more than accurate absolute depths. The blurring and deblurring approach is a point-wise optimization method. To save computational cost, we derive the best solution from $d$ candidate depths in a sequence $u\_1, \cdots, u\_d$; the solution of the optimization problem is the candidate depth that minimizes the objective. In our implementation, the candidate depths were chosen so that $\sigma(u\_i) - \sigma(u\_{i-1}) = 0.2$ for $i = 2, \cdots, d$. The procedure takes $O(\gamma N^2 d)$ point-wise blurring and deblurring operations to determine the depth map of an image of $N^2$ pixels, where $\gamma$ is the ratio of edged and textured pixels.
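The candidate-depth grid can be sketched as follows: blur scales are spaced by 0.2 pixels, and each scale is mapped back to a depth by inverting Eq. (4). The camera parameters follow the caption of Figure 3; the conversion from Gaussian standard deviation to COC diameter (`sigma = D_COC / 2`) is only an assumption made for this illustration.

```python
# Sketch of the candidate-depth grid of Section 3.3.3.
f, N, d = 50.0, 5.6, 52.6316       # mm, from the caption of Figure 3
pixel_pitch = 0.0061               # mm per pixel

def depth_from_sigma(sigma_px):
    """Invert Eq. (4), D_COC = (-1 + d/f - d/u) * f/N, for the depth u (mm)."""
    D = 2.0 * sigma_px * pixel_pitch            # assumed std -> COC diameter
    return d / (d / f - 1.0 - D * N / f)

sigmas = [0.2 * i for i in range(1, 31)]        # sigma(u_1), ..., sigma(u_d), step 0.2
candidates = [depth_from_sigma(sg) for sg in sigmas]
```

For each point, the objective $H(u)$ of Eq. (18) is evaluated on this list and the minimizing candidate is reported as the depth.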

#### 4. Depth refinement and image deblurring

The blurring and deblurring measurements can only determine the depths of edge and texture points: blurring or deblurring a sufficiently large patch of constant value with a Gaussian kernel of any variance yields the same patch of constant value. Therefore, the proposed approach cannot reliably determine the depths of points in smooth regions, and we resort to another approach to derive the depths of smooth scene points.

On the other hand, the goal of all-in-focus imaging is to generate an image that is focused everywhere, i.e., to transfer the depths of all image points to the focal plane. An all-in-focus image can therefore be generated by the deblurring process. In practice, since the deblurring process is very sensitive to overestimated depths, overestimated depths in an image can hamper the all-in-focus result and render a visually unacceptable image.

This problem cannot be trivially solved by subtracting a constant depth from all points, because the value to subtract is not easy to determine: it should be large enough to stabilize the deblurring process, yet small enough that the depths are not underestimated too much, which would render a blurred all-in-focus image. We use two methods, namely depth quantization and a TV deblurring process, to rectify the effects of depth estimation error.

#### 4.1 Depths of smooth scene points

Following [6], we estimate the depth at edges and textures and then propagate the results to other points. In our method, we use the Canny edge detector [10] to decide whether a point is an edged or textured point. The depths of these points (called Canny points) are then estimated by the blurring and deblurring approach. For convenience, we call the remaining points the smooth points.

The propagation algorithm for deriving the depths of smooth points is based on the solution of the Dirichlet problem [11], which describes the temperature distribution from the boundary to the interior of a medium. The solution of the Dirichlet problem obeys two principles: the maximum principle, which states that the interior temperature lies between the maximum and minimum boundary temperatures, and the uniqueness principle, which states that the solution of the problem is unique. In our approach, we regard the temperature as the depth and define the boundary points as the union of the smooth points at the border of the image and the non-smooth points. Depths are first assigned to the smooth points at the border of the image; then the solution of the Dirichlet problem yields the depths of the smooth points inside the image. By this approach, the depths of the smooth points are never larger than those of the enclosing points.

The steps of the depth propagation procedure are as follows. First, we normalize the depths of the Canny points by setting the depth of the point farthest from the camera to 1. Then, we assign depths to the smooth points on the borders of the image. Because the top border of an image is usually the background, the smooth points on that border are assigned the depth 1. A smooth point on the left-hand, right-hand, or bottom border is assigned the depth of the closest non-smooth point. Figure 5 shows an example of how depths are propagated in an image.
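The propagation step can be sketched as solving the discrete Dirichlet problem (the Laplace equation with the known depths as boundary values), here by plain Jacobi relaxation. The grid size, boundary depths, and the block of Canny points below are all illustrative.

```python
import numpy as np

# Minimal sketch of the depth propagation of Section 4.1 on a 32x32 grid.
H = W = 32
depth = np.zeros((H, W))
known = np.zeros((H, W), dtype=bool)

depth[0, :] = 1.0;  known[0, :] = True        # top border: background, depth 1
depth[-1, :] = 0.2; known[-1, :] = True       # bottom border: near foreground
depth[:, 0] = depth[:, -1] = 0.6              # side borders
known[:, 0] = known[:, -1] = True
depth[12:20, 12:20] = 0.4                     # a block of Canny (non-smooth) points
known[12:20, 12:20] = True

for _ in range(2000):                         # Jacobi iteration toward the harmonic fill
    avg = 0.25 * (np.roll(depth, 1, 0) + np.roll(depth, -1, 0)
                  + np.roll(depth, 1, 1) + np.roll(depth, -1, 1))
    depth = np.where(known, depth, avg)       # known depths act as boundary values
```

By the maximum principle, the interpolated depths of the smooth points stay between the minimum and maximum of the surrounding known depths, as the text describes.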

#### 4.2 Depth quantization

In the quantization process, a depth is approximated by a layer; the idea is motivated by scalar quantization in compression, where a coefficient is approximated by a representative value. The number of layers $L$ is a parameter of the process. Starting from the results of Section 3.3.3, the histogram of the depths is calculated first. The representative (anchor) depth $a\_1$ of layer 1 is always the minimum of all the depths. When the depths are partitioned into two layers, we propose the following optimization to determine the anchor depth $a\_2$ of layer 2:

$$\min\_{a\_2} \sum\_{z\_i \in \text{layer 1}} \left( z\_i - a\_1 \right)^2 + \sum\_{z\_i \in \text{layer 2}} \left( z\_i - a\_2 \right)^2,\tag{20}$$

where $a\_2$ is subject to $a\_2 > z\_i > a\_1$ for each $z\_i$ in layer 1, and $z\_i \ge a\_2$ for each $z\_i$ in layer 2. By recursively subdividing a depth layer into two layers, the procedure can be applied to acquire $L$ layers. Once the anchor depth of a layer is determined, the depths in the layer are updated to the depth of the anchor; for instance, the depth $z\_i$ in layer $j$ is assigned the value $a\_j$. Hence, $z\_i - a\_j$ for $z\_i$ in layer $j$ is the error of approximating $z\_i$ with $a\_j$, and the anchor depth found from Eq. (20) minimizes this error.
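The two-layer split of Eq. (20) can be sketched directly: $a\_1$ is fixed to the minimum depth, and a candidate anchor $a\_2$ assigns $z\_i < a\_2$ to layer 1 (approximated by $a\_1$) and $z\_i \ge a\_2$ to layer 2 (approximated by $a\_2$). The bimodal sample depths and the candidate grid below are made up for illustration.

```python
import numpy as np

# Sketch of the anchor-depth search of Eq. (20) for two layers.
rng = np.random.default_rng(0)
depths = np.concatenate([rng.normal(0.3, 0.02, 500),    # a foreground layer
                         rng.normal(0.8, 0.02, 500)])   # a background layer
a1 = depths.min()                                       # anchor of layer 1

def split_error(a2):
    """Objective of Eq. (20) for a given candidate anchor a2."""
    in_layer2 = depths >= a2
    return float(np.sum((depths[~in_layer2] - a1) ** 2)
                 + np.sum((depths[in_layer2] - a2) ** 2))

candidates = np.linspace(depths.min(), depths.max(), 64)[1:]  # sampled anchors
a2 = min(candidates, key=split_error)          # smallest error wins
```

On this bimodal sample the winning anchor lands just below the background cluster, so the split separates the two layers as intended; recursing on each layer extends this to $L$ layers.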

The anchor depth $a\_2$ in Eq. (20) is derived by first sampling a few depths as candidate anchor depths. Each candidate is then set as $a\_2$ to calculate the average error in Eq. (20); with the help of the histogram of the depths, this process can be carried out efficiently. The anchor depth is the candidate depth that yields the smallest average error. Figure 6 shows the estimated depth map and the depth quantization results on an image composed of four depth layers. After quantization, some outliers in Figure 6(b) are removed.

#### Figure 5.

An example of depth propagation to the interior smooth patches achieved by solving the Dirichlet problem. (Top left) The depth map of non-smooth patches (the depth of the patches farthest from the camera was set to 1), (top right) the depths assigned to the border patches, and (bottom) the depths of the interior smooth patches derived by solving the Dirichlet problem.

#### Figure 6.

(a) The image composed of four depth layers. (b) Depth map derived by the multi-scale blurring and deblurring approach; the top two layers of depths can hardly be distinguished. (c) The depth map quantized to four layers. (d) The depth map quantized to three layers.

#### 4.3 Deblurring process

A patch $\mathbf{y}$ can be modeled as $\mathbf{g}(\sigma) \* \mathbf{x}$, where $\mathbf{x}$ and $\mathbf{y}$ are vectors, $\mathbf{x}$ is the vectorized scene patch $X$ ($\mathbf{x} = \mathrm{vec}(X)$), and $\sigma$ is the out-of-focus blurriness of $\mathbf{x}$ in $\mathbf{y}$. The deblurring process restores $\mathbf{x}$ by

$$\min\_{\mathbf{x}} \frac{\mu}{2} \left\| \mathbf{y} - \mathbf{g}(\sigma) \ast \mathbf{x} \right\|^2 + \left\| D\_1 \mathbf{X} \right\|\_{\mathbf{1}} + \left\| D\_2 \mathbf{X} \right\|\_{\mathbf{1}} \tag{21}$$

where $\mu$ is a Lagrangian multiplier and $\left\| D\_1 X \right\|\_1$ and $\left\| D\_2 X \right\|\_1$ denote the discrete total variation of $X$ in the horizontal and vertical directions, respectively. The convolution $\mathbf{g}(\sigma) \* \mathbf{x}$ can be represented as a matrix-vector multiplication, and Eq. (21) can be solved by an efficient variable-splitting technique, as described in [12].
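The structure of Eq. (21) can be sketched numerically. The chapter solves it with the variable-splitting technique of [12]; to keep this sketch short we instead run plain gradient descent on a smoothed (Charbonnier) total variation, so the test image, `mu`, `eps`, and the step size are all illustrative choices rather than the chapter's solver.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Sketch of the TV deblurring objective of Eq. (21) with a smoothed TV term.
rng = np.random.default_rng(1)
truth = np.zeros((48, 48)); truth[16:32, 16:32] = 1.0        # scene patch X
sigma, mu, eps, step = 2.0, 50.0, 1e-2, 3e-3
y = gaussian_filter(truth, sigma) + 0.01 * rng.standard_normal(truth.shape)

def objective(x):
    """mu/2 ||y - g(sigma)*x||^2 + smoothed ||D1 X||_1 + ||D2 X||_1."""
    fit = 0.5 * mu * np.sum((y - gaussian_filter(x, sigma)) ** 2)
    tv = sum(np.sum(np.sqrt(np.diff(x, axis=a) ** 2 + eps)) for a in (0, 1))
    return float(fit + tv)

def gradient(x):
    # Data term: mu * g^T (g*x - y); a Gaussian kernel is symmetric, so its
    # adjoint is again a Gaussian blur.
    g = mu * gaussian_filter(gaussian_filter(x, sigma) - y, sigma)
    # Smoothed-TV term, one axis at a time.
    for axis in (0, 1):
        d = np.diff(x, axis=axis)
        w = d / np.sqrt(d ** 2 + eps)
        pad = [(0, 0), (0, 0)]; pad[axis] = (1, 1)
        w = np.pad(w, pad)                       # zero-pad the finite differences
        n = x.shape[axis]
        g += np.take(w, np.arange(n), axis=axis) - \
             np.take(w, np.arange(1, n + 1), axis=axis)
    return g

x = y.copy()
before = objective(x)
for _ in range(400):                             # small-step gradient descent
    x = x - step * gradient(x)
after = objective(x)
```

The small step size keeps the descent stable; dedicated splitting methods such as [12] reach the minimizer far faster, which is why the chapter uses one.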
