Many assistive devices and systems for the visually impaired have been developed using cameras [12–16], ultrasonic sensors [17–29], stereoscopic cameras [30–41], or RGB-D cameras [42–46]. These systems are built on the concept of the electronic travel aid (ETA) [47, 48], which aims to help visually impaired users walk while avoiding obstacles. Therefore, these systems can notify users about obstacles but cannot tell them what kind of objects they are.

Here, let us consider a situation where there is a seat (bench) in front of a visually impaired user, as shown in Figure 1, and the user wants to sit on the seat to take a rest. In this situation, the seat is not just an obstacle but a useful piece of equipment. If the user uses one of the obstacle detection systems mentioned earlier (see Figure 1(a)), he or she has to confirm the obstacle by himself or herself. However, if the user uses an object recognition system, which can determine that the object is a seat (Figure 1(b)), the user obtains a clear benefit. It is therefore necessary to build an assistive system that recognizes objects around a visually impaired user.

Figure 1. Obstacle detection system (a) versus object recognition system for the visually impaired (b).

Several research groups have proposed object recognition systems. Drug packages [49], classroom doors [50–53], podiums [50], and pathways [54, 55] are recognized by using barcodes [49, 56], augmented reality markers [50, 52–54], radio frequency identification tags [23, 51, 57, 58], Bluetooth devices [59], wireless network devices [55], or visible light communication devices [60, 61]. These physical devices are useful, but it is difficult to deploy them throughout everyday environments.

Other research groups have proposed assistive systems that notify visually impaired users about tables [62], color blocks [63], and staircases [64–67] by means of laser range sensors [65–67] and Kinect sensors [62, 63]. These systems are useful but not yet sufficient; other types of objects should be recognized to help visually impaired individuals further.

This chapter describes our assistive systems, which not only detect obstacles of various sizes but also recognize objects of various types by means of image processing techniques.

## 2. Kinect cane system

Figure 2 shows our Kinect cane system, composed of a white cane and a backpack [68]. A Kinect sensor, a numeric keypad, and a tactile device are attached to the white cane. Kinect is an infrared-based range sensor developed for a consumer game console (Microsoft Xbox), and the white cane is a commercial product for the visually impaired. (The sensor and the cane cost approximately 300 and 100 USD, respectively.) The Kinect sensor is mounted 75 cm above the floor. These devices are connected by wires to a portable personal computer and a UPS battery in the backpack; the computer handles device control, and the UPS battery supplies power.

In this system, the X, Y, and Z axes of the world coordinate system are defined to be the horizontal vector oriented from left to right, the vertical vector oriented from top to bottom, and the horizontal vector extending from a Kinect sensor into the environment, respectively.
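As a concrete illustration of this convention, the following sketch back-projects a depth pixel into the X, Y, and Z world coordinates using a standard pinhole camera model. The focal lengths and principal point used here are typical Kinect v1 values and are assumptions for illustration, not calibrated parameters of our system.

```python
# Back-project a depth pixel (u, v) with measured depth z (in cm) into the
# world frame described above: X to the right, Y downward, Z forward.
FX, FY = 594.2, 591.0   # assumed depth-camera focal lengths in pixels (typical Kinect v1)
CX, CY = 320.0, 240.0   # assumed principal point for a 640 x 480 depth image

def pixel_to_world(u: float, v: float, z_cm: float) -> tuple:
    x = (u - CX) * z_cm / FX   # horizontal coordinate, left to right
    y = (v - CY) * z_cm / FY   # vertical coordinate, top to bottom (image rows grow downward)
    return x, y, z_cm          # Z is the distance from the sensor into the environment
```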

The Kinect cane system can detect obstacles and recognize several types of objects, such as seats, by means of the methods described below, which are implemented as dedicated programs.

#### 2.1. Obstacle detection

Figure 2. Our Kinect cane system.

The Kinect cane system can detect obstacles that would prevent a visually impaired user from walking safely [45]. We provide two detection methods, one for small obstacles and one for large obstacles, considering the properties of a Kinect sensor. Both methods are executed simultaneously in the obstacle detection mode of the Kinect cane system, and if either of them detects an obstacle, the tactile device returns vibration feedback to the user.

#### 2.1.1. Detection of small obstacles

Figure 3(a) shows an image of a corridor scene including a small obstacle (i.e., a box with a height of 7 cm) on the floor. Figure 3(b) shows the depth data of the corridor scene. The scene image and depth data were obtained by the Kinect sensor. In the depth data, the distances from the Kinect sensor to points on object surfaces are coded. Black pixels represent positions where the sensor cannot measure the distances. The colored pixels (except the black ones) are called edges in this chapter.

Figure 3. A small obstacle on a floor in a corridor scene. (a) Color image; (b) depth data; (c) profile.

In order to detect a small obstacle on the floor, the method uses a measurement line, which is a vertical line at the center of the depth data, as shown in Figure 3(b). The measurement line is projected onto the floor plane, and the projected line is called a floor line. Figure 3(c) shows the profile of the depth data along the floor line. Black dots represent edges on the Z-Y plane.

The floor line is formulated by

$$y = az + b,\tag{1}$$

where a and b are coefficients. Let (zi, yi) denote the coordinates of the i-th edge (i = 1, 2, ⋯, I). The line is fitted to the edges by minimizing the following sum of squared distances:

$$S = \sum\_{i=1}^{I} \left(d\_i\right)^2,\tag{2}$$

where

$$d\_i = az\_i + b - y\_i. \tag{3}$$

The optimal values of a and b are obtained by using a robust estimation [69] as follows:

$$
\begin{pmatrix} a^\* \\ b^\* \end{pmatrix} = \begin{pmatrix} \sum w\_i z\_i^2 & \sum w\_i z\_i \\ \sum w\_i z\_i & \sum w\_i \end{pmatrix}^{-1} \begin{pmatrix} \sum w\_i z\_i y\_i \\ \sum w\_i y\_i \end{pmatrix}, \tag{4}
$$

where weight values are defined as

$$w\_i = \frac{d\_i}{1 + \frac{1}{2}d\_i^2}.\tag{5}$$

The equation

$$y = a^\*z + b^\* \tag{6}$$

represents the optimal floor line. The edges above the optimal floor line are determined to be the obstacle.
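As a minimal sketch of this procedure, the following Python code fits the floor line by iteratively reweighted least squares and then flags the edges lying above it. The weight function used here is a common Cauchy-type weight standing in for Eq. (5), and the optional height margin is an assumption added to suppress sensor noise; neither is taken from the original implementation.

```python
import numpy as np

def fit_floor_line(z, y, iterations=10):
    """Robustly fit the floor line y = a*z + b (Eqs. (1)-(6)) to edge points.

    z, y: 1-D arrays of edge coordinates (cm) along the measurement line.
    Iteratively reweighted least squares; the weight below is a Cauchy-type
    weight used here as a stand-in for the chapter's Eq. (5).
    """
    z = np.asarray(z, float)
    y = np.asarray(y, float)
    w = np.ones_like(z)                       # first pass: ordinary least squares
    a, b = 0.0, float(np.median(y))
    for _ in range(iterations):
        # Weighted normal equations, cf. Eq. (4).
        A = np.array([[np.sum(w * z * z), np.sum(w * z)],
                      [np.sum(w * z),     np.sum(w)]])
        v = np.array([np.sum(w * z * y), np.sum(w * y)])
        a, b = np.linalg.solve(A, v)
        d = a * z + b - y                     # residuals, Eq. (3)
        w = 1.0 / (1.0 + 0.5 * d * d)         # down-weight outliers (assumed weight)
    return a, b

def small_obstacle_edges(z, y, margin_cm=0.0):
    """Return a boolean mask of the edges above the optimal floor line."""
    a, b = fit_floor_line(z, y)
    # Y grows downward, so a point above the floor has a smaller y value than
    # the floor line, i.e., a*z + b - y > 0; margin_cm adds a noise tolerance.
    return (a * np.asarray(z, float) + b) - np.asarray(y, float) > margin_cm
```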

#### 2.1.2. Detection of large obstacles

The detectable range of a Kinect sensor is from approximately 40 to 600 cm. In many cases, the upper limit of 600 cm is sufficient for obstacle detection. However, the lower limit of 40 cm may cause a problem: a visually impaired individual could collide with an obstacle closer than this limit. In this section, we propose a method that detects large obstacles not only within the detectable range but also nearer than the lower limit.

In this method, three small circular windows, called obstacle measurement (OM) spots, are set on the depth data as shown in Figure 4. They are arranged horizontally at a certain interval. The positions and the interval are determined by considering the height and width of the user's body. Another small circular window, called a floor measurement (FM) spot, is set in the bottom area. The OM and FM spots are denoted by S<sub>O1</sub>, S<sub>O2</sub>, S<sub>O3</sub>, and S<sub>F</sub>, respectively.

If a measurement spot includes a sufficient number of edges, the spot is regarded as detected, and the mean depth value is calculated from the depth values of the edges in the detected spot.

The system determines the distance between a Kinect sensor and a large obstacle on the basis of their relation as follows (see Figure 5):

Case 1: If the obstacle is nearer than 40 cm, none of the spots would be detected. In this case, the system determines the distance to be less than 40 cm.

Case 2: If the obstacle is between 40 and 600 cm, at least one of the OM spots would be detected, and the FM spot would be detected as well. The system outputs the minimum value among the mean depth values of the detected OM spots.

Case 3: If the obstacle is farther than 600 cm, none of the OM spots would be detected, whereas the FM spot would be detected. The system determines the distance to be more than 600 cm.
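The case analysis above can be sketched as follows. The spot names, the minimum edge count required for a spot to count as detected, and the return convention are illustrative assumptions, not values from the actual system.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Spot:
    edge_count: int                  # number of valid depth edges inside the spot
    mean_depth_cm: Optional[float]   # mean depth of those edges, or None if empty

MIN_EDGES = 20                       # assumed threshold for a spot to be "detected"

def detected(spot: Spot) -> bool:
    return spot.edge_count >= MIN_EDGES and spot.mean_depth_cm is not None

def large_obstacle_distance(om_spots: List[Spot], fm_spot: Spot) -> str:
    """Classify the distance to a large obstacle from the OM and FM spots."""
    om_hits = [s for s in om_spots if detected(s)]
    if not om_hits and not detected(fm_spot):
        return "< 40 cm"                          # Case 1: too close to measure
    if om_hits:
        nearest = min(s.mean_depth_cm for s in om_hits)
        return f"{nearest:.0f} cm"                # Case 2: within the measurable range
    return "> 600 cm"                             # Case 3: only the floor spot responds
```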

Figure 4. A large obstacle (pillar) in a building. (a) Color image; (b) depth data.

Figure 5. Relation between a Kinect sensor and a large obstacle.

The pillar was successfully detected in Figure 4.

#### 2.2. Object recognition

The Kinect cane system can recognize several objects from depth data. The recognition methods and results are described below.

#### 2.2.1. Planes


Artificial environments are generally composed of many planes, such as floors and walls; therefore, planes can be effective clues for recognizing such environments. Figure 6(a) and (b) shows an example scene and its depth data, respectively. Planes are recognized [70] by using the following method, based on the random sample consensus (RANSAC) algorithm [71]:


1. Three edges are randomly chosen from the edges in the depth data, and then a plane is fitted to the chosen edges by the least-squares method. The three pink points in Figure 6(c) and (d) are the randomly chosen edges, and the blue regions in Figure 6(e) and (f) are the planes.

2. The method determines the edges whose distances to the plane are smaller than a threshold. These edges are called inliers.

3. Steps (1) and (2) are iterated a certain number of times.

4. The plane with the most inliers is selected. Figure 6(g) and (h) shows the selected plane, which corresponds to the floor. The inlier edges are eliminated.

5. Steps (1) to (4) are iterated until the number of remaining edges is less than a threshold.
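The plane extraction above can be sketched in Python as follows, assuming the depth edges have already been converted into 3-D points in the world coordinate system. The distance threshold, iteration count, and stopping criterion are illustrative assumptions rather than the values used in the Kinect cane system.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3-D points; returns (unit normal n, offset d) with n.p + d = 0."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)   # smallest singular vector = plane normal
    normal = vt[-1]
    return normal, -normal.dot(centroid)

def ransac_planes(edges, dist_thresh=3.0, iters=200, min_remaining=500, rng=None):
    """Repeatedly extract dominant planes from 3-D edge points (steps 1-5 above).

    edges: (N, 3) array of X, Y, Z coordinates in cm. Thresholds are assumptions.
    Returns the list of plane inlier sets and the remaining (non-plane) edges.
    """
    rng = rng or np.random.default_rng()
    remaining = np.asarray(edges, float)
    planes = []
    while len(remaining) > min_remaining:            # step 5: stop when few edges remain
        best = None
        for _ in range(iters):                        # step 3: repeat the random trials
            sample = remaining[rng.choice(len(remaining), 3, replace=False)]  # step 1
            normal, d = fit_plane(sample)
            inliers = np.abs(remaining @ normal + d) < dist_thresh           # step 2
            if best is None or inliers.sum() > best.sum():
                best = inliers
        if best.sum() < 3:                            # no meaningful plane found
            break
        planes.append(remaining[best])                # step 4: keep the plane with most inliers
        remaining = remaining[~best]                  # ...and eliminate its edges
    return planes, remaining                          # remaining edges feed seat recognition
```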

Figure 6. Processes of recognition of planes (a)-(h).

In Figure 7, the floor and the two walls were recognized correctly. Black pixels are the remaining edges, which are used for the seat recognition described below.

Figure 7. Recognition result of planes.

#### 2.2.2. Seats

The sitting surfaces of seats (such as chairs, stools, and benches) are considered to be the most essential parts of the seats and are recognized as regions that satisfy the following conditions:

1. Candidate regions are composed of the remaining edges that are between 30 and 50 cm from the floor.

2. The areas of the candidate regions are more than 1200 cm<sup>2</sup>.
Figure 8 shows the recognition result of the seat in Figure 6(a). Red pixels represent the sitting surface of the seat. Figure 9 shows the color image, depth data, and recognition result of other seats. All the seats were recognized correctly.

Figure 8. Recognition result of a seat.

Figure 9. Recognition result of other seats. (a) Color image; (b) depth data; (c) recognition result.
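A minimal sketch of the two conditions above is given below, assuming the remaining (non-plane) edges are available as 3-D points in centimeters with Y measured downward and that the Y level of the fitted floor plane is known. The occupancy-grid area estimate used in place of per-region connected components, as well as the cell size, are simplifications assumed for illustration.

```python
import numpy as np

def seat_surface_mask(points, floor_y, cell_cm=2.0, min_area_cm2=1200.0):
    """Flag remaining edge points that form a seat-like sitting surface.

    points:  (N, 3) array of X, Y, Z coordinates in cm (Y grows downward).
    floor_y: Y value(s) of the fitted floor plane (scalar or per-point array).
    """
    points = np.asarray(points, float)
    height = np.asarray(floor_y, float) - points[:, 1]      # height above the floor
    band = (height >= 30.0) & (height <= 50.0)               # condition 1: 30-50 cm band
    if not np.any(band):
        return np.zeros(len(points), dtype=bool)

    # Rasterize the candidate points onto an X-Z occupancy grid to estimate area.
    xz = points[band][:, [0, 2]]
    cells = np.unique(np.floor(xz / cell_cm).astype(int), axis=0)
    area_cm2 = len(cells) * cell_cm * cell_cm                 # condition 2: area check
    if area_cm2 < min_area_cm2:
        return np.zeros(len(points), dtype=bool)
    return band
```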

The seat recognition method was applied to 62 sample scenes including seats, and 88% of the scenes were recognized correctly.

It is difficult for this method to recognize seats that have nonparallel sitting surfaces, but there would not be many such seats in general environments.

#### 2.2.3. Other objects

The Kinect cane system can also recognize upward staircases, downward staircases, and elevators on the basis of the recognition results of planes. The recognition methods are described in detail in, for example, [68, 70]; in this section, the recognition results are shown in Figures 10–12.

Figure 10. Recognition result of upward staircases. (a) Color image; (b) recognition result.

Figure 11. Recognition result of downward staircases. (a) Color image; (b) recognition result.

Figure 12. Recognition result of an elevator. (a) Color image; (b) recognition result.

It is difficult to recognize upward staircases composed of only one or two steps, slopes, and elevators with sealed doors. The Kinect cane system is not designed to detect holes; a user can detect holes by using the system as a conventional white cane.

#### 2.3. User interaction

Ordinarily, a visually impaired user can use the Kinect cane system as a conventional white cane. Figure 13(a) shows an example situation where a user walks in an elevator hall. The user has been here several times and therefore knows there is a bench in the hall, but does not know (or has forgotten) its exact location. The user stops walking for safety and then instructs the system to find the bench (seat), as shown in Figure 13(b). The user executes the seat recognition program by pushing the corresponding key on the numeric keypad and makes the system search for the bench (Figure 13(c)). If the sensor finds the bench, the tactile device returns vibration feedback to the user (Figure 13(d)). The user walks toward the bench (Figure 13(e)) and then confirms it (Figure 13(f)). Finally, the user can sit on the bench (Figure 13(g)).

Figure 13. A visually impaired user wants to sit on a bench to take a rest in an elevator hall. (a) A visually impaired user comes out of an elevator. (b) The user instructs the system to find a bench. (c) The user pans the Kinect sensor. (d) The system finds the bench. (e) The user walks toward the bench. (f) The user confirms the bench. (g) The user can sit on the bench.
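A schematic sketch of this interaction loop is given below. The keypad polling, the tactile driver, and the recognition modules are represented by stub functions, and the key-to-mode mapping is an illustrative assumption; none of these names come from the actual implementation.

```python
import time
from typing import Callable, Dict, Optional

# --- Hypothetical hardware and recognition bindings (stubs for illustration) ---
def read_depth_frame():                   # would grab a depth image from the Kinect sensor
    return None

def read_keypad() -> Optional[str]:       # would poll the numeric keypad (non-blocking)
    return None

def vibrate(duration_s: float) -> None:   # would drive the tactile device on the cane
    pass

def detect_obstacles(depth) -> bool:      # obstacle detection (Section 2.1)
    return False

def recognize_seats(depth) -> bool:       # seat recognition (Section 2.2.2)
    return False

# Illustrative key-to-mode mapping; the actual key assignment is not specified here.
MODES: Dict[str, Callable] = {"1": detect_obstacles, "2": recognize_seats}

def main_loop() -> None:
    mode = MODES["1"]                     # obstacle detection as the default mode
    while True:
        key = read_keypad()
        if key in MODES:
            mode = MODES[key]             # e.g., the user requests a seat search
        if mode(read_depth_frame()):
            vibrate(0.3)                  # vibration feedback when a target is found
        time.sleep(0.05)
```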


## 3. Kinect goggle system

This section introduces another type of Kinect-based wearable assistive system, the Kinect goggle system [72] (Figure 14). A Kinect sensor is attached to goggles worn on the face of a visually impaired user. A notebook computer, a numeric keypad, a tactile device, and a UPS battery are set in a shoulder bag. These devices are connected with wires for device control and power supply. The Kinect goggle system does not require the user to hold a heavy Kinect sensor, and therefore the user can keep his or her hands free.

Figure 14. Our Kinect goggle system.


The current system can detect obstacles and recognize seats by use of software that is almost the same as that of the Kinect cane system. Figure 15 shows an example scene including a bench. The red region in Figure 15(c) represents the sitting surface of the bench.


Figure 15. Recognition result of a seat by the Kinect goggle system. (a) A bench; (b) depth data; (c) recognition result.

