
#### *3.5.1 Vision module node*


Following the overall design, the functions can be divided into functional nodes. Nodes are the minimum functional modules in ROS, and we regard this node-level decomposition as the ultimate goal of the design.

In the dual-arm acquisition robot eye-hand system, the main component of the "eye" is the camera, including a binocular camera mounted on the robot's head and a monocular camera mounted on the arm. Based on the ROS framework, we designed three nodes to complete the environment perception function: a binocular camera image acquisition node (*dual\_eye\_image\_capture*), a monocular camera image acquisition node (*single\_eye\_image\_capture*), and an image processing node (*image\_processing*). The actual recognition effect is shown in **Figure 15**: there are three valid tomatoes in the image, the system recognizes all of them, and it marks the position of the tomato that should be picked first according to the rules.

**Figure 15.**
*Tomato identification interface.*
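To make the node decomposition concrete, the following is a minimal rospy skeleton for one such acquisition node; the topic name, message type, and publishing rate are assumptions for illustration and are not taken from the chapter.

```python
#!/usr/bin/env python
# Minimal rospy skeleton of an acquisition node (illustration only; topic
# name, message type, and rate are assumed, not taken from the chapter).
import rospy
from sensor_msgs.msg import Image

def main():
    # Each perception function is wrapped as one ROS node, the minimum
    # functional module of the design.
    rospy.init_node('single_eye_image_capture')
    pub = rospy.Publisher('single_eye/image_raw', Image, queue_size=1)
    rate = rospy.Rate(10)  # assumed frame rate
    while not rospy.is_shutdown():
        msg = Image()                       # a captured frame would go here
        msg.header.stamp = rospy.Time.now()
        pub.publish(msg)
        rate.sleep()

if __name__ == '__main__':
    main()
```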

The binocular camera acquisition node uses the two original images collected by the left and right sensors of the Bumblebee2 camera to generate five images: the left- and right-eye corrected color images, the left- and right-eye corrected gray images, and a 3D point cloud (**Figure 16**).

**Figure 16.**
*Five pictures generated by the Bumblebee2 camera. 1. Gray image of left eye after correction. 2. Color image of left eye after correction. 3. Gray image of right eye after correction. 4. Color image of right eye after correction. 5. Point cloud image.*


The collection process of the binocular camera is shown in **Figure 17**. The camera's raw data are read and packed into a Bayer pattern image; the three color channels are then extracted from the Bayer pattern image and assembled into the original color image. The original left- and right-eye images are rectified to obtain the corrected left- and right-eye color images and grayscale images. Next, the corrected left and right images are used for stereo matching, and a 3D point cloud is generated through the principle of triangulation. Finally, all five generated images are published; the algorithms used in the acquisition process are provided by the camera SDK.


**Figure 17.**

*Binocular image acquisition flow chart.*
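The same pipeline stages can be sketched with OpenCV; note that the actual node relies on the camera SDK's algorithms, so the Bayer layout, rectification maps, and matcher settings below are assumptions for illustration only.

```python
import cv2
import numpy as np

# OpenCV sketch of the Figure 17 stages. The real node uses the camera
# SDK's algorithms; the Bayer layout, rectification maps (map_*), and
# matcher settings here are assumptions for illustration.
def process_stereo(raw_left, raw_right, map_lx, map_ly, map_rx, map_ry, Q):
    # 1. Demosaic the Bayer-pattern raw data into the original color images.
    color_l = cv2.cvtColor(raw_left, cv2.COLOR_BayerBG2BGR)
    color_r = cv2.cvtColor(raw_right, cv2.COLOR_BayerBG2BGR)
    # 2. Rectify to obtain the corrected left/right color images ...
    rect_l = cv2.remap(color_l, map_lx, map_ly, cv2.INTER_LINEAR)
    rect_r = cv2.remap(color_r, map_rx, map_ry, cv2.INTER_LINEAR)
    # ... and the corresponding corrected grayscale images.
    gray_l = cv2.cvtColor(rect_l, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(rect_r, cv2.COLOR_BGR2GRAY)
    # 3. Stereo matching on the rectified pair.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                    blockSize=9)
    disparity = matcher.compute(gray_l, gray_r).astype(np.float32) / 16.0
    # 4. Triangulate: reproject the disparity into a 3D point cloud using
    #    the Q matrix from the stereo calibration.
    cloud = cv2.reprojectImageTo3D(disparity, Q)
    # 5. These five results are what the node publishes.
    return rect_l, rect_r, gray_l, gray_r, cloud
```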

Based on the above process, we designed the binocular collection program UML, as shown in **Figure 18**.

**Figure 18.**
*Binocular acquisition node UML design.*

#### *3.5.1.1 Monocular camera image acquisition node*

For monocular vision, we use a Daheng Mercury series industrial camera (MER-500-7UC), which has a USB 2.0 digital interface and comes with a free SDK and secondary development example source code for both the Windows and Linux platforms. We use the *usb\_camera* package provided by ROS to collect monocular images, as shown in **Figure 19**.

**Figure 19.**
*Image captured by monocular camera.*
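A minimal consumer of this monocular stream might look as follows; the topic name follows the usual ROS camera-driver convention and is an assumption, since the chapter does not list it.

```python
#!/usr/bin/env python
# Minimal consumer of the monocular image stream (illustration only; the
# topic name follows common ROS camera-driver conventions and is assumed).
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()

def on_image(msg):
    # Convert the ROS image message to an OpenCV BGR frame.
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
    rospy.loginfo('received %dx%d frame', frame.shape[1], frame.shape[0])

rospy.init_node('monocular_image_listener')
rospy.Subscriber('/usb_cam/image_raw', Image, on_image, queue_size=1)
rospy.spin()
```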

#### *3.5.1.2 Image processing node*

The image processing node receives the collected planar image and point cloud image and provides different image processing function interfaces according to different business requirements. Under the current task requirements, the image processing node must complete accurate two-dimensional recognition and accurate three-dimensional positioning of tomatoes, so both the two-dimensional image of the scene and the three-dimensional point cloud data are required. We introduce the image processing node design from two directions: first, the architecture design and functional flow of the image processing node as a functional interface; and second, the specific implementation of the related image processing algorithms.

The entire software system is based on the C/S model architecture, using the *actionlib* function package provided by ROS, with the task planning node as the client requesting computing resources from the server of each functional unit. The image processing node is no exception: after it receives an image processing instruction and image data, the instruction is parsed to clarify the functional requirements, the required image data are extracted and fed into the algorithm function for processing, and the result is finally returned to the client.
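A minimal sketch of this C/S pattern with *actionlib* is shown below; the action type *ImageProcAction* and the package *my\_robot\_msgs* are hypothetical stand-ins, since the chapter does not give its message definitions.

```python
#!/usr/bin/env python
# Sketch of the image processing node as an actionlib server. The action
# type ImageProcAction and the package my_robot_msgs are hypothetical.
import rospy
import actionlib
from my_robot_msgs.msg import ImageProcAction, ImageProcResult

class ImageProcessingServer(object):
    def __init__(self):
        # The functional unit hosts the server; the task planning node is
        # the client that sends goals (image processing instructions).
        self.server = actionlib.SimpleActionServer(
            'image_processing', ImageProcAction,
            execute_cb=self.execute, auto_start=False)
        self.server.start()

    def execute(self, goal):
        # Parse the instruction, extract the required image data, run the
        # matching algorithm, and return the result to the client.
        result = ImageProcResult()
        # ... dispatch on the goal fields and fill in the result here ...
        self.server.set_succeeded(result)

if __name__ == '__main__':
    rospy.init_node('image_processing')
    ImageProcessingServer()
    rospy.spin()
```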

The specific processing flow is shown in **Figure 20**. After initializing the node and the *actionlib* server, the node starts the service, waits for a goal sent by the client, and subscribes to the processing function. When an instruction is received, its source is analyzed. If the instruction originates from the spatial positioning of tomatoes, the processing steps are as follows: first, use the tomato recognition algorithm based on image feature fusion to identify all tomatoes in the right-eye image space of the binocular camera. If there are no tomatoes, return the result; if tomatoes are detected, plan the picking order (the rule is from bottom to top and from left to right), calculate the spatial position of the picking point, and finally return the result to the client. If the object recognition result triggers the harvesting task, the image collected by the monocular camera is used to extract the central image feature of the tomato.
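The bottom-to-top, left-to-right rule amounts to a sort over the detected tomato centers. A minimal sketch, assuming image coordinates with the origin at the top-left (so a larger y means lower in the image):

```python
def plan_picking_order(centers):
    """Order tomato centers bottom-to-top, then left-to-right.

    `centers` holds (x, y) pixel coordinates. With the image origin at the
    top-left, a larger y is lower in the image, so sort by descending y
    first and ascending x second.
    """
    return sorted(centers, key=lambda c: (-c[1], c[0]))

# The lowest tomato comes first; ties are broken left to right.
print(plan_picking_order([(320, 120), (250, 400), (100, 400)]))
# -> [(100, 400), (250, 400), (320, 120)]
```

In practice, nearly equal y values would be grouped into bands before sorting, so that small detection jitter does not reorder tomatoes in the same row.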

**Figure 20.**
*Image processing node flow chart.*


According to the above process, the UML design of the program is shown in **Figure 21**.

**Figure 21.**
*Image processing node UML design.*


#### *3.5.2 Eye-hand coordination module node*

#### *3.5.2.1 Eye-hand collaboration process design*

Our solution uses an eye-in-hand visual servo scheme to achieve eye-hand coordination, as shown in **Figure 22**. The picking robot obtains the image information of the target fruit through a monocular camera installed on the picking hand, extracts the position of the tomato features in the two-dimensional image, and computes the difference between this position and the expected position. The difference is used as the input of the visual servo control algorithm, which calculates the control output in real time, namely the velocity vector of the end effector; integrating this velocity vector over time gives the next point the end effector needs to reach on the way to the target position. Repeating this cycle yields a trajectory that gradually approaches the target. The eye-hand correspondence is converted into joint motions, and the end of the robot arm moves accordingly to approach the target. The implementation process is shown in **Figure 23**.
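This loop can be sketched as follows; the proportional control law, gain, and control period are assumptions, since the chapter does not specify its controller, and a full image-based visual servo would map the feature error through the image Jacobian.

```python
import numpy as np

# Toy version of the servo cycle, working directly in image space. The
# proportional law, gain, and period are assumptions; a full image-based
# visual servo would map the error through the image Jacobian instead.
GAIN = 0.8   # assumed proportional gain (1/s)
DT = 0.05    # assumed control period (s)

def servo_trajectory(feature, expected, steps=200):
    """Iterate: error -> velocity -> integrate -> next set point."""
    feature = np.asarray(feature, dtype=float)
    expected = np.asarray(expected, dtype=float)
    trajectory = [feature.copy()]
    for _ in range(steps):
        error = expected - feature      # difference from the expected position
        velocity = GAIN * error         # control output computed in real time
        feature += velocity * DT        # integrate velocity over one period
        trajectory.append(feature.copy())
    return trajectory                   # gradually approaches the target

path = servo_trajectory([200.0, 150.0], [320.0, 240.0])
print(path[0], path[-1])  # starts at (200, 150), converges near (320, 240)
```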
