*3.5.1 Vision module node*

In the dual-arm acquisition robot's eye-hand system, the main component of the "eye" is the camera: a binocular camera mounted on the robot's head and a monocular camera mounted on the arm. Based on the ROS framework, we designed three nodes to perform the environment perception function: a binocular camera image acquisition node (*dual\_eye\_image\_capture*), a monocular camera image acquisition node (*single\_eye\_image\_capture*), and an image processing node (*image\_processing*). The actual recognition result is shown in **Figure 15**. There are three valid tomatoes in the image; the system recognizes all of them and marks the positions of the tomatoes that should be picked first according to the rules.
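The picking-priority rules themselves are not spelled out in this section. As a hedged illustration only, the sketch below ranks detections by a hypothetical rule (nearest to the camera first, ties broken by lower position in the image); the `Tomato` type, field names, and the rule are all assumptions, not the system's actual logic:

```python
# Hypothetical picking-priority rule (illustrative only): rank detected
# tomatoes by camera distance (nearest first), breaking ties by vertical
# image position (lower fruit, i.e. larger row index, first).
from dataclasses import dataclass

@dataclass
class Tomato:
    u: int        # image column (pixels)
    v: int        # image row (pixels)
    depth: float  # distance from camera (m), taken from the stereo point cloud

def pick_order(tomatoes):
    """Return detections sorted by the assumed priority rule."""
    return sorted(tomatoes, key=lambda t: (t.depth, -t.v))

detections = [Tomato(320, 180, 0.62), Tomato(400, 260, 0.48), Tomato(150, 300, 0.48)]
first = pick_order(detections)[0]   # nearest tomato; on a depth tie, the lower one
```

Any rule of this shape (a sort key over per-fruit measurements) slots naturally into the image processing node's detection output.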

The binocular camera acquisition node uses the two raw images collected by the left and right sensors of the Bumblebee2 camera to generate five outputs: corrected left- and right-eye color images, corrected left- and right-eye grayscale images, and a 3D point cloud (**Figure 16**).
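The 3D point cloud is obtained by triangulating the rectified stereo pair. The Bumblebee2 SDK performs this internally; the following is a minimal sketch of the underlying geometry, where the focal length, baseline, principal point, and array shapes are illustrative assumptions:

```python
import numpy as np

def disparity_to_points(disparity, f=800.0, B=0.12, cx=320.0, cy=240.0):
    """Triangulate a rectified stereo disparity map into 3D points.

    For rectified cameras, depth Z = f * B / d, where f is the focal
    length (pixels), B the baseline (m), and d the disparity (pixels).
    Pixels with non-positive disparity are treated as invalid (NaN).
    """
    v, u = np.indices(disparity.shape)
    valid = disparity > 0
    Z = np.where(valid, f * B / np.where(valid, disparity, 1.0), np.nan)
    X = (u - cx) * Z / f
    Y = (v - cy) * Z / f
    return np.dstack([X, Y, Z])   # H x W x 3 point cloud

# A constant disparity of 96 px with f=800, B=0.12 m corresponds to a
# flat surface at Z = 800 * 0.12 / 96 = 1.0 m.
d = np.full((480, 640), 96.0)
cloud = disparity_to_points(d)
```

The real node additionally carries color from the rectified left image onto each 3D point, which is why the corrected images and the cloud are published together.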

The acquisition workflow of the binocular camera is shown in **Figure 17**. The camera's raw data are read and packed into a Bayer pattern image; the three color channels are then extracted from the Bayer image and assembled into the original color image. The left- and right-eye images are rectified to obtain corrected color and grayscale images for both eyes. Next, stereo matching is performed on the corrected left and right images, and the principle of triangulation is used to generate the 3D point cloud. Finally, all five generated outputs are published. The algorithms used in the image acquisition process are provided by the camera SDK.
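The Bayer-to-color step can be sketched as a crude nearest-neighbor demosaic. The SDK uses its own higher-quality interpolation, and the RGGB layout assumed below is illustrative (the camera's actual mosaic pattern may differ):

```python
import numpy as np

def demosaic_rggb(raw):
    """Nearest-neighbor demosaic of an RGGB Bayer mosaic (even dims assumed).

    Each 2x2 cell [[R, G], [G, B]] is expanded so every pixel in the cell
    receives that cell's R sample, the average of its two G samples, and
    its B sample. A crude stand-in for the SDK's interpolation.
    """
    h, w = raw.shape
    rgb = np.empty((h, w, 3), dtype=float)
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0
    b = raw[1::2, 1::2]
    for ch, plane in enumerate([r, g, b]):
        rgb[..., ch] = np.repeat(np.repeat(plane, 2, axis=0), 2, axis=1)
    return rgb

# A 4x4 mosaic tiled from one cell: R=200, G=100 (both samples), B=50.
raw = np.tile(np.array([[200, 100], [100, 50]], dtype=float), (2, 2))
rgb = demosaic_rggb(raw)
```

Rectification then resamples the resulting color (and grayscale) images with the calibrated lens-distortion and alignment maps before stereo matching.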

Based on the above process, we designed the UML diagram of the binocular acquisition program, as shown in **Figure 18**.
