**6. Application**

**Figure 6** shows how this new data has a completely different probability density, providing new opportunities for unseen data. The examples were obtained from the data glove as a demonstration example from the first stage of the block diagram shown in **Figure 8**.

The algorithm was given a five-row, two-column array containing force and angle values for each of the five fingers. With this example the algorithm was able to produce completely new data, which should be evaluated with Eq. (4) before being used.

**Figure 7.** Red squares are antigens that were cloned and mutated, producing the blue circles (antibodies) with the CODA clonal/mutation procedure. Mutation follows a normal distribution set by the user.

Once the clonal and mutation procedure is completed, the algorithm takes the antibody information and evaluates its affinity to the original value; together with the reward function, this produces an expected reward before the configuration is actually used in the hardware. If the affinity and reward values are both below a threshold, the antibodies are kept in the memory.

After this procedure is completed, the reinforcement learning algorithm uses the antibodies and defines the most rewarded one as the goal state for completing the task. From the *M* selected antibodies, the algorithm then leads the system to the goal state. It is important to store the antibody-reward pair that obtained the highest value from the reward function while performing the action, since this is the one that completed the task as desired. Finally, a matrix should be constructed to store the Q matrix and the antibody-reward pair for future reference in the memory. The last step runs the clustering in order to segregate the different classes that can be found, which translate into different tasks for the system.

270 Recent Advances in Robotic Systems
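The clonal and mutation step just described (clones of the antigen perturbed under a user-set normal distribution) can be sketched as follows; the antigen values and standard deviations here are illustrative assumptions, not parameters from the chapter:

```python
import random

def clone_and_mutate(antigen, n_clones, sigma_angle, sigma_force, seed=None):
    """Clone the five-finger (angle, force) antigen n_clones times and
    mutate each clone with Gaussian noise, emulating CODA's
    clonal/mutation procedure. The sigmas are the user-set spread."""
    rng = random.Random(seed)
    return [
        [(a + rng.gauss(0.0, sigma_angle), f + rng.gauss(0.0, sigma_force))
         for a, f in antigen]
        for _ in range(n_clones)
    ]

# Five fingers, each an (angle, force) pair (values are made up).
antigen = [(500, 82.6), (534, 100.2), (480, 74.1), (767, 60.4), (680, 100.9)]
antibodies = clone_and_mutate(antigen, n_clones=500, sigma_angle=2.0,
                              sigma_force=2.0, seed=1)
```

Each resulting antibody is a candidate posture that must still pass the affinity and reward evaluation before it reaches the hardware.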

The CODA algorithm will be implemented in our laboratory on a humanoid hand that is part of the InMoov open-source project. The application is one of the most difficult tasks in robotics: grasping objects. The system components are as follows and are shown in the picture below.

The humanoid hand, entirely 3D printed in the laboratory, consists of six standard servomotors, flex sensors for position information, and force-sensing resistors for obtaining information about grasping forces.

The main brain of the entire system is a Raspberry Pi, where the algorithm is stored and executed. The Raspberry Pi is connected to an Arduino, which acquires the data from the sensors and the servomotors and sends it to the Raspberry Pi for processing.
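The Arduino-to-Raspberry Pi link can be as simple as one text frame per sample; the comma/semicolon format below is purely an assumption for illustration, since the chapter does not specify the actual wire protocol:

```python
def parse_sensor_frame(line):
    """Parse one text frame from the Arduino into (angle, force) pairs,
    one per finger. The 'angle,force;angle,force;...' layout is a
    made-up example format, not the chapter's actual protocol."""
    pairs = []
    for chunk in line.strip().split(";"):
        angle_s, force_s = chunk.split(",")
        pairs.append((float(angle_s), float(force_s)))
    return pairs

frame = "500,82.6;534,100.2;480,74.1;767,60.4;680,100.9"
fingers = parse_sensor_frame(frame)
```

On the Raspberry Pi such frames would typically be read from a serial port and parsed into the five-finger array CODA consumes.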

**Figure 3** will help the reader understand how the algorithm is used in the application. The main idea is to produce an affordable grasp of an object. To exploit the advantages of CODA, the training examples are limited, and CODA should produce more data like that shown in **Figure 6**.

In the first stage, a human demonstrates how to grasp the object (the training example, or antigen): the human wears the data glove, grasps the object, and the training data is recorded. Once the training data is obtained, CODA defines it as an antigen, since it is emulating the NIS, and the information from the antigen is taken into the cloning procedure.

The clonal procedure, affinity measurement, reward measurement, and reinforcement learning all take place in the Raspberry Pi, which is the dedicated hardware for processing all data and storing the knowledge, or repertoire of antibodies.

The previously mentioned steps, shown in **Figure 3** and in Algorithm 1, then produce an affordable grasp that the hand can use to grasp the same object previously grasped by the human hand.

In this manner, CODA lets the hand grasp objects with minimal data examples and with no knowledge of the physical characteristics of the human or robot hand. Instead, it uses a reward function specifically designed for the application, which gives numerical incentives for reaching certain finger angles and force values.
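The chapter does not give the reward function itself; a minimal sketch, assuming it rewards closeness to target finger angles and forces with user-chosen weights, could be:

```python
def reward(antibody, target, w_angle=1.0, w_force=1.0):
    """Hypothetical reward: penalise the distance between the antibody's
    (angle, force) pairs and the target posture; 0 is the best score.
    w_angle and w_force are assumed tuning weights, not values from the
    chapter."""
    penalty = sum(w_angle * abs(a - ta) + w_force * abs(f - tf)
                  for (a, f), (ta, tf) in zip(antibody, target))
    return -penalty

target = [(500.0, 82.6)] * 5       # toy target posture
perfect = reward(target, target)   # best attainable reward, i.e. 0.0
```

Any function with this shape would let CODA rank antibodies numerically, which is all the later filtering and goal-state selection require.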


**Table 4.** Single example given to the CODA algorithm, showing a simple mutated pair of angle and force values for each finger.

**Table 4** shows a simple example of the original pair obtained from the data glove, including angle and force. It can be seen how the clonal/mutation procedure in CODA modifies the values, producing new data. Columns three and four of the same table contain a simple set of cloned values, which are also shown in **Figure 7**. Finally, the last column gives the affinity computed with Eq. (4). The reward should now be measured before the antibodies can be stored and used by the SARSA algorithm as goal states.
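Eq. (4) is defined earlier in the chapter; if it is a distance-style affinity between the mutated antibody and the original antigen, the evaluation step can be sketched as below (the Euclidean form here is an assumption, and the actual Eq. (4) may differ):

```python
import math

def affinity(antibody, antigen):
    """Distance-style affinity sketch: lower values mean the mutated
    antibody stays closer to the original demonstration."""
    return math.sqrt(sum((a - ta) ** 2 + (f - tf) ** 2
                         for (a, f), (ta, tf) in zip(antibody, antigen)))

# Toy two-finger example: only the first finger's angle was mutated.
original = [(500.0, 82.6), (534.0, 100.2)]
mutant = [(498.9, 82.6), (534.0, 100.2)]
```

An antibody identical to the antigen scores 0; larger mutations score higher, which is what the thresholding step compares against.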

The Discussion section described the reward function, which needs to be designed taking several factors of the system into consideration so that it is possible to measure how suitable the final antibodies are for reproducing the task.

From this point on, the algorithm is straightforward. SARSA is used to reproduce the grasping posture that was set as a goal by the antibodies meeting the criteria. Finally, the Q-matrix is stored with all the related values, such as the antigen, antibodies, reward, and affinity.
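SARSA's standard one-step update, Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)], is all the tabular machinery this stage needs; a minimal sketch, with α and γ as assumed values and toy state/action names:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One-step tabular SARSA update; Q is a dict keyed by
    (state, action), with unseen entries treated as 0."""
    q = Q.get((s, a), 0.0)
    q_next = Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = q + alpha * (r + gamma * q_next - q)

Q = {}
# Hypothetical transition toward the goal posture chosen by CODA.
sarsa_update(Q, s="open", a="close_fingers", r=1.0,
             s_next="grasped", a_next="hold")
```

The resulting Q dictionary is exactly the structure the chapter says should be stored alongside the antigen, antibodies, reward, and affinity.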

This application was selected because few examples of the task are available and because it is a genuinely difficult task that usually requires kinematics and dynamics to produce grasping postures. Grasping is therefore a great application area, since it is desirable to create new knowledge about grasping postures and to remember executed grasps so that further interactions with the same object get faster responses, just as the AIS does with its secondary response to an antigen that has already been in the body.

The more objects the hand grasps, the more knowledge it will acquire. To simulate the self-organised memory, a clustering algorithm should run through the knowledge database and cluster similar postures, suggesting which of those postures correspond to similarly shaped objects with different dimensions.
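The chapter leaves the clustering method open; a minimal k-means sketch over stored posture vectors shows the idea (the 2-D toy vectors and the choice of k are assumptions):

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means used as a stand-in for the chapter's unspecified
    clustering step: postures that end up in the same cluster are
    candidates for the same class of object."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

# Two toy "posture" groups; k-means should separate them.
postures = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
clusters = kmeans(postures, k=2)
```

In the real system the points would be the flattened angle/force vectors from the knowledge database, and each cluster would suggest one family of object shapes.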


| Angle | Force | Mutated angle | Mutated force | Affinity |
|-------|----------|---------------|---------------|--------------|
| 500 | 82.6499 | 498.8566 | | 2.886058483 |
| 534 | 100.1784 | 534.696 | | 0.7185500216 |
| 480 | 74.0844 | 481.2094 | | 1.624368099 |
| 767 | 60.427 | 767.4406 | | 0.613561211 |
| 680 | 100.8786 | 680.0866 | | 0.882857588 |


**Figure 9.** Grasping a cube with the data glove; a simple example of the grasping posture obtained is shown in the graph below.

To demonstrate the functionality of the algorithm, a simple test was done. From a set of geometric figures, one was chosen for the test. The test protocol is simple: first, the object is grasped by any user wearing the data-glove system; once the sensor readings remain stable during grasping, the data is saved to the MATLAB workspace so that CODA can handle it as an antigen. As explained in previous chapters, the algorithm generates diversity from this single given example (limited data) and produces a certain amount of antibodies for later evaluation by the emulated clonal selection and negative selection procedures embedded in the algorithm, using the reward and affinity functions. For this specific test the cube was selected, and a simple example was recorded. **Figure 9** shows the cube grasped by the human teacher wearing the data glove and the data obtained.

Finally, the information shown in **Figure 9** is fed into CODA. From this, a set of 500 antibodies was created, which were evaluated by the reinforcement learning reward function and filtered to leave the most rewarded antibodies in the repertoire for the hand to grasp the object. **Figure 10** (left) shows the total antibodies created after the learning-from-demonstration procedure and, on the right, the remaining repertoire resulting from the reward and affinity evaluation procedures. This repertoire is considered the data most capable of producing affordable grasping postures; therefore, all of them were delivered to the hand system to confirm the functionality. The result was 100% positive: every repertoire pair was able to grasp the object.
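The filtering that reduces the 500 generated antibodies to the delivered repertoire can be sketched as a simple rank-and-keep step; the scoring function and cutoff below are illustrative assumptions:

```python
def build_repertoire(antibodies, score, keep):
    """Rank antibodies by a score (e.g. the reward function) and keep
    only the `keep` best as the repertoire sent to the hand."""
    return sorted(antibodies, key=score, reverse=True)[:keep]

# Toy stand-in: 500 candidate antibodies scored by closeness to 250.
candidates = list(range(500))
repertoire = build_repertoire(candidates, score=lambda x: -abs(x - 250), keep=10)
```

In the real pipeline the score would combine the reward and affinity evaluations, and only the surviving pairs would be sent to the hand for confirmation.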

With this simple test, the hand, the data glove, and the CODA algorithm have proven to function correctly and to produce useful data from limited samples (one simple example). They were able to produce new information that, after being filtered and evaluated, could reproduce the task without any parametric information regarding the hand size or link dimensions. The algorithm proved able to grasp the cube 100% of the time; when tested with a hexagonal prism it succeeded in only 83% of the cases, which is still a good result. **Figure 11** shows the hand grasping the cube and the hexagonal prism during tests under the same protocol.

**Figure 10.** CODA algorithm cube-test data. Left: in green, the antibodies created; in red, the unique example provided. Right: in red, the unique example; in green, the repertoire of the most rewarded antibodies, thus the most capable of producing grasping postures.

**Figure 11.** Hand correctly grasping objects from the data obtained from the CODA algorithm.
