**7. Discussion**

**Figure 10.** Cube test data obtained with the CODA algorithm. Left: in green, the antibodies created from the unique example provided, shown in red. Right: in red, the unique example; in green, the repertoire of the most rewarded antibodies, i.e., those most capable of producing grasping postures.

**Figure 11.** Hand correctly grasping objects using the data obtained from the CODA algorithm.


The CODA algorithm has been described, and the theory behind this new design has been presented. The pseudocode and flowchart have also been provided in order to explain how the algorithm first learns from a demonstration, clones and mutates these data to create new knowledge (data never previously introduced to the algorithm, created by the clonal/mutation process over the input data), evaluates its possible success within the system and finally uses this new information as a goal.
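As a compact illustration of this flow, the following Python sketch performs one CODA generation. The function name, the Gaussian mutation model and the `reward_fn` interface are illustrative assumptions, not the chapter's exact pseudocode.

```python
import numpy as np

def coda_generation(example, reward_fn, n_clones=100, mutation_rate=0.1, rng=None):
    """One CODA generation as described in the text: clone the single
    demonstrated example, mutate the clones to create data never shown
    to the algorithm before, evaluate each clone's possible success
    with the reward function, and return the best antibody as the new
    goal. Gaussian mutation is an assumed diversity mechanism."""
    rng = rng or np.random.default_rng()
    example = np.asarray(example, dtype=float)
    clones = np.tile(example, (n_clones, 1))                 # clonal expansion
    clones += rng.normal(0.0, mutation_rate, clones.shape)   # hypermutation
    rewards = np.array([reward_fn(c) for c in clones])       # evaluation
    return clones[np.argmax(rewards)]                        # best antibody -> goal
```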

The algorithm takes the NIS as its basis, emulating several of its mechanisms. The NIS is one of the most important and reliable systems within the human body. The developed algorithm is structured as a machine learning pipeline, since there is certainly no single "do-it-all" method in machine learning, but rather several tools that must be used together in order to exploit the different strengths of each method.

The CODA algorithm first needs to be evaluated in a simple environment with a simple task, such as a 2R robot, and then with the main application explained in the previous section. The hand-grasping test would let us optimise the algorithm and determine which methods best fit the measures of affinity and stimulation, and whether the cloning and mutation processes are suitable for the tasks.

The affinity and stimulation techniques presented would be evaluated to determine whether or not they are suitable for the proposed algorithm. If not, they should be modified, or new ones designed, so that the algorithm can work with more personalised functions, increasing its reliability and the quality of its results.
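One plausible pair of such measures, sketched minimally under the assumption that antibodies are real-valued sensor vectors, could be:

```python
import numpy as np

def affinity(antibody, antigen):
    """Inverse Euclidean distance: the closer an antibody is to the
    demonstrated example (antigen), the higher its affinity. This
    particular measure is an assumption to be validated
    experimentally, as the text proposes."""
    d = np.linalg.norm(np.asarray(antibody, float) - np.asarray(antigen, float))
    return 1.0 / (1.0 + d)

def stimulation(affinity_value, reward_value, w=0.5):
    """Blend similarity to the example with the task reward; the
    weight w would be tuned during the hand-grasping tests."""
    return w * affinity_value + (1.0 - w) * reward_value
```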

The reward function is under experimentation. It involves two aspects: the forces and the angles. The algorithm should be rewarded as soon as the hand starts closing, that is, when the angles grow and the force at the fingertips starts to increase. But if the hand simply closes, the grasp may not be acceptable, since the object could be damaged. The function should therefore give a high reward when the force starts increasing, but reduce the reward once the force exceeds a certain threshold. Something similar happens with the angles, since they must not grow so large that the hand closes completely. This procedure needs to be carried out experimentally in order to obtain values from the sensors and set the thresholds for the angles and forces.
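A minimal sketch of such a reward, with placeholder thresholds that would have to be replaced by the experimentally obtained values, might look like this:

```python
import numpy as np

def _tri(x, peak):
    """Triangular score: rises linearly to 1 at `peak`, then falls,
    so values beyond the threshold are penalised."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= peak, x / peak, np.maximum(0.0, 2.0 - x / peak))

def grasp_reward(angles, forces, theta_peak=1.2, f_peak=2.0,
                 w_angle=0.5, w_force=0.5):
    """Reward grows while the fingers close (angles, rad) and the
    fingertip forces (N) increase, but drops past the thresholds so
    the hand neither stays open nor crushes the object. All peaks
    and weights are placeholders to be set from the sensor data."""
    return (w_angle * float(np.mean(_tri(angles, theta_peak))) +
            w_force * float(np.mean(_tri(forces, f_peak))))
```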

The three methods that make up the CODA algorithm were chosen to be combined, rather than used individually, in order to increase their efficacy at solving problems and to exploit their respective advantages. They complement each other, each serving a specific objective in the proposed learning process and specialising in the task it performs best.

The very first contact with information comes from demonstrations, a crucial aspect of the algorithm. Providing these examples simplifies learning the task: if no examples were given, an extremely well-designed algorithm would have to be programmed into the agent for it to reproduce the desired task, and it would need to be descriptive enough to cover every possible detail in order to obtain correct results. That procedure would require specially trained personnel and would be time-consuming. By contrast, the demonstration process explained in this document requires only the sensors on the robot or on the person, and almost anyone can teach numerous skills just by performing them, without specialised studies or any coding knowledge.

The previous paragraph stated that, with a complete, well-designed and correct program, an entity (in this case a robot) could reproduce a task. This is true only while the conditions and the environment remain constant. In dynamic environments and situations of uncertainty, that is not the case. Today's robots are increasingly embedded in our society, and in these dynamic environments sensor noise is more frequent. A robot programmed as described above would therefore struggle to achieve its objective and would probably fail, owing to a lack of learning capabilities: it could not learn from past mistakes, acquire new abilities or perform the same task in different scenarios. This is where the reinforcement learning used by CODA becomes an important method.

As explained in previous sections, reinforcement learning changes an entity's behaviour so as to obtain the highest reward. Since the reward function is designed around the task's most crucial aspects, the algorithm will always search for the states that produce the highest reward; otherwise, actions are taken in order to achieve the task. These actions have the advantage that they can be low-level (e.g., voltages applied to motors) or high-level decisions involving complex concepts (e.g., "don't move").
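For concreteness, the following is a generic tabular temporal-difference update of the kind reinforcement learning relies on; CODA's actual formulation is the one given in the earlier sections, so this is only a minimal sketch with illustrative state and action indices.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: nudge the value of taking action a in
    state s toward the received reward plus the best value reachable
    from the next state, driving the policy toward high-reward states."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q
```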

In the CODA algorithm, reinforcement learning helps the artificial immune system (AIS) produce the best antibodies possible given a task-specific reward function, meaning that the new antibodies yield higher rewards. If poor antibodies are being produced, actions alter the clonal/mutation procedure in order to modify the generation of diversity.
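A minimal sketch of one such corrective action, assuming a simple reward-trend rule (the chapter does not fix a specific schedule), could be:

```python
def adapt_mutation_rate(rate, mean_reward, prev_mean_reward,
                        step=0.05, low=0.01, high=0.5):
    """If the current antibody generation scores worse than the
    previous one, widen the mutation rate to generate more diversity;
    if it improves, narrow it to refine around the good region."""
    if mean_reward < prev_mean_reward:
        return min(high, rate + step)   # poor antibodies -> diversify
    return max(low, rate - step)        # good antibodies -> exploit
```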

Finally, the third method is the AIS, which produces new data from a single example (the pathogen's antigen), reducing the need for a large data set, which may be expensive, difficult to obtain and construct, and thus time-consuming. It also handles all the information and evaluates the produced data in order to find and keep the best possible antibodies. Finally, it constructs a self-repertoire of state-action arrays so as to produce a faster response the next time the agent encounters the same state.
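A minimal form of such a repertoire, sketched here as a dictionary keyed on a discretised state (the exact data structure is an assumption), keeps only the best-rewarded antibody per state so that a revisited state gets an immediate response:

```python
class Repertoire:
    """Self-repertoire of state-action arrays (memory cells)."""
    def __init__(self):
        self.memory = {}   # state key -> (antibody, reward)

    def remember(self, state_key, antibody, reward):
        """Keep only the best-rewarded antibody seen for this state."""
        best = self.memory.get(state_key)
        if best is None or reward > best[1]:
            self.memory[state_key] = (antibody, reward)

    def recall(self, state_key):
        """Return the stored antibody, or None if the state is new."""
        entry = self.memory.get(state_key)
        return entry[0] if entry else None
```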

As an example of the functionality of CODA, the system was tested. The test methodology was as follows: first, wearing the data-glove, a person grasped one of the geometric shapes shown in **Figure 13**. For this specific example the grasp was performed with the cube. Once the object was grasped, the sensor values were stored.

From these stored values, CODA produced 100 new data samples (antibodies). These were evaluated in order to select the most rewarded ones. Once filtered, the data set was reduced to keep only the antibodies most capable of reproducing the grasping task, and these antibodies were then used to reproduce it.
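Tying the sketches above together, a hypothetical end-to-end run mirroring this cube test (with made-up glove values standing in for the real stored readings, and reusing the `grasp_reward` sketch defined earlier) would be:

```python
import numpy as np

rng = np.random.default_rng(0)

# One stored grasp demonstration: five finger-flexion angles (rad)
# followed by five fingertip forces (N). Values are illustrative.
example = np.array([0.9, 1.1, 1.0, 0.8, 0.7,
                    1.2, 1.5, 1.4, 1.0, 0.9])

def reward_fn(antibody):
    # Reuses the grasp_reward sketch from the reward-function example.
    return grasp_reward(antibody[:5], antibody[5:])

# 100 clones of the single example, Gaussian-mutated, then filtered
# down to the ten most rewarded antibodies.
clones = np.tile(example, (100, 1)) + rng.normal(0.0, 0.1, (100, 10))
rewards = np.array([reward_fn(c) for c in clones])
repertoire = clones[np.argsort(rewards)[-10:]]
```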


**Figure 12.** Test results from grasping the cube. The top-left images show the data-glove system used to acquire grasping data. The top-right graph shows the single example acquired and provided to CODA. The bottom-left graph shows the example and the antibodies produced by CODA, and the bottom-right graph shows a close-up of the thumb antibodies and the sample distribution.

**Figure 13.** Geometric figures for testing the system, and the cube used for this very first test.

It is important to note that the only example provided was the one obtained with the data-glove. No information was given to the algorithm about the hand size or any other parametric information. The algorithm simply searched for the best candidate in the antibody population based on the reward function (**Figure 12**).

The CODA algorithm has proved able to take one simple example, produce new and useful data from this unique information, and from there produce a grasping posture. Its requirements are lower than those of other grasping algorithms that can also correctly produce grasping postures but need much more information, such as 3D object models [52], hand orientation and tag-based tracking [45], large numbers of pre-processed hand poses (examples) together with computationally expensive dimensionality-reduction procedures [46, 47], or partial object-geometry information [48]. These approaches are far more expensive, not only computationally but also in the hardware required, and they rely on large databases and more requirements than CODA does.
