**5. CODA**

Cognition from Data (CODA) is the name of the proposed algorithm. As the name suggests, its objective is to approach cognition in a computational manner: analysing data, extracting additional information where possible, and acting in an environment to complete a task with the most accurate action, even with few training examples. This mirrors what the NIS does when a "naive individual" recognises an antigen for the first time. To meet these characteristics and objectives, an algorithm should be flexible yet specific, should learn from previous experiences, and should store the acquired knowledge in an organised manner. In doing so, it can respond faster the next time the same situation occurs, just as the NIS in the secondary immune response reacts to a previously known antigen.

In the search for a system with these characteristics, the NIS stands out as one of the most impressive natural systems: it not only adapts itself to new and unknown situations, it also learns from every experience and builds its own knowledge through a remarkable mechanism in which antibodies are created and stored so that a faster response can be mounted if a second interaction with the antigen occurs. **Table 1** in the previous section listed the desirable characteristics that CODA emulates from its biological counterpart, the NIS. This mechanism has kept our bodies safe for millions of years; CODA should therefore provide a strong solution to the grasping problem shown in Section 6, exploiting these exceptional AIS features.

The main reason the NIS was studied in [20] was to obtain a more detailed panorama of the system that serves as the backbone of the algorithm's design. The no free lunch theorem presented in [6] was also taken into consideration, since it states that no single tool can handle every possible task in the problem space. It is therefore important to build machine learning pipelines that tackle a complex problem with several complementary tools, rather than with one tool that tries to "do it all".

CODA Algorithm: An Immune Algorithm for Reinforcement Learning Tasks http://dx.doi.org/10.5772/63570 267

Again the Euclidean distance is used as the measure of affinity, which suggests it could serve as a standard for this process. But how can the BCA be differentiated from one of the most popular families of evolutionary algorithms, genetic algorithms? The reader should notice that no crossover is employed in the cloning process: the BCA has no need for this operator in order to increase diversity.

Speaking specifically about AIS algorithms, and more precisely about clonal selection algorithms, the authors consider it very important that the population variation follow probabilistic rules faithful to the nature of the model, so that the probability of a transition to a new state depends only on the current state and the Markov property is satisfied. This also means that the algorithm should converge and find at least one global optimum solution with probability equal to one as *t* → *∞*. One of the very first papers to introduce this result was [42], which should be consulted for a detailed treatment of the theme.

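A minimal sketch of this clone-and-mutate step (parameter names such as `n_clones` and `sigma` are illustrative, not taken from the chapter):

```python
import numpy as np

def clone_and_mutate(antigen, n_clones=20, sigma=0.1, rng=None):
    """Clone a single antigen and apply Gaussian mutation to each copy.
    Unlike a genetic algorithm, no crossover between individuals is
    performed: diversity comes from mutation alone."""
    rng = np.random.default_rng() if rng is None else rng
    clones = np.tile(antigen, (n_clones, 1))              # identical copies
    return clones + rng.normal(0.0, sigma, clones.shape)  # mutate each gene

antigen = np.array([0.5, 1.2])          # a single training example
antibodies = clone_and_mutate(antigen, rng=np.random.default_rng(7))
print(antibodies.shape)                 # (20, 2)
```

Because mutation alone perturbs each clone independently, no recombination between population members is ever required.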

266 Recent Advances in Robotic Systems


**Figure 4.** Block diagram showing the cognition model proposed by the authors with the CODA algorithm.

Algorithm 1 in **Figure 9** (Appendix) shows the pseudocode for the CODA algorithm, where it can be seen that it combines the elements presented previously in this document: the reinforcement learner and the AIS. Learning from demonstration is an external part of the algorithm that acquires the training information and then presents these data to CODA. **Figure 4** shows a flow chart that presents the pseudocode steps in a simpler way and adds the part that takes place outside of the learning system: the demonstration of the skill.
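Since the pseudocode itself is in the appendix, a hypothetical skeleton may help convey how the pieces fit together; every name below is illustrative rather than taken from Algorithm 1:

```python
# Hypothetical skeleton of the CODA flow. Learning from demonstration
# supplies the antigen from outside; the AIS and the reinforcement
# learner operate inside the algorithm.

def coda_pass(antigen, clone, affinity_ok, reward, rl_learn, memory):
    """One pass: AIS clonal step, affinity filter, reward-based goal
    selection, reinforcement learning towards the goal, then storage."""
    antibodies = clone(antigen)                           # clonal/mutation (AIS)
    kept = [ab for ab in antibodies if affinity_ok(ab, antigen)]
    scored = [(ab, reward(ab)) for ab in kept]            # expected rewards
    goal, goal_reward = max(scored, key=lambda p: p[1])   # most rewarded antibody
    q_table = rl_learn(goal)                              # RL drives to the goal
    memory.append((goal, goal_reward, q_table))           # antibody-reward pair
    return goal

# Toy run with stand-in components.
memory = []
goal = coda_pass(
    antigen=0.0,
    clone=lambda a: [a - 0.2, a + 0.1, a + 0.5],
    affinity_ok=lambda ab, a: abs(ab - a) < 0.3,
    reward=lambda ab: -abs(ab),
    rl_learn=lambda g: {"goal": g},
    memory=memory,
)
print(goal)   # 0.1
```

The stand-in components are placeholders for the demonstration data, the clonal/mutation procedure, Eq. (4), the reward function, and the reinforcement learner described in the text.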

The sequence of steps shown in **Figure 2** is of great importance, since any change in the order will modify how the entity learns and measures its performance with respect to the desired state. It is also important to notice that, at the beginning, there is a human–machine interaction that recovers the data from the example; the platform chosen for this interaction must be selected carefully so that the interaction is positive and recovers as much data as possible without causing any discomfort or stress to the human. The more positive the interaction, the easier it will be for humans to share data and for the algorithm to obtain more and better examples of the task to be done.

In our case, a data glove is the hardware chosen to acquire the data (the antigen). The glove is a well-known tool that almost anyone has worn, so acceptance will be faster and easier, leading to a better interaction. This is important because, as mentioned before, there will be scenarios where data are hard to obtain and may be corrupted with noise, not to mention limited in quantity.

**Figure 5.** Probability densities from 11 neurons showing their preferred firing rates for certain stimuli; notice the true value of the stimulus is at *s* = 0.

The flow chart is specific to the presented application, but it lets the reader follow the algorithm through its run. In the first stage, the knowledge must be supplied in some manner: it can be demonstrated (in our case with the data glove) or set from previously recorded data. CODA obtains this antigen from the data glove and produces clones from this single example. A simulation result can be found in **Figure 7**, where the red squares are the training example and the blue circles are the cloned antibodies produced by CODA's clonal/mutation procedure. The entity should then measure its state in order to define this information as the initial state, a task performed by the sensors embedded in the hand, as explained in **Figure 8** in Section 6. After the cloning procedure has created the antibodies, their affinity is measured as explained in Eq. (4). Antibodies that do not meet the affinity criteria are discarded and deleted.
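Assuming, as stated earlier, that affinity is measured as the Euclidean distance to the original antigen (the exact form of Eq. (4) may differ), the discard step could be sketched as:

```python
import numpy as np

def affinity_filter(antibodies, antigen, threshold):
    """Keep only antibodies whose Euclidean distance to the original
    antigen is below the threshold; the rest are discarded."""
    distances = np.linalg.norm(antibodies - antigen, axis=1)
    return antibodies[distances < threshold]

antigen = np.array([1.0, 2.0])
antibodies = np.array([[1.1, 2.1], [3.0, 5.0], [0.9, 1.8]])
kept = affinity_filter(antibodies, antigen, threshold=0.5)
print(kept.shape)   # (2, 2): the distant antibody [3.0, 5.0] is discarded
```

The threshold here is illustrative; in practice it would be set by the affinity criteria of Eq. (4).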


The clonal procedure was inspired by the response of certain neurons to a set of stimuli. The outputs and the probability densities are shown in **Figure 5**. The graph represents the conditional probability of certain neurons producing a firing rate *r* given a specific stimulus *s* [43]. This information is presented in [44], where several experiments using the cercal system of the cricket are reported as a means of studying neural decoding.

Using the clonal/mutation procedure, the algorithm was able to obtain completely different probability distributions while maintaining a normal density; this is shown in **Figure 6**, where the original values are compared with the values obtained after the clonal/mutation procedure.

It can be seen that we obtained different values, giving the algorithm much more data to explore and evaluate in the subsequent stages of CODA, to be used if they prove suitable for solving the problem.
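A small numerical illustration of why this works (parameters illustrative, not the chapter's data): adding independent Gaussian noise to normally distributed samples yields a population whose density is still normal but has a different spread, which is essentially what the clonal/mutation procedure exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original responses: normally distributed around a preferred value.
original = rng.normal(loc=0.0, scale=1.0, size=1000)

# Clonal/mutation step: each clone receives extra Gaussian noise. The
# sum of two independent normals is still normal, so the mutated
# population keeps a normal density but with a larger variance.
mutated = original + rng.normal(loc=0.0, scale=0.5, size=1000)

print(round(float(original.std()), 2), round(float(mutated.std()), 2))
```

The mutated population's standard deviation is larger than the original's, so the antibodies cover regions of the input space the single example never visited.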

**Figure 6.** Comparison of probability densities from the original data and the cloned/mutated data produced by the CODA clonal/mutation procedure.

In fact, the clonal/mutation procedure will be given limited data, and it must produce new data samples even from a single example. **Figure 7** shows precisely that the clonal/mutation procedure is capable of producing completely new data from a simple example, and **Figure 6** shows how these new data have a completely different probability density, providing coverage for unseen data. The examples were obtained from the data glove as a demonstration, the first stage of the block diagram shown in **Figure 8**.

The algorithm was given a five-row, two-column array containing force and angle values for each of the five fingers. From this single example the algorithm was able to produce completely new data, which must be evaluated (Eq. (4)) before being used.
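A hypothetical rendering of that input and of the cloning step (the numeric values and the per-column `sigma` are illustrative, not measured glove data):

```python
import numpy as np

rng = np.random.default_rng(42)

# One demonstration from the data glove: force and angle per finger.
# Values here are illustrative, not measured data.
antigen = np.array([
    [2.1, 30.0],   # thumb:  force (N), angle (deg)
    [1.8, 45.0],   # index
    [1.6, 50.0],   # middle
    [1.4, 48.0],   # ring
    [1.2, 40.0],   # little
])

# Clone the single example and mutate under a normal distribution
# whose spread is set by the user (one sigma per column).
n_clones, sigma = 10, np.array([0.1, 2.0])
antibodies = antigen + rng.normal(0.0, sigma, size=(n_clones, *antigen.shape))
print(antibodies.shape)   # (10, 5, 2)
```

Using a different `sigma` per column keeps the mutation scale sensible for quantities measured in different units.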

**Figure 7.** Red squares are antigens that were cloned and mutated, producing the blue circles (antibodies), with the CODA clonal/mutation procedure; mutation followed a normal distribution set by the user.

Once the clonal and mutation procedure is completed, the algorithm takes the antibodies' information and evaluates their affinity to the original value together with the reward function; this produces an expected reward before the configuration is actually used on the hardware. If the affinity and reward values both fall below their thresholds, the antibodies are kept in memory.
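A sketch of this pre-execution scoring under the same Euclidean-affinity assumption; the reward function and thresholds here are illustrative stand-ins:

```python
import numpy as np

def expected_rewards(antibodies, antigen, reward_fn):
    """Score each antibody before it is ever tried on the hardware:
    affinity to the original antigen plus the reward function's estimate."""
    affinities = np.linalg.norm(antibodies - antigen, axis=1)
    rewards = np.array([reward_fn(ab) for ab in antibodies])
    return affinities, rewards

# Illustrative reward: prefer configurations close to a target grasp.
target = np.array([1.0, 1.0])
reward_fn = lambda ab: -float(np.linalg.norm(ab - target))

antigen = np.array([0.9, 1.1])
antibodies = np.array([[1.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
aff, rew = expected_rewards(antibodies, antigen, reward_fn)
keep = (aff < 0.5) & (rew > -0.5)   # retention criterion (thresholds illustrative)
print(keep)   # [ True False False]
```

Scoring before execution means only promising antibodies ever reach the physical hand.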

After the previous procedure is completed, the reinforcement learning algorithm uses the antibodies and defines the most rewarded one as the goal state for completing the task. From the *M* retained antibodies, the algorithm then leads the system to the goal state. It is important to store the antibody-reward pair corresponding to the antibody that obtained the highest value from the reward function while performing the action, since this is the one that completed the task as desired. Finally, a matrix should be constructed to store the Q matrix together with the antibody-reward pair for future reference in memory. The last step runs the clustering in order to segregate the different classes that can be found, which translate into different tasks for the system.
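The final bookkeeping might look as follows; the chapter does not specify the clustering method, so a single k-means assignment pass stands in for it, and all names are hypothetical:

```python
import numpy as np

def finalize(antibodies, rewards, q_matrix, memory, n_tasks=2, rng=None):
    """Store the most rewarded antibody with its reward and Q matrix,
    then run one k-means assignment pass over the stored antibodies so
    each cluster can be read as a distinct task."""
    rng = np.random.default_rng(0) if rng is None else rng
    best = int(np.argmax(rewards))                       # highest expected reward
    memory.append({"antibody": antibodies[best],
                   "reward": rewards[best],
                   "Q": q_matrix})
    points = np.array([m["antibody"] for m in memory])
    k = min(n_tasks, len(points))
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.argmin(np.linalg.norm(points[:, None] - centers[None], axis=2),
                       axis=1)                           # cluster -> task label
    return antibodies[best], labels

memory = []
antibodies = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
rewards = np.array([0.1, 0.9, 0.5])
goal, labels = finalize(antibodies, rewards, q_matrix={"states": 4}, memory=memory)
print(goal)   # [1. 1.]
```

A full implementation would iterate the k-means update to convergence; one assignment pass is enough to show how stored antibodies map to task classes.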

**Figure 8.** Hand system with its elements, where arrows show how data are managed between hardware components.
