**2.2. Hard programming vs learning**

A simple but effective definition of learning from demonstration was presented previously. In this section, a more formal and structured definition is given. Learning from demonstration builds a mapping, from examples, between environment states and the actions to perform. This mapping is called a policy, and it gives the robot the ability to select which action to take when it finds itself in a given environment configuration. The policy is built from all the demonstrations the robot obtains.
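As a minimal sketch of this state-to-action mapping (the toy task, state names and helper functions here are illustrative assumptions, not taken from any cited system), a discrete policy can be built by letting each demonstrated state vote for the action most often shown in it:

```python
from collections import Counter, defaultdict

def build_policy(demonstrations):
    """Build a state -> action mapping from demonstrations.

    Each demonstration is a list of (state, action) pairs; the policy
    selects, for every state, the action demonstrated most often there.
    """
    votes = defaultdict(Counter)
    for demo in demonstrations:
        for state, action in demo:
            votes[state][action] += 1
    return {state: counter.most_common(1)[0][0]
            for state, counter in votes.items()}

# Two demonstrations of a hypothetical task: move right until the goal, then stop.
demos = [
    [("start", "right"), ("middle", "right"), ("goal", "stop")],
    [("start", "right"), ("middle", "right"), ("goal", "stop")],
]
policy = build_policy(demos)
print(policy["middle"])  # the action the policy selects for the "middle" state
```

A real system would generalise beyond the states seen in the demonstrations; this table-based sketch only covers the first, simpler path described below.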

There are two paths in learning from demonstration. The first one is very simple: it essentially lets the robot mimic the motion of the instructor. In other words, the computer must learn how the instructor acts, reacts and handles errors in different situations, a process called learning the policy.

The second path, which is the more important one in this work, is where the robot may not be doing exactly the same task as the demonstrator and only a small number of task demonstrations is available. This means the robot must learn under uncertainty and from little data. In these cases, the algorithms should extract the most out of the available information in order to produce correct cognitive responses, a topic studied in the CODA section.

### *2.2.1. Pendant programming and kinetic teaching*

Pendant programming and kinetic teaching are techniques that could be considered the simplest form of learning from demonstration. Both have their advantages and disadvantages, but a detailed comparison is outside the scope of this book. It is, however, important to mention how they work.

**Figure 1** shows an example of pendant programming, where the human uses a programming pendant to move the robot through the desired points. This technique can produce two kinds of path, polynomial or linear, but no learning ever takes place during this action: the robot will always move along the chosen path, which limits its capacity to generalise. This is why this technique belongs to the hard programming techniques for controlling a robot, even though it uses demonstration to help the user program the robot.

**Figure 1.** Pendant programming example, same trajectory with two interpolation options.
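The two interpolation options of Figure 1 can be sketched as follows. This is a one-dimensional toy example with hypothetical waypoints and function names, not the pendant's actual firmware (real controllers interpolate in joint or Cartesian space); it only shows how the same taught points yield either a piecewise-linear path or a smooth polynomial path:

```python
def linear_path(waypoints, t):
    """Piecewise-linear interpolation through taught (time, position) points."""
    for (t0, p0), (t1, p1) in zip(waypoints, waypoints[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)
            return p0 + u * (p1 - p0)
    raise ValueError("t outside taught range")

def polynomial_path(waypoints, t):
    """Lagrange polynomial through all taught points (smooth, no corners)."""
    total = 0.0
    for i, (ti, pi) in enumerate(waypoints):
        term = pi
        for j, (tj, _) in enumerate(waypoints):
            if i != j:
                term *= (t - tj) / (ti - tj)
        total += term
    return total

# Three taught points: both paths pass through them but differ in between.
pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(linear_path(pts, 0.5))      # 0.5  (straight segment between points)
print(polynomial_path(pts, 0.5))  # 0.75 (curved quadratic through all points)
```

In both cases the path is fixed once taught; nothing in the interpolation adapts to a changed environment, which is exactly the limitation noted above.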

### *2.2.2. Learning from data*

Once a robot is able to reproduce certain skills with the least error possible, or even better with no error, producing the desired results every time the action is performed regardless of the variability of the environment at each try, it can be said that the robot has learnt the skill correctly. Therefore, it can be said that it has certain intelligence embedded that lets it learn.

Robots can be controlled by different methods and techniques that have been used for several years and produce excellent results in certain conditions and applications [16, 17]. It is important to note, however, that robots are also being used outside of factories, in applications and environments that require high adaptability, reliability and constant learning. These may demand robots that are capable of handling uncertainty and variability in fast and dynamic environments.

It can now be asserted that learning is a valuable and almost necessary skill for robots, and it has been almost since Alan Turing wrote *Computing Machinery and Intelligence*, which concludes that "*We can only see a short distance ahead, but we can see plenty there that needs to be done*" [18].

The previous section exemplified how a robot can be programmed by a demonstration and still lack intelligence. Intelligence is formed by several different components, such as reasoning and logical deduction, but for our purpose the most basic animal intelligence characteristics that matter are remembering, adapting and generalising.

Remembering, adapting and generalising are part of learning, and these characteristics are the most important ones to embed in machines. Machines should be able to learn from examples, data and experience. First the machine obtains an example, which contains data, and at each try the machine acquires experience. Using the concepts presented earlier, the machine should be able to recognise when it was last in a given situation (it encountered the same data), tried a particular action (it produced this output) and the action worked (the output action or actions were correct), so that the action can be tried again. If it does not work properly, then something different should be tried.
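The remember-try-adapt cycle just described can be sketched as follows; this is a toy illustration with hypothetical situations and actions, not a specific published algorithm:

```python
import random

class SimpleMemory:
    """Remember which action worked in which situation: a minimal sketch
    of remembering (reuse what worked) and adapting (try something else)."""

    def __init__(self, actions):
        self.actions = actions
        self.known_good = {}  # situation -> action that worked before
        self.failed = {}      # situation -> set of actions that failed there

    def choose(self, situation):
        # Remembering: reuse an action that previously worked here.
        if situation in self.known_good:
            return self.known_good[situation]
        # Adapting: otherwise try an action not yet known to fail here.
        tried = self.failed.get(situation, set())
        untried = [a for a in self.actions if a not in tried]
        return random.choice(untried or self.actions)

    def record(self, situation, action, worked):
        if worked:
            self.known_good[situation] = action
        else:
            self.failed.setdefault(situation, set()).add(action)

mem = SimpleMemory(["push", "pull", "lift"])
mem.record("door-closed", "pull", worked=False)
mem.record("door-closed", "push", worked=True)
print(mem.choose("door-closed"))  # reuses "push", which worked last time
```

Generalising, the third characteristic, is what this lookup table lacks: it treats every new situation as unknown instead of relating it to similar ones already seen.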

Learning is a skill that gives us the ability to be flexible in our day-to-day activities. We adapt and adjust ourselves to new circumstances and recognise the similarity between different situations, and we can therefore apply knowledge acquired in one place or situation to another.

Learning by demonstration is a powerful tool that has already shown its competence in the robotics field. However, it may be impossible to complete every desired task if it is the only tool used in the entire system. To address this, a hybrid system was built that uses a model-free algorithm such as Q-learning or SARSA (explained in the next section), where the policy is represented by a Q matrix and adjusted directly according to the reward obtained during each try. The consequences of each adjustment to the policy are measured by the reward function, which guides future adjustments.

The reward function does not actually tell the algorithm whether the output is correct or not; instead, it tells how correct it is, and the stored values of the Q matrix serve as a repertoire of knowledge for recognising certain "*encountered data–output action*" pairs for future reference.
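The reward-driven adjustment of the Q matrix described in these paragraphs can be sketched with the standard tabular Q-learning update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The corridor task, parameter values and function names below are illustrative assumptions, not the authors' actual system:

```python
import random

def q_learning(n_states, n_actions, step, episodes=200,
               alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: the Q matrix stores graded 'how correct'
    values for every (state, action) pair, adjusted after each try."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        done = False
        while not done:
            # Epsilon-greedy: usually exploit the best known action,
            # occasionally try something different.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])  # reward-driven adjustment
            s = s2
    return Q

# Toy corridor: states 0..3, action 1 moves right, action 0 moves left;
# reaching state 3 gives reward 1 and ends the episode.
def step(s, a):
    s2 = max(0, min(3, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

Q = q_learning(4, 2, step)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(3)]
print(greedy)  # greedy action per state after learning
```

Note how the reward grades each try rather than labelling it right or wrong: states far from the goal accumulate smaller, discounted Q values, yet the greedy policy read off the matrix still moves toward the goal everywhere.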
