**3.1 On-line modulations**

The Output Dynamical System allows easy on-line modulation of the amplitude, frequency, and center of the oscillations: once the robot is performing the learned trajectory, each of these can be changed by adjusting a single parameter. The possibility of such on-line modulation of originally learned trajectories is one of the important motivations for using dynamical systems to encode trajectories.



Changing the parameter *g* modulates the baseline of the rhythmic movement: it smoothly shifts the oscillation without modifying the signal shape. The results are presented in the second plot of Figure 6 left. Modifying Ω and *r* changes the frequency and the amplitude of the oscillations, respectively. Since our differential equations are of second order, even abrupt changes of these parameters result in smooth variations of the trajectory *y*. This is particularly useful when controlling articulated robots, which require trajectories with limited jerk. Changing the parameter Ω is only relevant when one wants to replay the learned signal at a frequency different from the one extracted by the Canonical Dynamical System. Results of changing the frequency Ω are presented in the third plot of Figure 6 left, and results of modulating the amplitude parameter *r* in the bottom plot.
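As a numerical illustration, the modulation behaviour described above can be sketched with a minimal second-order output system. The gains *α<sub>z</sub>* = 8, *β<sub>z</sub>* = 2 and the single-cosine forcing *r* cos Φ used here are illustrative stand-ins for the chapter's learned weighted-kernel forcing term, not the actual implementation:

```python
import numpy as np

def output_system(T, dt, omega, g, r, alpha_z=8.0, beta_z=2.0):
    """Euler-integrate a sketch of the output dynamical system:

        z' = Omega * (alpha_z * (beta_z * (g - y) - z) + r * cos(phi))
        y' = Omega * z,    phi' = Omega

    g shifts the baseline, r scales the amplitude and Omega sets the
    frequency; the single cosine stands in for the learned forcing term.
    """
    y, z, phi = g, 0.0, 0.0
    ys = []
    for _ in range(int(T / dt)):
        dz = omega * (alpha_z * (beta_z * (g - y) - z) + r * np.cos(phi))
        dy = omega * z
        y, z, phi = y + dy * dt, z + dz * dt, phi + omega * dt
        ys.append(y)
    return np.array(ys)

omega = 2 * np.pi                                      # 1 Hz oscillation
y1 = output_system(10.0, 1e-3, omega, g=0.0, r=17.0)   # baseline 0
y2 = output_system(10.0, 1e-3, omega, g=1.0, r=34.0)   # shifted baseline, doubled amplitude
```

For this linear sketch the steady-state amplitude works out to *r*/17 with the chosen gains, so doubling *r* doubles the amplitude while the baseline follows *g*, mirroring the modulations shown in Figure 6 left.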

#### **3.2 Perturbations and modified feedback**

#### **3.2.1 Dealing with perturbations**

The Output Dynamical System is inherently robust against perturbations. Figure 6 right illustrates the time evolution of the system repeating a learned trajectory at a frequency of 1 Hz, when the state variables *y*, *z* and Φ are randomly changed at time *t* = 30 s. The results show that the output of the system reverts smoothly to the learned trajectory. This is an important feature of the approach: the system essentially represents a whole landscape in the space of state variables, which not only encodes the learned trajectory but also determines how the states return to it after a perturbation.
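This return to the learned trajectory can be sketched with a minimal second-order output system (the gains *α<sub>z</sub>* = 8, *β<sub>z</sub>* = 2 and the single-cosine forcing are illustrative assumptions, not the learned forcing term). Here only *y* and *z* are kicked; a kick to Φ would persist as a phase offset, since the phase runs autonomously unless the canonical system re-synchronizes to an external signal:

```python
import numpy as np

def step(state, omega, g, r, dt, alpha_z=8.0, beta_z=2.0):
    # One Euler step of the sketched output system driven by a phase oscillator.
    y, z, phi = state
    dz = omega * (alpha_z * (beta_z * (g - y) - z) + r * np.cos(phi))
    return (y + omega * z * dt, z + dz * dt, phi + omega * dt)

omega, dt, g, r = 2 * np.pi, 1e-3, 0.0, 17.0
ref = pert = (0.0, 0.0, 0.0)          # reference and to-be-perturbed copies
err_after = 0.0
for i in range(int(10.0 / dt)):
    ref = step(ref, omega, g, r, dt)
    pert = step(pert, omega, g, r, dt)
    if i == int(5.0 / dt):            # kick y and z at t = 5 s
        pert = (pert[0] + 2.0, pert[1] - 3.0, pert[2])
        err_after = abs(pert[0] - ref[0])
err_final = abs(pert[0] - ref[0])     # error five seconds after the kick
```

Because the error dynamics of this sketch are linear and critically damped, the perturbed copy collapses back onto the reference trajectory within a fraction of a period.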

#### **3.2.2 Slow-down feedback**

When controlling the robot, we have to take into account perturbations due to interaction with the environment. Our system provides *desired* states to the robot, i.e., desired joint angles or torques, and its state variables are therefore not affected by the *actual* states of the robot unless feedback terms are added to the control scheme. For instance, it might happen that, due to external forces, significant differences arise between the actual position *ỹ* and the desired position *y*. Depending on the task, this error can be fed back to the system in order to modify the generated trajectories on-line.

Fig. 6. *Left*: Modulations of the learned signal. The learned signal (top), modulating the baseline for oscillations *g* (second from top), doubling the frequency Ω (third from top), doubling the amplitude *r* (bottom). *Right*: Dealing with perturbations – reacting to a random perturbation of the state variables *y*, *z* and Φ at *t* = 30 s.


One type of such feedback is the "slow-down" feedback, which can be applied to the Output Dynamical System. This type of feedback affects both the Canonical and the Output Dynamical System. The following explanation refers to the replay of a learned trajectory, as perturbing the robot while it is learning the trajectory is not practical.

For the process of repeating the signal, for which we use a phase oscillator, we modify Eqs. (2) and (14) to:

$$
\dot{y} = \Omega \left( z + \alpha_{py} \left( \tilde{y} - y \right) \right) \tag{22}
$$

$$
\dot{\Phi} = \frac{\Omega}{1 + \alpha_{p\Phi}\,|\tilde{y} - y|} \tag{23}
$$

where $\alpha_{py}$ and $\alpha_{p\Phi}$ are positive constants.

With this type of feedback, the time evolution of the states is gradually halted during the perturbation. The desired position *y* is modified to remain close to the actual position *ỹ*, and as soon as the perturbation stops, the system rapidly resumes performing the time-delayed planned trajectory. Results are presented in Figure 7 left. As we can see, the desired position *y* and the actual position *ỹ* are the same except for the short interval between *t* = 22.2 s and *t* = 23.9 s. The dotted line shows the original unperturbed trajectory. The desired trajectory continues from the point of perturbation and does not jump back to the unperturbed desired trajectory.
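The effect of Eqs. (22) and (23) can be sketched numerically. In this sketch the robot is modelled as a fast first-order tracker of the desired position, and the tracker gain *k*, the gains *α<sub>z</sub>*, *β<sub>z</sub>*, *α<sub>py</sub>*, *α<sub>p</sub>*<sub>Φ</sub> and the single-cosine forcing are all illustrative assumptions rather than the chapter's values. Physically blocking the robot mid-run should nearly freeze the phase until the perturbation ends:

```python
import numpy as np

def simulate(block=(99.0, 99.0), T=6.0, dt=1e-4):
    """Slow-down feedback (Eqs. 22 and 23) on a sketched output system.

    The "robot" is a first-order tracker of the desired position y; during
    the `block` interval it is held fixed (the robot is physically blocked).
    Returns the phase advance over t in [2 s, 4 s] and the final |y_act - y|.
    """
    omega, g, r = 2 * np.pi, 0.0, 10.0
    alpha_z, beta_z = 8.0, 2.0                 # illustrative system gains
    alpha_py, alpha_pphi, k = 5.0, 100.0, 400.0
    y = z = phi = y_act = 0.0
    phi_a = phi_b = 0.0
    for i in range(int(T / dt)):
        t = i * dt
        if i == int(2.0 / dt): phi_a = phi     # phase at window start
        if i == int(4.0 / dt): phi_b = phi     # phase at window end
        blocked = block[0] <= t < block[1]
        err = y_act - y
        dz = omega * (alpha_z * (beta_z * (g - y) - z) + r * np.cos(phi))
        dy = omega * (z + alpha_py * err)                # Eq. (22)
        dphi = omega / (1.0 + alpha_pphi * abs(err))     # Eq. (23)
        dya = 0.0 if blocked else k * (y - y_act)        # blocked robot
        y, z = y + dy * dt, z + dz * dt
        phi, y_act = phi + dphi * dt, y_act + dya * dt
    return phi_b - phi_a, abs(y - y_act)

adv_blocked, err_end = simulate(block=(2.0, 4.0))  # robot blocked for 2 s
adv_free, _ = simulate()                           # unperturbed run
```

During the block the tracking error keeps the phase nearly frozen, so `adv_blocked` is a small fraction of `adv_free`, and after release the tracker re-converges to the desired trajectory.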

#### **3.2.3 Virtual repulsive force**

Another example of a perturbation is the presence of boundaries or obstacles, such as joint angle limits. In that case we can modify Eq. (2) to include a repulsive force *l*(*y*) at the limit:

$$
\dot{y} = \Omega \left( z + l(y) \right) \tag{24}
$$

For instance, a simple repulsive force to avoid hitting joint limits or going beyond a position in task space can be

$$
l(y) = -\gamma \frac{1}{\left(y_L - y\right)^3} \tag{25}
$$

where *yL* is the value of the limit. Figure 7 right illustrates the effect of such a repulsive force. Such on-line modifications are one of the most interesting properties of using autonomous differential equations for control policies. These are just examples of possible feedback loops, and they should be adjusted depending on the task at hand.
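Equations (24) and (25) can be sketched in the same spirit. For the two-sided limits of Figure 7 right we use one repulsive term per limit, an illustrative extension of the single-limit force in Eq. (25); the gains, the forcing and γ are again stand-ins for the learned system, not the chapter's values:

```python
import numpy as np

def run(limits=None, T=10.0, dt=1e-4, gamma=1e-3):
    """Sketched output system with an optional virtual repulsive force.

    With limits (y_lo, y_hi), Eq. (24) becomes y' = Omega * (z + l(y)) with
    one Eq.-(25)-style term per limit:
        l(y) = -gamma/(y_hi - y)**3 - gamma/(y_lo - y)**3
    """
    omega, g, alpha_z, beta_z = 2 * np.pi, 0.0, 8.0, 2.0
    r = 1.5 * 17.0                  # unconstrained steady amplitude ~1.5
    y = z = phi = 0.0
    ys = []
    for _ in range(int(T / dt)):
        l = 0.0
        if limits is not None:
            lo, hi = limits
            l = -gamma / (hi - y) ** 3 - gamma / (lo - y) ** 3
        dz = omega * (alpha_z * (beta_z * (g - y) - z) + r * np.cos(phi))
        dy = omega * (z + l)        # Eq. (24)
        y, z, phi = y + dy * dt, z + dz * dt, phi + omega * dt
        ys.append(y)
    return np.array(ys)

free = run()                        # oscillation exceeds the limits
limited = run(limits=(-1.0, 1.0))   # repulsive force keeps y inside [-1, 1]
```

Because the repulsive term diverges at the limit, the output turns around just inside the boundary instead of being hard-clipped, which keeps the trajectory smooth.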

Fig. 7. *Left*: Reacting to a perturbation with the slow-down feedback. The desired position *y* and the actual position *ỹ* are the same except for the short interval between *t* = 22.2 s and *t* = 23.9 s. The dotted line corresponds to the original unperturbed trajectory. *Right*: Output of the system with the limits set to *yL* = [−1, 1] for the input signal *ydemo*(*t*) = cos (2*πt*) + sin (4*πt*).


Fig. 10. Extracted frequency Ω of the drumming tones from the music in the top plot. Comparison between the power spectrum of the audio signal (drumming tones) *y* and robot trajectories for the left (*xl*) and the right hand motions (*xr*).

#### **3.4 Table wiping**

In this section we show how we can use the proposed two-layered system to modify already learned movement trajectories according to the measured force. The ARMAR-IIIb humanoid robot, which is kinematically equivalent to ARMAR-IIIa (Asfour et al., 2006), was used in the experiment.

From the kinematics point of view, the robot consists of seven subsystems: head, left arm, right arm, left hand, right hand, torso, and a mobile platform. The head has seven DOF and is equipped with two eyes, which have a common tilt and can pan independently. Each arm has 7 DOF and each hand an additional 8 DOF. The locomotion of the robot is realized using a wheel-based holonomic platform.

In order to obtain reliable motion data of a human wiping demonstration through observation by the robot, we exploited the color features of the sponge to track its motion. Using the stereo camera setup of the robot, the implemented blob tracking algorithm, based on color segmentation and a particle filter framework, provides a robust location estimate of the sponge in 3D. The resulting trajectories were captured at a frame rate of 30 Hz.

For learning of movements we first define the area of demonstration by measuring the lower-left and the upper-right position within a given time-frame, as presented in Fig. 11. All tracked sponge movement is then normalized and given as an offset to the central position of this area.

For measuring the contact forces between the object in the hand and the surface of the plane, a 6D force/torque sensor mounted at the wrist of the robot is used.

#### **3.5 Adaptation of the learned trajectory using force feedback**

Learning of a movement that brings the robot into contact with the environment must be based on force control; otherwise the robot, or the object to which the robot applies its force, can be damaged. In the task of wiping a table, or any other object of arbitrary shape,
