**2. Two-layered movement imitation system**

In this chapter we give details and properties of both sub-systems that make up the two-layered movement imitation system. We also give alternative possibilities for the canonical dynamical system.

*Performing Periodic Tasks: On-Line Learning, Adaptation and Synchronization with External Signals*

Fig. 1. Proposed structure of the system. The two-layered system is composed of the *Canonical Dynamical System* as the first layer for the frequency adaptation, and the *Output Dynamical System* for the learning as the second layer. The input signal *ydemo*(*t*) is an arbitrary *Q*-dimensional periodic signal. The Canonical Dynamical System outputs the fundamental frequency Ω and phase of the oscillator at that frequency, Φ, for each of the *Q* DOF, and the Output Dynamical System learns the waveform.

Figure 1 shows the structure of the proposed system for the learning of the frequency and the waveform of the input signal. The input into the system *ydemo*(*t*) is an arbitrary periodic signal of one or more degrees of freedom (DOF).

The task of frequency and waveform learning is split into two separate tasks, each performed by a separate dynamical system. The frequency adaptation is performed by the *Canonical Dynamical System*, which either consists of several adaptive frequency oscillators in a feedback structure, or a single oscillator with an adaptive Fourier series. Its purpose is to extract the basic frequency Ω of the input signal, and to provide the phase Φ of the signal at this frequency.

These quantities are fed into the *Output Dynamical System*, whose goal is to adapt the shape of the limit cycle of the Canonical Dynamical System, and to learn the waveform of the input signal. The resulting output signal of the Output Dynamical System is not explicitly encoded but generated during the time evolution of the Canonical Dynamical System, by using a set of weights learned by Incremental Locally Weighted Regression (ILWR) (Schaal & Atkeson, 1998).

Both frequency adaptation and waveform learning work in parallel, thus accelerating the process. The output of the combined system can be, for example, joint coordinates of the robot, position in task space, joint torques, etc., depending on what the input signal represents.

In the next section we first explain the second layer of the system - the output dynamical system - which learns the waveform of the input periodic signal once the frequency is determined.

### **2.1 Output dynamical system**

The output dynamical system is used to learn the waveform of the input signal. The explanation is for a 1 DOF signal. For multiple DOF, the algorithm works in parallel for all the degrees of freedom.

The following dynamics specify the attractor landscape of a trajectory *y* towards the anchor point *g*, with the Canonical Dynamical System providing the phase Φ to the function Ψ*<sup>i</sup>* of the control policy:


$$\dot{z} = \Omega \left( \alpha\_z \left( \beta\_z \left( g - y \right) - z \right) + \frac{\sum\_{i=1}^{N} \Psi\_i w\_i r}{\sum\_{i=1}^{N} \Psi\_i} \right) \tag{1}$$

$$
\dot{y} = \Omega z \tag{2}
$$

$$\Psi\_i = \exp\left(h\left(\cos\left(\Phi - c\_i\right) - 1\right)\right) \tag{3}$$

Here Ω (chosen amongst the *ωi*) is the frequency given by the canonical dynamical system, Eq. (10); *α<sup>z</sup>* and *β<sup>z</sup>* are positive constants, set to *α<sup>z</sup>* = 8 and *β<sup>z</sup>* = 2 for all the results. The 4:1 ratio ensures critical damping, so that the system converges monotonically to a trajectory oscillating around *g*, which acts as an anchor point for the oscillatory trajectory. *N* is the number of Gaussian-like periodic kernel functions Ψ*i*, given by Eq. (3). *wi* are the learned weight parameters and *r* is the amplitude control parameter; *r* = 1 maintains the amplitude of the demonstration signal. The system given by Eq. (1) without the nonlinear term is a second-order linear system with a unique globally stable point attractor (Ijspeert et al., 2002). Because of the periodic nonlinear term, however, the system produces stable periodic trajectories whose frequency is Ω and whose waveform is determined by the weight parameters *wi*.

In Eq. (3), which determines the Gaussian-like kernel functions Ψ*i*, *h* determines their width, which is set to *h* = 2.5*N* for all the results presented in the paper unless stated otherwise, and *ci* are equally spaced between 0 and 2*π* in *N* steps.
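To make the dynamics above concrete, the following is a minimal Euler-integration sketch of Eqs. (1)-(3); the function names, the time step, and the zero initial conditions are our assumptions, not the authors' implementation:

```python
import numpy as np

def kernels(phi, c, h):
    # Eq. (3): Gaussian-like kernels, periodic in the phase Phi
    return np.exp(h * (np.cos(phi - c) - 1.0))

def rollout(w, omega=2 * np.pi, g=0.0, r=1.0, alpha_z=8.0, beta_z=2.0,
            dt=0.001, T=2.0):
    """Integrate Eqs. (1)-(2) for T seconds and return the trajectory y."""
    N = len(w)
    h = 2.5 * N                                          # kernel width h = 2.5 N
    c = np.linspace(0.0, 2 * np.pi, N, endpoint=False)   # equally spaced c_i
    y, z, phi = 0.0, 0.0, 0.0                            # assumed initial state
    ys = []
    for _ in range(round(T / dt)):
        psi = kernels(phi, c, h)
        f = np.dot(psi, w) * r / np.sum(psi)             # nonlinear forcing term
        z += omega * (alpha_z * (beta_z * (g - y) - z) + f) * dt   # Eq. (1)
        y += omega * z * dt                                         # Eq. (2)
        phi += omega * dt        # phase supplied by the canonical system
        ys.append(y)
    return np.asarray(ys)
```

With all weights set to zero the forcing term vanishes and, thanks to the critical damping, *y* settles monotonically at the anchor point *g*.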

As the input into the learning algorithm we use triplets of position, velocity and acceleration *ydemo*(*t*), *y*˙*demo*(*t*), and *y*¨*demo*(*t*), with *demo* marking the input or demonstration trajectory we are trying to learn. With this, Eq. (1) can be rewritten as

$$\frac{1}{\Omega}\dot{z} - \alpha\_z \left(\beta\_z \left(g - y\right) - z\right) = \frac{\sum\_{i=1}^{N} \Psi\_i w\_i r}{\sum\_{i=1}^{N} \Psi\_i} \tag{4}$$

and formulated as a supervised learning problem with, on the right hand side, a set of local models *wir* that are weighted by the kernel functions Ψ*i*, and on the left hand side the target function *ftarg*, given by

$$f\_{targ} = \frac{1}{\Omega^2}\ddot{y}\_{demo} - \alpha\_z\left(\beta\_z\left(g - y\_{demo}\right) - \frac{1}{\Omega}\dot{y}\_{demo}\right),$$

which is obtained by matching *y* to *ydemo*, *z* to *y*˙*demo*/Ω, and *z*˙ to *y*¨*demo*/Ω.
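The target function *ftarg* can be evaluated pointwise from the demonstration triplets; a minimal sketch with illustrative names (it accepts plain floats, and numpy arrays via broadcasting):

```python
def f_target(y, yd, ydd, omega, g, alpha_z=8.0, beta_z=2.0):
    """Pointwise target f_targ from one demonstration triplet (y, y_dot, y_ddot)."""
    # matching y -> y_demo, z -> y_dot/Omega, z_dot -> y_ddot/Omega in Eq. (4)
    return ydd / omega ** 2 - alpha_z * (beta_z * (g - y) - yd / omega)
```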

Locally weighted regression corresponds to finding, for each kernel function Ψ*i*, the weight *wi* which minimizes the quadratic error criterion<sup>2</sup>

$$J\_i = \sum\_{t=1}^{P} \Psi\_i(t) \left( f\_{targ}(t) - w\_i r(t) \right)^2 \tag{5}$$

where *t* is an index corresponding to discrete time steps (of the integration). The regression can be performed as a *batch* regression, or alternatively, we can perform the minimization of the *Ji* cost function incrementally, while the target data points *ftarg*(*t*) arrive. As we want continuous learning of the demonstration signal, we use the latter. Incremental regression is done with the use of recursive least squares with a forgetting factor of *λ*, to determine the parameters (or weights) *wi*. Given the target data *ftarg*(*t*) and *r*(*t*), *wi* is updated by

$$w\_i(t+1) = w\_i(t) + \Psi\_i P\_i(t+1) r(t) e\_r(t) \tag{6}$$

<sup>2</sup> LWR is derived from a piecewise linear function approximation approach (Schaal & Atkeson, 1998), which decouples a nonlinear least-squares learning problem into several locally linear learning problems, each characterized by the local cost function *Ji*. These local problems can be solved with standard weighted least squares approaches.
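For the one-parameter local model *wir* of Eq. (5), the weighted least-squares solution of each local problem has a closed form: setting *dJi*/*dwi* = 0 gives *wi* as a ratio of two weighted sums. A batch sketch under assumed names:

```python
import numpy as np

def batch_lwr(psi, f_targ, r):
    """Closed-form minimizer of each J_i in Eq. (5).

    psi: (P, N) kernel activations over P samples; f_targ, r: (P,) arrays.
    """
    num = psi.T @ (f_targ * r)   # sum_t Psi_i(t) f_targ(t) r(t), per kernel
    den = psi.T @ (r * r)        # sum_t Psi_i(t) r(t)^2, per kernel
    return num / den             # w_i minimizing J_i
```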


Fig. 2. *Left*: The result of the Output Dynamical System with a constant frequency input and with continuous learning of the weights. In all the plots the input signal is the dash-dot line, while the learned signal is the solid line. In the middle-right plot we can see the evolution of the kernel functions. The kernel functions are a function of Φ and do not necessarily change uniformly (see also Fig. 7). In the bottom-right plot the phase of the oscillator is shown. The amplitude is here *r* = 1, as shown bottom-left. *Right*: The error of learning decreases as the number of Gaussian-like kernel functions increases. The error, which is quite small, is mainly due to a very slight (one or two sample times) delay of the learned signal.

$$P\_i(t+1) = \frac{1}{\lambda} \left( P\_i(t) - \frac{P\_i(t)^2 r(t)^2}{\frac{\lambda}{\Psi\_i} + P\_i(t)r(t)^2} \right) \tag{7}$$

$$e\_r(t) = f\_{targ}(t) - w\_i(t)r(t). \tag{8}$$

*P*, in general, is the inverse covariance matrix (Ljung & Söderström, 1986). The recursion is started with *wi* = 0 and *Pi* = 1. Batch and incremental learning regressions provide identical weights *wi* for the same training sets when the forgetting factor *λ* is set to one. Differences appear when the forgetting factor is less than one, in which case the incremental regression gives more weight to recent data (i.e. tends to forget older ones). The error of weight learning *er* (Eq. (8)) is not "related" to *e* when extracting frequency components (Eq. (11)). This allows for complete separation of frequency adaptation and waveform learning.
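The recursion of Eqs. (6)-(8) is compact in code. The following one-step update is a sketch with illustrative names; passing scalars updates one kernel, while passing numpy arrays updates all *N* kernels at once via broadcasting:

```python
def ilwr_step(w, P, psi, f_targ, r, lam=0.995):
    """One incremental locally weighted regression step, Eqs. (6)-(8)."""
    P = (P - (P ** 2 * r ** 2) / (lam / psi + P * r ** 2)) / lam  # Eq. (7)
    e_r = f_targ - w * r                                          # Eq. (8)
    w = w + psi * P * r * e_r                                     # Eq. (6)
    return w, P
```

Starting from *wi* = 0 and *Pi* = 1 with *λ* = 1, repeatedly feeding a constant target drives *wi* toward the batch average, consistent with the statement above.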

Figure 2 left shows the time evolution of the Output Dynamical System anchored to a Canonical Dynamical System with the frequency set at Ω = 2*π* rad/s, and the weight parameters *wi* adjusted to fit the trajectory *ydemo*(*t*) = sin (2*πt*) + cos (4*πt*) + 0.4sin(6*πt*). As we can see in the top-left plot, the input signal and the reconstructed signal match closely. The matching between the reconstructed signal and the input signal can be improved by increasing the number of Gaussian-like functions.

#### *Parameters of the Output Dynamical System*

When tuning the parameters of the Output Dynamical System, we have to determine the number of Gaussian-like kernel functions *N*, and especially the forgetting factor *λ*. The number *N* of Gaussian-like kernel functions could be set automatically if we used locally weighted learning (Schaal & Atkeson, 1998), but for simplicity it was here set by hand. Increasing the number increases the accuracy of the reconstructed signal, but at the same time also increases the computational cost. Note that LWR does not suffer from problems of overfitting when the number of kernel functions is increased.<sup>3</sup> Figure 2 right shows the error of learning *er* when using *N* = 10, *N* = 25, and *N* = 50 on a signal *ydemo*(*t*) = 0.65 sin (2*πt*) + 1.5 cos (4*πt*) + 0.3 sin (6*πt*). Throughout the paper, unless specified otherwise, *N* = 25.

The forgetting factor *λ* ∈ [0, 1] plays a key role in the behavior of the system. If it is set close to one, the system forgets input values very slowly and learns an average of the waveform over multiple periods. If it is set too low, it forgets quickly, essentially fitting all the weights to the most recent values. We set it to *λ* = 0.995.

### **2.2 Canonical dynamical system**

The task of the Canonical Dynamical System is two-fold. Firstly, it has to extract the fundamental frequency Ω of the input signal, and secondly, it has to exhibit stable limit cycle behavior in order to provide a phase signal Φ, which is used to anchor the waveform of the output signal. Two approaches are possible: either with a pool of oscillators (PO), or with an adaptive Fourier series (AF).

#### **2.2.1 Using a pool of oscillators**

As the basis of our canonical dynamical system we use a set of phase oscillators, see e.g. (Buchli et al., 2006), to which we apply the adaptive frequency learning rule as introduced in (Buchli & Ijspeert, 2004) and (Righetti & Ijspeert, 2006), and combine it with a feedback structure (Righetti et al., 2006) shown in Figure 3. The basic idea of the structure is that each of the oscillators will adapt its frequency to one of the frequency components of the input signal, essentially "populating" the frequency spectrum.

We use several oscillators, but are interested only in the fundamental or lowest non-zero frequency of the input signal, denoted by Ω, and the phase of the oscillator at this frequency, denoted by Φ. Therefore the feedback structure is followed by a small logical block, which chooses the correct, lowest non-zero, frequency. Determining Ω and Φ is important because with them we can formulate a supervised learning problem in the second stage - the Output Dynamical System, and learn the waveform of the full period of the input signal.

Fig. 3. Feedback structure of a network of adaptive frequency phase oscillators, which form the Canonical Dynamical System. All oscillators receive the same input and have to start at different frequencies to converge to different final frequencies. Refer also to the text and Eqs. (9-13).

The feedback structure of *M* adaptive frequency phase oscillators is governed by the following equations:

<sup>3</sup> This property is due to solving the bias-variance dilemma of function approximation locally with a closed form solution to leave-one-out cross-validation (Schaal & Atkeson, 1998).

$$\dot{\phi}\_i = \omega\_i - K e \sin(\phi\_i) \tag{9}$$

$$\dot{\omega}\_i = -K e \sin(\phi\_i) \tag{10}$$

$$e = y\_{demo} - \hat{y} \tag{11}$$

$$\hat{y} = \sum\_{i=1}^{M} \alpha\_i \cos(\phi\_i) \tag{12}$$

$$\dot{\alpha}\_i = \eta \cos(\phi\_i) e \tag{13}$$

where *K* is the coupling strength, *φ<sup>i</sup>* is the phase of oscillator *i*, *e* is the input into the oscillators, *ydemo* is the input signal, *y*ˆ is the weighted sum of the oscillators' outputs, *M* is the number of oscillators, *α<sup>i</sup>* is the amplitude associated to the *i*-th oscillator, and *η* is a learning constant. In the experiments we use *K* = 20 and *η* = 1, unless specified otherwise.

Eqs. (9) and (10) present the core of the Canonical Dynamical System – the adaptive frequency phase oscillator. Several (*M*) such oscillators are used in a feedback loop to extract separate frequency components. Eqs. (11) and (12) specify the feedback loop, which also needs amplitude adaptation for each of the frequency components (Eq. (13)).

As we can see in Figure 3, each of the oscillators of the structure receives the same input signal, which is the difference between the signal to be learned and the signal already learned by the feedback loop, as in Eq. (11). Since a negative feedback loop is used, this difference approaches zero as the weighted sum of separate frequency components, Eq. (12), approaches the learned signal, and therefore the frequencies of the oscillators stabilize. Eq. (13) ensures amplitude adaptation and thus the stabilization of the learned frequency. Such a feedback structure performs a kind of dynamic Fourier analysis. It can learn several frequency components of the input signal (Righetti et al., 2006) and enables the frequency of a given oscillator to converge as *t* → ∞, because once the frequency of a separate oscillator is set, it is deducted from the demonstration signal *ydemo*, and disappears from *e* (due to the negative feedback loop). Other oscillators can thus adapt to other remaining frequency components. The populating of the frequency spectrum is therefore done without any signal processing, as the whole process of frequency extraction and adaptation is totally embedded into the dynamics of the adaptive frequency oscillators.

Fig. 4. *Left*: Typical convergence of an adaptive frequency oscillator combined with an adaptive Fourier series (-) compared to a system with a pool of oscillators (-.-). One oscillator is used in both cases. The input is a periodic signal *y* = sin(*ω<sup>t</sup> t*), with *ω<sup>t</sup>* = (6*π* − *π*/5 *t*) rad/s for *t* < 20 s, followed by *ω<sup>t</sup>* = 2*π* rad/s for *t* < 30 s, followed again by *ω<sup>t</sup>* = 5*π* rad/s for *t* < 45 s, and finally *ω<sup>t</sup>* = 3*π* rad/s. Frequency adaptation is presented in the middle plot, starting at Ω<sup>0</sup> = *π* rad/s, where *ω<sup>t</sup>* is given by the dashed line and Ω by the solid line. The square error between the target and the extracted frequency is shown in the bottom plot. We can see that the adaptation is successful for non-stationary signals, step changes and stationary signals. *Right*: Comparison between using the PO and the AF approaches for the canonical dynamical system. The first plot shows the evolution of the frequency distribution using a pool of 10 oscillators. The second plot shows the extracted frequency using the AF approach. The comparison of the target and the approximated signals is presented in the third plot. The thin solid line presents the input signal *ydemo*, the thick solid line presents the AF approach *y*ˆ and the dotted line presents the PO approach *y*ˆ*<sup>o</sup>*. The square difference between the input and the approximated signals is presented in the bottom plot.

Frequency adaptation results for a time-varying signal are illustrated in Figure 4, left. The top plot shows the input signal *ydemo*, the middle plot the extracted frequencies, and the bottom plot the error of frequency adaptation. The figure shows results for both approaches, using a pool of oscillators (PO) and using one oscillator and an adaptive Fourier series (AF), explained in the next section. The signal itself is of three parts: a non-stationary part (presented by a chirp signal), followed by a step change in the frequency of the signal, and in the end a stationary signal. We can see that the output frequency stabilizes very quickly at the (changing) target frequency. In general the speed of convergence depends on the coupling strength *K* (Righetti et al., 2006). Besides the use for non-stationary signals, such as chirp signals, coping with the change in frequency of the input signal proves especially useful when adapting to the frequency of hand-generated signals, which are never stationary. In this particular example, a single adaptive frequency oscillator in a feedback loop was enough, because the input signal was purely sinusoidal.

The number of adaptive frequency oscillators in a feedback loop is therefore a matter of design. There should be enough oscillators to avoid missing the fundamental frequency and to limit the variation of frequencies described below when the input signal has many frequency components. A high number of oscillators can be used. Besides the almost negligible computational cost, using too many oscillators does not affect the solution. A practical problem that arises is that the oscillators' frequencies might come too close together, and then lock onto the same frequency component. To solve this we separate their initial frequencies *ω*<sup>0</sup> in a manner that suggests that (preferably only) one oscillator will go for the offset, one will go for the highest frequency, and the others will "stay between".

With a high number of oscillators, many of them want to lock to the offset (0 Hz). With the target frequency under 1 rad/s the oscillations of the estimated frequency tend to be higher, which results in longer adaptation times. This makes choosing the fundamental frequency more difficult.
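A minimal Euler-integration sketch of the oscillator pool, Eqs. (9)-(13), including the logical block that picks the lowest non-zero frequency, may look as follows; the names, zero initial values, and the 1 rad/s threshold for skipping the offset component are our assumptions:

```python
import numpy as np

def adapt_frequency(y_demo, dt, omega0, K=20.0, eta=1.0):
    """Feed samples y_demo through the pool and return (Omega, omegas, phases, amps)."""
    omega = np.asarray(omega0, dtype=float)  # spread of starting frequencies
    phi = np.zeros_like(omega)               # oscillator phases
    alpha = np.zeros_like(omega)             # per-component amplitudes
    for y in y_demo:
        c, s = np.cos(phi), np.sin(phi)
        y_hat = np.dot(alpha, c)             # Eq. (12): learned signal
        e = y - y_hat                        # Eq. (11): negative feedback
        phi += (omega - K * e * s) * dt      # Eq. (9)
        omega += -K * e * s * dt             # Eq. (10)
        alpha += eta * c * e * dt            # Eq. (13)
    # logical block: fundamental = lowest non-zero adapted frequency
    Omega = np.min(omega[omega > 1.0])       # 1 rad/s threshold is an assumption
    return Omega, omega, phi, alpha
```

With a single oscillator and a purely sinusoidal input, the adapted frequency converges toward the input frequency, mirroring the Figure 4 experiment.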

*<sup>i</sup>* = *ω<sup>i</sup>* − *Ke* sin(*φi*) (9) *ω*˙ *<sup>i</sup>* = −*Ke* sin(*φi*) (10) *e* = *ydemo* − *y*ˆ (11)

*α<sup>i</sup>* cos(*φi*) (12)

*α*˙ *<sup>i</sup>* = *η* cos(*φi*)*e* (13)

*φ*˙

*y*ˆ =

the experiments we use *K* = 20 and *η* = 1, unless specified otherwise.

amplitude adaptation for each of the frequency components (Eq. (13)).

frequency oscillators.

because the input signal was purely sinusoidal.

*M* ∑ *i*=1

where K is the coupling strength, *φ<sup>i</sup>* is the phase of oscillator *i*, *e* is the input into the oscillators, *ydemo* is the input signal, *y*ˆ is the weighted sum of the oscillators' outputs, *M* is the number of oscillators, *α<sup>i</sup>* is the amplitude associated to the *i*-th oscillator, and *η* is a learning constant. In

Eq. (9) and (10) present the core of the Canonical Dynamical System – the adaptive frequency phase oscillator. Several (*M*) such oscillators are used in a feedback loop to extract separate frequency components. Eq. (11) and (12) specify the feedback loop, which needs also
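As a concrete illustration of Eqs. (9)–(13), the pool of adaptive frequency oscillators can be integrated with a simple Euler scheme. This is a minimal sketch, not code from the chapter; the function name and integration details are our own, while *K* = 20 and *η* = 1 follow the values used in the text.

```python
import numpy as np

def adapt_pool(y_demo, dt, omega0, K=20.0, eta=1.0):
    """Pool of adaptive frequency oscillators, Eqs. (9)-(13), Euler-integrated.

    y_demo -- samples of the demonstration signal
    dt     -- integration time step [s]
    omega0 -- initial frequencies, one per oscillator (spread them out so
              that the oscillators do not lock onto the same component)
    """
    omega = np.asarray(omega0, dtype=float).copy()
    phi = np.zeros_like(omega)    # oscillator phases
    alpha = np.zeros_like(omega)  # amplitudes of the frequency components
    for y in y_demo:
        y_hat = alpha @ np.cos(phi)            # Eq. (12): learned signal
        e = y - y_hat                          # Eq. (11): negative feedback
        s = np.sin(phi)
        phi += dt * (omega - K * e * s)        # Eq. (9)
        omega += dt * (-K * e * s)             # Eq. (10)
        alpha += dt * (eta * np.cos(phi) * e)  # Eq. (13)
    return omega, alpha
```

For a pure sinusoid a single oscillator suffices: its frequency converges toward the input frequency and its amplitude toward the input amplitude, so *e* shrinks and the adaptation settles.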


Fig. 4. *Left*: Typical convergence of an adaptive frequency oscillator combined with an adaptive Fourier series (-) compared to a system with a pool of oscillators (-.-). One oscillator is used in both cases. The input is a periodic signal (*y* = sin(*ω<sup>t</sup>t*), with *ω<sup>t</sup>* = (6*π* − *π*/5*t*) rad/s for *t* < 20 s, followed by *ω<sup>t</sup>* = 2*π* rad/s for *t* < 30 s, followed again by *ω<sup>t</sup>* = 5*π* rad/s for *t* < 45 s and finally *ω<sup>t</sup>* = 3*π* rad/s). Frequency adaptation is presented in the middle plot, starting at Ω<sup>0</sup> = *π* rad/s, where *ω<sup>t</sup>* is given by the dashed line and Ω by the solid line. The square error between the target and the extracted frequency is shown in the bottom plot. We can see that the adaptation is successful for non-stationary signals, step changes and stationary signals. *Right*: Comparison between the PO and the AF approaches for the canonical dynamical system. The first plot shows the evolution of the frequency distribution using a pool of 10 oscillators. The second plot shows the extracted frequency using the AF approach. The comparison of the target and the approximated signals is presented in the third plot: the thin solid line is the input signal *ydemo*, the thick solid line the AF approximation *y*ˆ and the dotted line the PO approximation *y*ˆ*o*. The square difference between the input and the approximated signals is presented in the bottom plot.

A high number of oscillators can be used; besides the almost negligible computational cost, using too many oscillators does not harm the solution. A practical problem that does arise is that the oscillators' frequencies might come too close together and lock onto the same frequency component. To prevent this we spread their initial frequencies *ω*<sup>0</sup> so that (preferably) only one oscillator locks onto the offset, one onto the highest frequency, and the others settle in between.

With a high number of oscillators, many of them tend to lock onto the offset (0 Hz). With target frequencies below 1 rad/s, the oscillations of the estimated frequency tend to be larger, which results in longer adaptation times. This makes choosing the fundamental frequency without introducing complex decision-making logic difficult. Results of frequency adaptation for a complex waveform are presented in Fig. 4, where results for both the PO and the AF approach are shown.

Besides learning, we can also use the system to repeat already learned signals. In this case, we cut the feedback to the adaptive frequency oscillators by setting *e*(*t*) = 0. This way the oscillators continue to oscillate at the frequencies to which they have adapted. We are only interested in the fundamental frequency, determined by

$$
\dot{\Phi} = \Omega \tag{14}
$$

$$
\dot{\Omega} = 0 \tag{15}
$$

which is derived from Eqs. (9) and (10). This is also the equation of a normal phase oscillator.
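To make the repeating mode concrete, here is a small sketch (an illustrative helper of our own, not code from the chapter): with *e*(*t*) = 0 the pool reduces to Eqs. (14)–(15), i.e. plain phase oscillators running at their adapted frequencies, while Eq. (12) with frozen amplitudes reconstructs the waveform.

```python
import numpy as np

def replay(omega, alpha, dt, n_steps):
    """Repeat an already learned signal with the feedback cut (e = 0)."""
    omega = np.asarray(omega, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    phi = np.zeros_like(omega)        # phases; Eq. (14): phi_dot = Omega
    out = np.empty(n_steps)
    for k in range(n_steps):
        out[k] = alpha @ np.cos(phi)  # Eq. (12) with frozen weights
        phi += dt * omega             # Eq. (15): frequencies stay fixed
    return out
```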

#### **2.3 Using an adaptive Fourier series**

In this section an alternative, novel architecture for the canonical dynamical system is presented. A single adaptive frequency phase oscillator is used as its basis, combined with a feedback structure based on an adaptive Fourier series (AF), shown in Fig. 5. The feedback structure of the adaptive frequency phase oscillator is governed by

$$
\dot{\Phi} = \Omega - K e \sin\Phi, \tag{16}
$$

$$
\dot{\Omega} = -K e \sin\Phi, \tag{17}
$$

$$
e = y_{demo} - \hat{y}, \tag{18}
$$

where *K* is the coupling strength, Φ is the phase of the oscillator, *e* is the input into the oscillator and *ydemo* is the input signal. Comparing Eqs. (9), (10) with Eqs. (16), (17), we can see that in Eqs. (16), (17) the basic frequency Ω and the phase Φ are clearly defined, so no additional algorithm is required to determine the basic frequency. The feedback-loop signal *y*ˆ in Eq. (18) is given by the Fourier series

$$
\hat{y} = \sum_{i=0}^{M} \left( \alpha_i \cos(i\Phi) + \beta_i \sin(i\Phi) \right), \tag{19}
$$

and not by the sum of separate frequency components as in Eq. (12). In Eq. (19), *M* is the number of components of the Fourier series and *αi*, *βi* are the amplitudes associated with the Fourier series, governed by

$$
\dot{\alpha}_i = \eta \cos(i\Phi)\, e, \tag{20}
$$

$$
\dot{\beta}_i = \eta \sin(i\Phi)\, e, \tag{21}
$$

Fig. 5. Feedback structure of an adaptive frequency oscillator combined with a dynamic Fourier series. Note that no logical algorithm is needed.


where *η* is the learning constant and *i* = 0, ..., *M*. As shown in Fig. 5, the oscillator input is the difference between the input signal *ydemo* and the Fourier series *y*ˆ. Since a negative feedback loop is used, the difference approaches zero as the Fourier series representation *y*ˆ approaches the input signal *ydemo*. Such a feedback structure performs a kind of adaptive Fourier analysis. Formally it performs only a Fourier series approximation, because input signals may drift in frequency and phase; general convergence remains an open issue. The number of harmonic frequency components it can extract depends on how many terms of the Fourier series are used.

Since it is able to learn different periodic signals, the new architecture of the canonical dynamical system can also be used as an imitation system by itself. Once *e* is stable (zero), the periodic signal stays encoded in the Fourier series, with an accuracy that depends on the number of terms used in the Fourier series. The learning process is embedded and runs in real time. There is no need for any external optimization process or other learning algorithm.

It is important to point out that the convergence of the frequency adaptation (i.e. the behavior of Ω) should not be confused with locking behavior (Buchli et al., 2008), i.e. the classic phase-locking behavior, or synchronization, documented in the literature (Pikovsky et al., 2002). The frequency adaptation process is an extension of the common oscillator with a fixed intrinsic frequency. First, the adaptation process changes the intrinsic frequency, not only the resulting frequency; second, the adaptation has an infinite basin of attraction (see (Buchli et al., 2008)); and third, the frequency stays encoded in the system when the input is removed (e.g. set to zero, or *e* ≈ 0). Our purpose is to show how to apply the approach to the control of rhythmic robotic tasks. For details on analyzing the interaction of multiple oscillators see e.g. (Kralemann et al., 2008).

Augmenting the system with an output dynamical system makes it possible to synchronize the movement of the robot to a measurable periodic quantity of the desired task. Namely, the waveform and the frequency of the measured signal are encoded in the Fourier series, and the desired robot trajectory is encoded in the output dynamical system. Since the adaptation of the frequency and the learning of the desired trajectory can be done simultaneously, all of the system time-delays, e.g. communication delays, sensor measurement delays, etc., are automatically included. Furthermore, when a predefined motion pattern for the trajectory is used, the phase between the input signal and the output signal can be adjusted with a phase-lag parameter *φ<sup>l</sup>* (see Fig. 9). This enables us either to predefine the desired motion or to teach the robot how to perform the desired rhythmic task on-line.

Even though the canonical dynamical system by itself can reproduce the demonstration signal, using the output dynamical system allows easier modulation of both amplitude and frequency, enables learning of complex patterns without extracting all of their frequency components, and acts as a sort of filter. Moreover, when multiple output signals are needed, a single canonical system can be used with several output systems, which ensure that the waveforms of the different degrees of freedom are realized appropriately.

**3. On-line learning and modulation**

**3.1 On-line modulations**

The output dynamical system allows easy modulation of the amplitude, the frequency and the center of oscillations. Once the robot is performing the learned trajectory, we can change any of these by changing just one parameter each. The system is designed to permit on-line modulation of the originally learned trajectories. This is one of the important motivations behind the use of dynamical systems to encode trajectories.
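The chapter head gives the output dynamical system in the periodic form of Eq. (1) with the kernels of Eq. (3); the sketch below reconstructs that form, so treat the exact expression as our reading of those equations, with *αz* = 8, *βz* = 2 and *h* = 2.5*N* as stated in the text. It shows how amplitude, frequency and the center of oscillation are modulated on-line through the single parameters *r*, Ω and *g*.

```python
import numpy as np

def output_system_step(state, Phi, w, g, r, Omega, dt,
                       alpha_z=8.0, beta_z=2.0):
    """One Euler step of the output dynamical system (Eqs. (1)-(3) as read
    from the chapter head; a sketch, not the authors' code). Amplitude,
    frequency and center of oscillation are modulated on-line simply by
    changing r, Omega and g between calls."""
    y, z = state
    N = len(w)
    c = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)  # kernel centers c_i
    h = 2.5 * N                                           # kernel width from the text
    psi = np.exp(h * (np.cos(Phi - c) - 1.0))             # periodic kernels, Eq. (3)
    f = (w @ psi) * r / np.sum(psi)                       # weighted kernel mix, scaled by r
    z_dot = Omega * (alpha_z * (beta_z * (g - y) - z) + f)
    y_dot = Omega * z                                     # y follows z at frequency Omega
    return y + dt * y_dot, z + dt * z_dot
```

Driving Phi from the canonical system and stepping this repeatedly reproduces the learned waveform; halving *r* halves the oscillation amplitude around *g*, and changing *g* shifts the anchor point, without relearning the weights *wi*.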