**3.3 Two-handed drumming**


We applied the system for two-handed drumming on a full-sized humanoid robot called CB-i, shown in Fig. 8. The robot learns the waveform and the frequency of the demonstrated movement on-line, and continues drumming at the extracted frequency after the demonstration. The system allows the robot to synchronize to the extracted frequency of music, and thus drum along in real-time.

The CB-i robot is a 51-DOF full-sized humanoid robot developed by Sarcos. For the task of drumming we used 8 DOF in the upper arms of the robot, 4 per arm. Fig. 8 shows the CB-i robot in the experimental setup.

The control scheme was implemented in Matlab/Simulink and is presented in Fig. 9. The imitation system provides the desired task-space trajectory for the robot's arms. The waveform was defined in advance. Since the sound signal usually consists of several different tones, e.g. drums, guitar, singing, noise, etc., it was necessary to pre-process the signal in order to obtain the periodic signal which represents the drumming. The input signal was modified into short pulses. This pre-processing only modifies the waveform and does not determine the frequency and the phase.
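The chapter does not spell out this pre-processing step; a minimal sketch, assuming simple short-time energy thresholding (the function, window length, and threshold are illustrative, not the authors' implementation), could look like this:

```python
import numpy as np

def pulses_from_audio(x, fs, win=0.02, thresh=0.3):
    """Collapse an audio signal into a train of short unit pulses.

    Illustrative pre-processing: short-time energy over `win`-second
    windows is thresholded at `thresh` times its maximum, and every
    rising edge becomes one pulse. Only the waveform is changed;
    frequency and phase extraction is left to the canonical
    dynamical system.
    """
    n = int(win * fs)
    m = len(x) // n
    energy = (x[: m * n] ** 2).reshape(m, n).sum(axis=1)
    above = energy > thresh * energy.max()
    edges = np.flatnonzero(above[1:] & ~above[:-1]) + 1  # rising edges
    pulse = np.zeros(m)
    pulse[edges] = 1.0
    return pulse

# synthetic "drum track": short bursts at 2 Hz (8 hits in 4 s)
fs = 8000
t = np.arange(0.0, 4.0, 1.0 / fs)
x = (np.sin(2 * np.pi * 2.0 * t) > 0.99).astype(float)
p = pulses_from_audio(x, fs)
print(int(p.sum()))  # 8
```

Real audio is far messier than this synthetic burst train, but the principle is the same: the pulse train keeps the timing of the hits while discarding timbre and amplitude.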


Fig. 10 shows the results of frequency adaptation to music. The waveforms for both hands were predefined. The frequency of the imitated motion quickly adapted to the drumming tones. The drumming sounds are presented with a real-time power spectrum. With this particular experiment we show the possibility of using our proposed system for synchronizing the robot's motion with an arbitrary periodic signal, e.g. music. Due to the complexity of the audio signal, this experiment does require some modification of the measured (audio) signal, but the signal is not pre-processed in the sense of determining the frequency. The drumming experiment shows that the proposed two-layered system is able to synchronize the motion of the robot to the drumming tones of the music.
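The frequency adaptation performed by the canonical dynamical system can be illustrated with a phase oscillator of the adaptive-frequency type (in the spirit of Righetti & Ijspeert); the single-oscillator setup, the gain, and the cosine output below are assumptions for this sketch, not the exact system of the chapter:

```python
import numpy as np

def adapt_frequency(y, dt, K=20.0, omega0=3.0):
    """Phase oscillator with frequency adaptation:

        phi_dot   = Omega - K * e * sin(phi)
        Omega_dot =       - K * e * sin(phi),

    where e = y - cos(phi) is the difference between the measured
    periodic signal and the oscillator output. Euler integration;
    gains and initial frequency are illustrative.
    """
    phi, Omega = 0.0, omega0
    for yt in y:
        e = yt - np.cos(phi)
        corr = K * e * np.sin(phi)
        phi += dt * (Omega - corr)
        Omega += dt * (-corr)
    return Omega

dt = 0.001
t = np.arange(0.0, 20.0, dt)
omega_true = 6.0                       # rad/s, roughly 1 Hz drumming
Omega = adapt_frequency(np.cos(omega_true * t), dt)
print(f"{Omega:.2f} rad/s")            # locks close to 6 rad/s
```

Starting from 3 rad/s, the oscillator pulls its frequency toward the input's 6 rad/s; with a pulse train instead of a clean cosine, the same mechanism locks onto the drumming rate.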


Fig. 8. Two-handed drumming using the Sarcos CB-i humanoid robot.

Fig. 9. Proposed two-layered structure of the control system for synchronizing robotic drumming to the music.


**3.4 Table wiping**


In this section we show how the proposed two-layered system can be used to modify already-learned movement trajectories according to the measured force. The ARMAR-IIIb humanoid robot, which is kinematically identical to the ARMAR-IIIa (Asfour et al., 2006), was used in the experiment.

From the kinematics point of view, the robot consists of seven subsystems: head, left arm, right arm, left hand, right hand, torso, and a mobile platform. The head has 7 DOF and is equipped with two eyes, which have a common tilt and can pan independently. Each arm has 7 DOF and each hand an additional 8 DOF. The locomotion of the robot is realized using a wheel-based holonomic platform.

In order to obtain reliable motion data of a human wiping demonstration through observation by the robot, we exploited the color features of the sponge to track its motion. Using the stereo camera setup of the robot, the implemented blob tracking algorithm based on color segmentation and a particle filter framework provides a robust location estimation of the sponge in 3D. The resulting trajectories were captured with a frame rate of 30 Hz.
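The core of such a colour-based tracker can be sketched in a few lines. The snippet below is a simplified stand-in for the chapter's segmentation/particle-filter pipeline: it only finds the centroid of an RGB colour range in a single image, whereas the real system tracks the blob in both stereo images and triangulates to 3D at 30 Hz; the colour bounds are made up for the example.

```python
import numpy as np

def blob_centroid(img, lo, hi):
    """Locate a colour blob as the centroid of pixels inside an RGB range.

    Returns (row, col) of the mask centroid, or None if no pixel matches.
    """
    mask = np.all((img >= lo) & (img <= hi), axis=-1)
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())

# toy image: dark background with a yellow "sponge" block
img = np.zeros((120, 160, 3), dtype=np.uint8)
img[40:60, 70:100] = (230, 220, 40)              # roughly yellow
c = blob_centroid(img, lo=(200, 180, 0), hi=(255, 255, 90))
print(c)  # -> (49.5, 84.5)
```

A particle filter on top of such per-frame measurements is what makes the estimate robust to occlusions and lighting changes.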

For learning of movements we first define the area of demonstration by measuring the lower-left and the upper-right position within a given time frame, as presented in Fig. 11. All tracked sponge movements are then normalized and expressed as offsets from the central position of this area.
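This normalization step can be sketched as follows; scaling by the half-extent of the area (so that points inside it fall in [−1, 1]) is an assumption of the sketch, since the chapter only states that positions are given as offsets from the central position:

```python
import numpy as np

def offsets_from_area(traj, lower_left, upper_right):
    """Express tracked positions as offsets from the centre of the
    demonstration area, scaled by the area's half-extent (assumed)."""
    lo = np.asarray(lower_left, dtype=float)
    hi = np.asarray(upper_right, dtype=float)
    center = (lo + hi) / 2.0
    half = (hi - lo) / 2.0
    return (np.asarray(traj, dtype=float) - center) / half

# a point at the exact centre of a 0.4 m x 0.4 m area maps to (0, 0)
offsets = offsets_from_area([[0.4, 0.3]], [0.2, 0.1], [0.6, 0.5])
print(offsets)  # [[0. 0.]]
```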

For measuring the contact forces between the object in the hand and the surface of the plane, a 6-D force/torque sensor mounted at the wrist of the robot is used.

### **3.5 Adaptation of the learned trajectory using force feedback**

Learning of a movement that brings the robot into contact with the environment must be based on force control; otherwise the robot, or the object to which the robot applies its force, can be damaged. In the task of wiping a table, or any other object of arbitrary shape, constant contact with the object is required. To teach the robot the necessary movement, we decoupled the learning of the movement from the learning of the shape of the object. We first apply the described two-layered movement imitation system to learn the desired trajectories by means of visual feedback. We then use force feedback to adapt the motion to the shape of the object that the robot acts upon.

*Performing Periodic Tasks: On-Line Learning, Adaptation and Synchronization with External Signals*

Fig. 11. The area for movement demonstration is determined by measuring the bottom-left-most and the top-right-most positions within a given time frame. These coordinates define a rectangular area (marked with dashed lines) in which the robot tracks the demonstrated movements.


Periodic movements can be of any shape, yet wiping can be effective with a simple one-dimensional left-right movement, or a circular movement. Once we are satisfied with the learned movement, we can reduce the frequency of the movement by modifying the Ω value. The low frequency of movement and consequently low movement speed reduce the possibility of any damage to the robot. When performing playback we modify the learned movement with an active compliance algorithm. The algorithm is based on the velocity-resolved approach (Villani & J., 2008). The end-effector velocity is calculated by

$$\mathbf{v\_{r}} = \mathbf{S\_{V}}\mathbf{v\_{V}} + \mathbf{K\_{F}S\_{F}}(\mathbf{F\_{m}} - \mathbf{F\_{0}}).\tag{26}$$

Here **vr** stands for the resolved velocities vector, **Sv** for the velocity selection matrix, **vv** for the desired velocities vector, **KF** for the force gain matrix, **SF** for the force selection matrix, and **Fm** for the measured force. **F0** denotes the force offset which determines the behavior of the robot when not in contact with the environment. To get the desired positions we use

$$\mathbf{Y} = \mathbf{Y}\_{\mathbf{r}} + \mathbf{S}\_{\mathbf{F}} \int \mathbf{v}\_{\mathbf{r}} dt. \tag{27}$$

Here **Yr** is the desired initial position and **Y** = (*yj*), *j* = 1, ..., 6 is the actual position/orientation. Using this approach we can modify the trajectory of the learned periodic movement as described below.

Equations (26 – 27) become simpler for the specific case of wiping a flat surface. By using a null matrix for **Sv**, **KF** = *diag*(0, 0, *kF*, 0, 0, 0), **SF** = *diag*(0, 0, 1, 0, 0, 0), the desired end-effector height *z* in each discrete time step Δ*t* becomes

$$\dot{z}(t) = k\_F (F\_z(t) - F\_0), \tag{28}$$

$$z(t) = z\_0 + \dot{z}(t)\Delta t.\tag{29}$$

Here *z*<sup>0</sup> is the starting height, *kF* is the force gain (with units kg/s), *Fz* is the measured force in the *z* direction, and *F*<sup>0</sup> is the force with which we want the robot to press on the object. Such a formulation of the movement ensures constant movement in the −*z* direction, or constant contact when an object is encountered. Another simplification is to use the length of the force vector |*F*| = √(*F<sub>x</sub>*<sup>2</sup> + *F<sub>y</sub>*<sup>2</sup> + *F<sub>z</sub>*<sup>2</sup>) for the feedback instead of *Fz* in (28). This way the robot can move upwards every time it hits something, for example the side of a sink. No contact should be made from above, as this will make the robot press up harder and harder.
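A discrete-time implementation of (28)–(29), including the dead-zone and velocity limit described below, can be sketched as follows; the numeric values and the decision to apply the dead-zone to the force error are assumptions of the sketch:

```python
import numpy as np

def compliant_height(F_z, z0, dt, k_F=20.0, F0=5.0, dead=1.0, v_max=0.12):
    """Discrete-time sketch of Eqs. (28)-(29): the commanded height z
    integrates k_F * (F_z - F0), with a ~1 N dead-zone against sensor
    noise and a 120 mm/s velocity clamp (values taken from the text,
    their exact placement assumed).
    """
    z, zs = z0, []
    for F in F_z:
        err = F - F0
        if abs(err) < dead:                          # dead-zone: ignore noise
            err = 0.0
        v = float(np.clip(k_F * err, -v_max, v_max)) # Eq. (28), clamped
        z += v * dt                                  # Eq. (29)
        zs.append(z)
    return np.array(zs)

# free space (F_z = 0, F0 = 5 N): the hand descends at the clamped
# 120 mm/s until contact builds the measured force up towards F0
z = compliant_height(F_z=np.zeros(100), z0=0.3, dt=0.01)
print(round(z[-1], 3))  # 0.3 m - 1 s * 0.12 m/s = 0.18
```

Once the measured force settles within the dead-zone around *F*<sup>0</sup>, the commanded height stops changing, which is exactly the sliding-contact behaviour the task requires.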

The learning of the force profile is done by modifying the weights *wi* for the selected degree of freedom *yj* in every time step by incremental locally weighted regression (Atkeson et al., 1997); see also Section 2.1.
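One step of this incremental regression can be sketched as follows; the recursive least-squares form, the periodic kernel shape, and all parameter values (number of kernels, width, forgetting factor, unit amplitude) are illustrative assumptions, since the chapter defers the details to Section 2.1:

```python
import numpy as np

def ilwr_update(w, P, phi, target, lam=0.995, h=None):
    """One incremental locally weighted regression step for N periodic
    kernels: each kernel i keeps a weight w[i] and an inverse covariance
    P[i], updated by recursive least squares with forgetting factor lam.
    Constant regressor x = 1 (amplitude r = 1) is assumed.
    """
    N = len(w)
    if h is None:
        h = 2.5 * N                                   # kernel width (assumed)
    c = np.linspace(0, 2 * np.pi, N, endpoint=False)  # kernel centres
    psi = np.exp(h * (np.cos(phi - c) - 1.0))         # kernel activations
    for i in range(N):
        P[i] = (P[i] - P[i] ** 2 / (lam / psi[i] + P[i])) / lam
        w[i] += psi[i] * P[i] * (target - w[i])
    return w, P

# learn a sine waveform on-line over ten periods, then query it
N = 25
w, P = np.zeros(N), np.full(N, 1e3)
for phi in np.arange(0.0, 20 * np.pi, 0.01):
    w, P = ilwr_update(w, P, phi % (2 * np.pi), np.sin(phi))

c = np.linspace(0, 2 * np.pi, N, endpoint=False)
psi = np.exp(2.5 * N * (np.cos(1.0 - c) - 1.0))
approx = float(np.dot(psi, w) / psi.sum())            # readout at phi = 1.0
print(round(approx, 2))  # close to sin(1.0) ≈ 0.84
```

Because the update is per sample, the force profile can be absorbed into the weights within a few periods of movement, as reported in the experiments.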

The **KF** matrix controls the behavior of the movement. The correcting movement has to be fast enough to move away from the object if the robot hand encounters sufficient force, and at the same time not so fast that it produces instabilities due to discrete-time sampling when in contact with an object. A dead-zone, for example |*F*| < 1 N, has to be included to account for sensor noise. We empirically set *kF* = 20, and limited the force feedback to allow a maximum linear velocity of 120 mm/s.

Feedback from a force-torque sensor is often noisy, due to the sensor itself and, mainly, due to vibrations of the robot. A noisy signal is problematic for the learning algorithm because we also need discrete-time first and second derivatives. The described active compliance algorithm uses the position of the end-effector as input, which is the integrated desired velocity, and therefore has no difficulties with the noisy measured signal.

Adapting the trajectory to the new surface enables very fast movement with a constant force profile at the contact between the robot/sponge and the object, without the time-sampling and instability problems that may arise when using compliance control alone. Furthermore, we can still use the compliant control once we have learned the shape of the object. Active compliance, combined with the passive compliance of the sponge and the modulation and perturbation properties of DMPs, such as slow-down feedback, allows fast and safe execution of periodic movement while maintaining a sliding contact with the environment.

#### **3.5.1 The learning scenario**

Our kitchen scenario includes the ARMAR-IIIb humanoid robot wiping a kitchen table. First the robot learns the wiping movement from human demonstration. During the demonstration of the desired wiping movement the robot tracks the movement of the sponge in the demonstrator's hand with its eyes. The robot only reads the coordinates of the movement in a horizontal plane, and learns the frequency and waveform of the movement. The waveform can be arbitrary, but for wiping it can be a simple circular or one-dimensional left-right movement. The learned movement is encoded in the task space of the robot, and an inverse kinematics algorithm controls the movement of the separate joints of the 7-DOF arm. The robot starts mimicking the movement already during the demonstration, so the demonstrator can stop the learning once he/she is satisfied with the learned movement. Once the basic learning of periodic movement is stopped, we use force feedback to modify the learned trajectory. The term *F* − *F*<sup>0</sup> in (28) provides velocity in the direction of the −*z* axis, and the hand holding the sponge moves towards the kitchen table or any other surface under the arm. As the hand makes contact with the surface of an object, the vertical velocity adapts. The force profile is learned in a few periods of the movement. The operator can afterwards stop the force-profile learning and execute the adjusted trajectory at an arbitrary frequency.

Fig. 12 on the left shows the results of learning the force profile for a flat surface. As the robot grasps the sponge, its orientation and location are unknown to the robot, and the tool center point (TCP) changes. Should the robot simply perform a planar trajectory, it would not ensure constant contact with the table. As we can see from the results, the hand initially moves down until it makes contact with the surface. The force profile later changes the desired height by approx. 5 cm within one period. After the learning (stopped manually, marked with a vertical dashed line) the robot maintains such a profile. A manual increase in frequency was introduced to demonstrate the ability to perform the task at an arbitrary frequency. The bottom plot shows the measured length of the force vector |*F*|. As we can see, the force vector keeps the same force profile even though the frequency is increased. No increase in the force profile proves that the robot has learned the required trajectory. Fig. 13 shows a sequence of photos showing the adaptation to the flat and bowl-shaped surfaces.

Fig. 12. Results of learning the force profile on a flat surface (left) and on a bowl-shaped surface (right). For the flat surface we can see that the height of the movement changes by approx. 5 cm during one period to maintain contact with the surface. The values were obtained through robot kinematics. For the bowl-shaped surface we can see that the trajectory assumes a bowl shape with an additional change of direction, which is the result of the compliance of the wiping sponge and the zero-velocity dead-zone. A dashed vertical line marks the end of learning of the force profile. An increase in frequency, added manually, can be observed at the end of the plot.

Fig. 13. A sequence of still photos showing the adaptation of the wiping movement via force feedback to a flat surface, such as a kitchen table, in the top row, and to a bowl-shaped surface in the bottom row.

Fig. 12 on the right shows the results for a bowl-shaped object. As we can see from the results, the height of the movement changes by more than 6 cm within a period. The learned shape (after the vertical dashed line) maintains the shape of a bowl, but has an added local minimum. This is the result of the dead-zone within the active compliance, which comes into effect when going up one side, and down the other side, of the bowl. No significant change in the force profile can be observed in the bottom plot after a manual increase in frequency. Some drift, caused by sensor error and by the wrist control of the robot, can be observed.

**4. Synchronization with external signals**

Once the movement is learned we can change its frequency. The new frequency can be determined from an external signal using the canonical dynamical system. This allows easy synchronization to external measured signals, such as the drumming already presented in Section 3.3. In this section we show how we applied the system to a rope-turning task, which is a task that requires continuous cooperation of a human and a robot. We also show how we can synchronize to an EMG signal, which is inherently very noisy.

**4.1 Robotic rope turning**

We performed the rope-turning experiment on a Mitsubishi PA-10 robot with a JR-3 force/torque sensor attached to the top of the robot to measure the torques and forces in the string. Additionally, an optical system (Optotrak Certus) was used for validation, i.e. for measuring the motion of the human hand. The two-layered control system was implemented in Matlab/Simulink. The imitation system provides a pre-defined desired circular trajectory for the robot. The motion of the robot is constrained to up-down and left-right motion using inverse kinematics. Figure 14 shows the experimental setup.

Fig. 14. Experimental setup for cooperative human-robot rope turning.

Determining the frequency is done using the canonical dynamical system. Fig. 15 left shows the results of frequency extraction (top plot) from the measured torque signal (second plot). The frequency of the imitated motion quickly adapts to the measured periodic signal. When the rotation of the rope is stable, the human stops swinging the rope and maintains the hand in a fixed position. The movement of the human hand is shown in the third plot. In the last plot we show the movement of the robot. By comparing the last two plots in Fig. 15 left, we can see that after 3 s the energy is transferred to the rope only by the motion of the robot.
