**4. Scenario implementation**

We briefly describe a possible implementation of the TherAImin. It is presented to show that the proposed framework provides a usable foundation for building AI-assisted devices and for describing them in a systematic manner.

Although we do not present a device for each of the possible categories, we believe that the selected AIMEs differ widely enough to show that, in principle, any AIME can be implemented using the framework.

#### **4.1 TherAImin**

As discussed in Section 1.2, the Theremin is an instrument with two antennas that the player controls without touch interaction. The block diagram of the Theremin is shown in **Figure 5**. The TherAImin is an AI-assisted variation of the original instrument, in which hand gestures are used to control the timbre.

Although we could have implemented the TherAImin without antennas using, e.g., MediaPipe Handpose [29], we decided to stay more faithful to the original instrument and therefore use the openTheremin [30], which provides a versatile Theremin implementation with pitch and volume outputs.

Thus, the openTheremin antennas act as one part of the user-stimulus capture layer. The other part of this layer is a camera used to capture the user's hand gestures.

We interface the openTheremin through a Raspberry Pi board with an RPI-GP90 pulse-signal I/O hat. This forms part of the stimulus adaptation layer; the other part of this layer is the video interface already available on the Raspberry Pi.
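The board-specific register interface of the RPI-GP90 is outside the scope of this section, but the adaptation step itself reduces to converting raw pulse measurements into normalized pitch and volume values for the later layers. A minimal Python sketch, assuming the hat lets us count pulse edges over a fixed time window (the function names and ranges here are illustrative, not part of any vendor API):

```python
def pulses_to_frequency(edge_count: int, window_s: float) -> float:
    """Convert a pulse-edge count over a time window into a frequency in Hz."""
    if window_s <= 0:
        raise ValueError("measurement window must be positive")
    return edge_count / window_s


def normalize(value: float, lo: float, hi: float) -> float:
    """Clamp a raw measurement to [lo, hi] and scale it into the 0..1
    range expected by the sound-production layer."""
    value = min(max(value, lo), hi)
    return (value - lo) / (hi - lo)
```

For example, 440 edges counted over a one-second window correspond to 440 Hz, which the sound-production layer can then map onto its own pitch range.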

The embedded learning layer is built using Google's Teachable Machine [31], accelerated with a Coral Edge TPU. The approach is very similar to [32], where a trainable machine is used to recognize objects. With this approach, the accelerated embedded system classifies gestures into the trained classes. It is important to keep the gesture classes clearly distinct, and it is also essential to train the background class on a wide variety of non-gesture images that the camera may see [33]. An advantage of the TherAImin is that when the AI system makes a wrong decision, this affects the timbre and the effects, but not the volume and the pitch.
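The decision logic downstream of the classifier can be kept deliberately conservative: low-confidence frames and background detections simply hold the current timbre, so a misclassification never interrupts the pitch and volume coming from the antennas. A sketch of this logic in Python (the labels, threshold, and function name are illustrative assumptions, not the Teachable Machine API):

```python
def select_timbre(scores, labels, current, threshold=0.6):
    """Map per-class classifier scores to a timbre class.

    scores   -- one confidence value per class, same order as labels
    labels   -- class names; "background" is the trained fallback class
    current  -- the timbre currently in use

    Frames classified as background, or with a best score below the
    threshold, keep the current timbre, so an uncertain or wrong
    decision by the AI layer only delays a timbre change.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    if labels[best] == "background" or scores[best] < threshold:
        return current  # hold the last timbre
    return labels[best]
```

This also makes the instrument's behavior easy to tune: raising the threshold trades responsiveness for stability of the timbre.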

The sound production layer is implemented on the Raspberry Pi using Sonic Pi [34]. The selection of sound pitch and volume is done by a small Processing program that produces OSC messages [35]. Open Sound Control (OSC) is a protocol for connecting sound synthesizers, computers, and other multimedia devices for purposes such as musical performance or show control. Many music-related software tools, including Sonic Pi, support the OSC protocol. OSC uses UDP (or TCP) packets and can therefore run either within a single embedded system or distributed over a network.
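To make the wire format concrete, the sketch below encodes a minimal OSC message in pure Python and sends it over UDP, analogous to what the Processing program does. Per the OSC 1.0 specification, the address and type-tag strings are null-terminated and padded to 4-byte boundaries, and float arguments are big-endian 32-bit floats. The address `/theraimin/note` and the use of Sonic Pi's default incoming OSC port 4560 are our assumptions for illustration:

```python
import socket
import struct


def osc_pad(b: bytes) -> bytes:
    """Null-terminate an OSC string and pad it to a 4-byte boundary."""
    b += b"\x00"
    while len(b) % 4:
        b += b"\x00"
    return b


def osc_message(address: str, *floats: float) -> bytes:
    """Encode a minimal OSC message carrying float32 arguments."""
    msg = osc_pad(address.encode("ascii"))
    msg += osc_pad(("," + "f" * len(floats)).encode("ascii"))
    for f in floats:
        msg += struct.pack(">f", f)  # big-endian float32 per OSC 1.0
    return msg


def send_to_sonic_pi(pitch_hz: float, volume: float,
                     host: str = "127.0.0.1", port: int = 4560) -> None:
    """Send pitch and volume as one OSC message over UDP.

    The address pattern is illustrative; the receiving Sonic Pi
    script must listen for the same pattern.
    """
    packet = osc_message("/theraimin/note", pitch_hz, volume)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(packet, (host, port))
    sock.close()
```

Because the payload is a plain UDP datagram, the same code works whether Sonic Pi runs on the same Raspberry Pi or on another machine on the network.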
