**Abstract**

With the rise of spoken language interfaces, chatbots, and related enablers, conversational intelligence has become an emerging field of research in human-machine interfaces across several target domains. In this paper, we introduce a multilingual conversational chatbot platform that integrates the Open Health Connect platform and an mHealth application, together with multimodal services, in order to deliver advanced 3D embodied conversational agents. The platform enables novel human-machine interaction with cancer survivors in six different languages. It also integrates patient-reported information, gathered by the patients themselves, into digital clinical records. Further, conversational agents have the potential to play a significant role in healthcare, from assistants during clinical consultations, to supporting positive behavior changes, to assistants in living environments helping with daily tasks and activities.

**Keywords:** embodied conversational agents, multimodal sensing, artificial intelligence, spoken language interfaces, cancer survivors

### **1. Introduction**

Patient-reported outcomes (PROs) are an important type of patient-gathered health data (PGHD). They are generally collected from patients in order to help address a health concern [1] and represent self-reports from everyday life; in healthcare, they are therefore also important data sources [2]. Further, PROs have become a complementary data source to telemonitoring [3], data mining, and imaging-based AI techniques [4–8]. Nowadays, the knowledge domains of clinical specialties are expanding rapidly, while, due to the sheer volume and complexity of data, clinicians often fail to fully exploit its potential [9]. Initially, patient outcomes were collected mostly face to face, using paper forms [10–12]. These forms were added to paper-based health records (HRs), and only with the advances of information and communication technologies (ICT) are HRs slowly being digitalized. Several studies have already shown the efficiency of electronic questionnaire apps on, e.g., smartphones [13, 14]. Thus, electronic PROs, supported by artificial intelligence techniques, can further reduce dropout and improve acceptance rates, as well as clinician and patient satisfaction [15–17]. A good example of how PGHD and PROs can improve quality of life (QoL) is ambient assisted living (AAL): AAL environments already exploit mobile devices, smart home products, software applications, and other wearable devices in the individual's everyday environment [17, 18].
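As a concrete illustration of the kind of structured self-report an electronic questionnaire app might capture, the sketch below models a single PRO item response in Python. The field names, instrument code, and scoring convention are illustrative assumptions, not the format of any specific app from the cited studies.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProItemResponse:
    """One answered item from an electronic PRO questionnaire (illustrative schema)."""
    questionnaire_id: str   # instrument code, e.g. a QLQ-C30-style QoL questionnaire
    item_code: str          # question identifier within the instrument (hypothetical)
    answer: int             # ordinal answer on the item's Likert scale (1..scale_max)
    scale_max: int = 4      # highest value on the scale
    answered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def normalized_score(self) -> float:
        """Map the ordinal answer to a 0-100 score (a common PRO scoring convention)."""
        return 100.0 * (self.answer - 1) / (self.scale_max - 1)

# A patient reporting "quite a bit" (3 on a 1-4 scale) for a fatigue-related item:
resp = ProItemResponse("QLQ-C30", "fatigue_q18", answer=3)
print(round(resp.normalized_score(), 1))  # → 66.7
```

Capturing answers in a structure like this, rather than free text, is what makes downstream steps such as scoring, trend analysis, and clinical review feasible.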

Significant advances in speech and natural language processing (NLP) technologies already offer more personalized and human-like interaction, i.e., symmetric multimodality. As a result, spoken language interfaces, chatbots, and related enablers, together with conversational intelligence, have become an emerging field of research in man-machine interfaces based on artificial intelligence techniques. Embodied conversational agents (ECAs) can thus play an important role in healthcare, e.g., as assistants in AAL environments helping with activities and daily tasks, or as assistants during clinical consultations supporting positive behavior changes [19, 20]. These advanced interactive systems may have a major impact on the long-term sustainability of outcomes and on patient adherence over time.

The main challenges are interoperability, the integration of PGHD, and a lack of standardization [21, 22]. Namely, in healthcare, the integration of PGHD into clinical decision-making still presents a major problem, and a unified representation of electronic health records (EHRs) remains an open interoperability issue. In order to obtain the highest contribution from PROs and PGHD, we considered the following questions: (i) how to integrate the data into the clinical workflow, (ii) how to limit the cost and time of collecting PROs, (iii) how to efficiently collect data from patients, and (iv) how to enable proper interpretation by clinicians.

Within a Horizon 2020 project (PERSIST, https://projectpersist.com/, last accessed 19 June 2021), we therefore propose a holistic system for collecting PROs remotely via both multilingual chatbots and ECAs. Further, we propose the integration of PROs into the clinical workflow by using Fast Healthcare Interoperability Resources (FHIR). The FHIR server is located on the Open Health Connect (OHC) platform, and all traffic is orchestrated by a so-called multimodal sensing network (MSN) that runs several microservices, such as the PLATTOS text-to-speech (TTS) system, the ECA, the RASA-based chatbot system, and the SPREAD automatic speech recognition (ASR) system. In this way, we offer a fully symmetric model of interaction supporting speech, gesture, and facial expression on both input and output. Further, the FHIR methodology is delivered as an enabler for efficient integration, together with a fully functional FHIR server [23].
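To make the FHIR-based integration concrete, the sketch below shows how a chatbot-collected PRO answer could be wrapped in a standard FHIR R4 `QuestionnaireResponse` resource before being sent to a FHIR server such as the one on the OHC platform. The patient identifier, item code, and question text are placeholders; the actual PERSIST resource profiles and endpoints may differ.

```python
import json

def build_questionnaire_response(patient_id: str, item_code: str,
                                 item_text: str, answer: int) -> dict:
    """Wrap one chatbot-collected PRO answer in a FHIR R4 QuestionnaireResponse."""
    return {
        "resourceType": "QuestionnaireResponse",
        "status": "completed",
        "subject": {"reference": f"Patient/{patient_id}"},
        "item": [{
            "linkId": item_code,                     # question id within the instrument
            "text": item_text,                       # question as posed by the chatbot
            "answer": [{"valueInteger": answer}],    # ordinal answer value
        }],
    }

# Hypothetical example: the chatbot records an answer of 3 to a fatigue item.
qr = build_questionnaire_response("example-123", "fatigue_q18",
                                  "Were you tired?", 3)
payload = json.dumps(qr)  # body for a POST to the server's /QuestionnaireResponse endpoint
print(qr["item"][0]["answer"][0]["valueInteger"])  # → 3
```

Because `QuestionnaireResponse` is a standard FHIR resource, any conformant clinical system consuming the OHC server can interpret the collected PROs without custom mappings, which is the interoperability benefit motivating this design.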

The paper is structured as follows: Section 2 presents related work and the ideas behind our study. The PERSIST platform is described in Section 3, and the fully symmetric ECA-based interaction model in Section 4. The results are presented in Section 5. In Section 6, the contributions of the PERSIST system are discussed, and the paper ends with the conclusions.
