Section 2 Sound Processing


*Human 4.0 - From Biology to Cybernetic*


**Chapter 3**

**The Theory behind Controllable Expressive Speech Synthesis: A Cross-Disciplinary Approach**

*Noé Tits, Kevin El Haddad and Thierry Dutoit*

**Abstract**

As part of the Human-Computer Interaction field, expressive speech synthesis is a very rich domain, as it requires knowledge in areas such as machine learning, signal processing, sociology, and psychology. In this chapter, we focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will get an overview of the main paradigms used in this field, through some of the most prominent systems and methods. We explain how speech can be represented and encoded with audio features. We present a history of the main methods of Text-to-Speech (TTS) synthesis: concatenative, parametric and statistical parametric speech synthesis. Finally, we focus on the last of these, and in particular on the latest techniques, which model TTS synthesis as a sequence-to-sequence problem. This framing enables the use of deep learning building blocks such as convolutional and recurrent neural networks, as well as attention mechanisms. The last part of the chapter brings together the different aspects of the theory and summarizes the concepts.

**Keywords:** deep learning, speech synthesis, TTS, expressive speech, emotion

**1. Introduction**

Controllable Expressive Speech Synthesis is the task of generating expressive speech from text with control over prosodic features.

This task is positioned in the emerging field of affective computing and, more particularly, at the intersection of three disciplines:

• Expressive speech analysis (Section 2), which provides mathematical tools to extract useful characteristics from speech depending on the task to perform. Speech is seen as a signal, just like images, text, videos, or any other kind of information coming from any source. As such, it can be characterized by a time series of features.

• Expressive speech modeling (Section 3), which models human emotions and their impact on the speech signal. Speech is considered here as a means of communication between humans.

• Expressive speech synthesis (Section 4), for which machine learning tools have become ubiquitous, especially hidden Markov models (HMMs) and more
