### **2. Voice-based analysis applied to the diagnosis of Parkinson's disease**

In general, voice-based PD detection consists of two stages: feature extraction and classification. Training the classifiers, however, requires two additional stages: feature selection and performance assessment. There are three main reasons to analyze voice for PD detection: (1) voice analysis is a low-cost, non-invasive technique; (2) speech problems appear at early stages of the disease, which makes voice analysis suitable for early detection; and (3) in our research, PD detection has been extended to provide clinicians with quantitative information that helps them interpret a binary result [6]. In the following, we describe voice-based PD detection.

*Analysis of Voice and Magnetic Resonance Images to Assist Diagnosis of Parkinson's Disease… DOI: http://dx.doi.org/10.5772/intechopen.99973*

**Figure 1.**

*The most relevant ROIs for PD detection in 1.5 T and 3 T MRI of female patients are highlighted in red, yellow, and green.*

We also highlight the advantages of conducting separate tests for men and women, discuss why dataset size matters when separate PD detection experiments are conducted for each gender, and explain that the features contributing most to high detection performance are those obtained with extraction processes that resemble the way the human auditory system works.
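The mel scale is one common way in which feature extractors such as MFCC mimic the frequency resolution of the auditory system. The sketch below assumes the widely used formula m = 2595 log10(1 + f/700) and an arbitrary choice of 10 filters between 0 Hz and 22,050 Hz (half of the 44.1 kHz sampling rate mentioned in Section 3). It computes the edges of a mel-spaced triangular filter bank and shows that filter bandwidth grows with center frequency. It is an illustration only, not the extractor used to build the dataset.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (common 2595*log10 form of the mel scale)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(n_filters: int, f_min: float, f_max: float) -> list[tuple[float, float, float]]:
    """Return (lower, center, upper) edges in Hz of n_filters triangular filters.

    The n_filters + 2 edge points are equally spaced on the mel scale and
    adjacent filters share edges, as in a typical MFCC front end.
    """
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    points_hz = [mel_to_hz(mel_min + i * step) for i in range(n_filters + 2)]
    return [(points_hz[i], points_hz[i + 1], points_hz[i + 2]) for i in range(n_filters)]


if __name__ == "__main__":
    # 22,050 Hz is the Nyquist frequency for 44.1 kHz recordings.
    for low, center, high in mel_filter_edges(10, 0.0, 22050.0):
        print(f"center {center:8.1f} Hz   bandwidth {high - low:8.1f} Hz")
```

Because equal steps in mels map to exponentially growing steps in Hz, low-frequency filters are narrow and high-frequency filters are wide, which is the "higher frequency, wider bandwidth" behavior of auditory-inspired filter banks.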

The first step in the analysis of voice recordings is feature extraction. Researchers have used different groups of features, of which *baseline features* are the most common. Baseline features, the most traditional feature set, include *jitter*, *shimmer*, and *detrended fluctuation analysis* (DFA), among others. Other commonly used features are *Mel Frequency Cepstral Coefficients* (MFCC), *Wavelet transform* features, and *Tunable-Q Wavelet Transform* (TQWT) features. These features are obtained with filter banks that extract information over multiple frequency bands of different bandwidths, such that the higher the frequency content, the wider the bandwidth. From all the extracted features, a reduced set of relevant features is obtained through a selection process in which correlated features are eliminated. In the dataset used, each observation is characterized by 754 features extracted from a voice recording. These features belong to six groups; however, in our work, two of these groups were not relevant for classifying the voice recordings.
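To make the baseline features concrete, the sketch below computes "local" jitter and shimmer, assuming the common definition (also used by tools such as Praat): the mean absolute difference between consecutive glottal periods (or peak amplitudes) divided by their mean. Such measures usually come in several variants, and the ones in the dataset may differ. Pitch-period and amplitude extraction is assumed to have been done already, and the example values are hypothetical.

```python
from statistics import fmean


def local_perturbation(values: list[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    consecutive_diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return fmean(consecutive_diffs) / fmean(values)


def local_jitter(periods_s: list[float]) -> float:
    """Cycle-to-cycle variation of the glottal period (relative value, not in %)."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes: list[float]) -> float:
    """Cycle-to-cycle variation of the peak amplitude (relative value, not in %)."""
    return local_perturbation(peak_amplitudes)


if __name__ == "__main__":
    # Hypothetical pitch periods (about 100 Hz) and peak amplitudes of a sustained /a/.
    periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
    amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]
    print(f"local jitter:  {100 * local_jitter(periods):.2f} %")
    print(f"local shimmer: {100 * local_shimmer(amplitudes):.2f} %")
```

A perfectly periodic voice gives zero jitter and shimmer; irregular vocal fold vibration, as may occur in PD, increases both.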

The classifiers are trained with the selected sets of relevant features. In our work, we used four different classifiers. The classification result is binary: the subject under analysis is identified either as a *patient with PD* or as not having PD. The different stages of the methodology (feature extraction, feature selection, classification, and interpretation) are shown in **Figure 1**.
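The selection, classification, and assessment stages can be sketched end to end. This section does not name the four classifiers, so the sketch swaps in a simple nearest-centroid classifier purely for illustration. It is preceded by a greedy correlation filter with an assumed threshold of |r| < 0.9 and assessed with plain accuracy. The toy data, threshold, and helper names are all hypothetical and are not the authors' method.

```python
import math
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(X: list[list[float]], threshold: float = 0.9) -> list[int]:
    """Greedy filter: keep feature j only if |r| with every already kept feature is below threshold."""
    columns = [list(col) for col in zip(*X)]
    kept: list[int] = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def select_columns(X: list[list[float]], kept: list[int]) -> list[list[float]]:
    """Keep only the selected feature columns of each observation."""
    return [[row[j] for j in kept] for row in X]


def fit_centroids(X: list[list[float]], y: list[int]) -> dict[int, list[float]]:
    """Nearest-centroid classifier: the mean feature vector of each class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids: dict[int, list[float]], x: list[float]) -> int:
    """Assign the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of correctly classified observations."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data (made up): label 1 = PD, 0 = control; feature 1 is exactly 2 * feature 0.
    X_train = [[1.0, 2.0, 0.5], [1.2, 2.4, 0.4], [0.9, 1.8, 0.6],
               [3.0, 6.0, 0.5], [3.1, 6.2, 0.7], [2.8, 5.6, 0.4]]
    y_train = [0, 0, 0, 1, 1, 1]
    kept = drop_correlated(X_train)  # selection is fitted on training data only
    model = fit_centroids(select_columns(X_train, kept), y_train)
    X_test, y_test = [[1.1, 2.2, 0.5], [2.9, 5.8, 0.6]], [0, 1]
    y_pred = [predict(model, x) for x in select_columns(X_test, kept)]
    print("kept features:", kept, "predictions:", y_pred, "accuracy:", accuracy(y_test, y_pred))
```

Nearest-centroid classification with Euclidean distance is sensitive to feature scale, so with real features (such as the 754 in the dataset) each column would normally be standardized first, with selection and scaling fitted only on the training folds to avoid leaking test information.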

### **3. Dataset**

Most previous works have conducted PD detection on a population of patients and controls without separate studies for female and male subjects. One reason is that the available datasets were not large enough to support such separate analyses. However, Sakar et al. [7] have provided the research community with the largest publicly available voice-based dataset so far. This dataset consists of 756 voice recordings, from each of which 754 voice features were extracted using different signal processing techniques. The recordings were obtained from 107 male patients, 81 female patients, and 64 controls.

This number of observations is large enough to obtain statistically relevant results after partitioning the dataset into two subsets by gender. A total of 252 individuals participated in the generation of this dataset. Each individual pronounced the vowel /a/ three times in front of a microphone, with the audio sampled at 44.1 kHz, which accounts for the 756 recordings (252 × 3). Each recording lasts 220 seconds (9,702,000 samples per recording). Each recording was divided into 25 ms frames so that stationary signal processing could be applied for feature extraction, and the feature vectors from the different frames were averaged. Six signal processing techniques were applied for feature extraction. The dataset is available in the University of California Irvine (UCI) Machine Learning Repository. It was generated at the Department of Neurology of the Cerrahpaşa Faculty of Medicine, Istanbul University, from 188 PD patients (107 men and 81 women) aged between 33 and 87 years and from 64 controls (23 men and 41 women) aged between 41 and 82 years.
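The framing-and-averaging step described above can be sketched as follows. Several details are assumptions: the frames are non-overlapping (the text does not say whether they overlap), the 25 ms frame length is truncated to a whole number of samples (1102 at 44.1 kHz), and RMS energy and zero-crossing rate are hypothetical stand-ins for the features actually computed by the six techniques. A synthetic sine replaces a real recording.

```python
import math


def frame_signal(samples: list[float], sample_rate_hz: int, frame_ms: float = 25.0) -> list[list[float]]:
    """Split a signal into consecutive, non-overlapping frames of frame_ms milliseconds.

    The frame length is truncated to a whole number of samples
    (25 ms at 44.1 kHz -> 1102 samples), and a trailing partial frame is discarded.
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000.0)
    if frame_len < 2:
        raise ValueError("frame must contain at least two samples")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_features(frame: list[float]) -> list[float]:
    """Two illustrative per-frame features: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0.0) != (b < 0.0))
    return [rms, crossings / (len(frame) - 1)]


def recording_vector(samples: list[float], sample_rate_hz: int, frame_ms: float = 25.0) -> list[float]:
    """Average the per-frame feature vectors into one vector for the whole recording."""
    per_frame = [frame_features(f) for f in frame_signal(samples, sample_rate_hz, frame_ms)]
    return [sum(col) / len(col) for col in zip(*per_frame)]


if __name__ == "__main__":
    fs = 44_100
    # One second of a synthetic 150 Hz "vowel" (pure sine, amplitude 0.5) stands in for a recording.
    signal = [0.5 * math.sin(2.0 * math.pi * 150.0 * n / fs) for n in range(fs)]
    print("frames:", len(frame_signal(signal, fs)))
    print("averaged [RMS, ZCR]:", recording_vector(signal, fs))
```

Averaging over frames turns each recording into a fixed-length vector regardless of its duration, which is what allows every recording to be represented as one observation with the same set of features; the price is that within-recording temporal variation is discarded.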
