**2.3 Characteristics and differences between audio and music**

The audio signal is a slowly time-varying signal in the sense that, when examined over a sufficiently short period of time (between 5 and 100 *msec*), its characteristics are stationary. A simple example of an audio signal is shown in **Figure 9**.
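This short-time stationarity assumption is what justifies frame-based analysis: the signal is sliced into short fixed-length frames before any feature is computed. A minimal sketch (the function name, sampling rate, and 20 ms frame length are illustrative choices, not from the text):

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0):
    """Split signal x into non-overlapping frames of frame_ms milliseconds.

    Within each frame the signal is assumed approximately stationary,
    per the 5-100 ms rule of thumb.
    """
    frame_len = int(fs * frame_ms / 1000.0)   # samples per frame
    n_frames = len(x) // frame_len            # drop the ragged tail
    return np.asarray(x[:n_frames * frame_len]).reshape(n_frames, frame_len)

# Example: a 2-second signal at 8 kHz sliced into 20 ms frames.
fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(2 * fs) / fs)
frames = frame_signal(x, fs)
print(frames.shape)  # (100, 160)
```

Each of the per-frame features discussed later (energy, ZCR, spectrum) is then computed on one such frame at a time.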

**Figure 10** is a typical example of a music portion. It is clear from the two spectra in **Figures 9** and **10** that we can distinguish between the two types of signals.

**Figures 11** and **12** depict the evolutionary spectra of the two types of signals, audio and music.

Now, let us discuss some of the main similarities and differences between the two types of signals.

*Classification and Separation of Audio and Music Signals DOI: http://dx.doi.org/10.5772/intechopen.94940*

**Figure 7.** *Percussion instruments.*

**Figure 8.** *Electronic organ.*

instruments are used for producing music, the tone quality measure of the fundamental frequency or harmonics is not needed. **Figure 8** shows an example of an electronic organ instrument.


**Figure 6.** *Woodwind instruments.*

**Figure 5.** *Brass instruments.*

*Multimedia Information Retrieval*


**Figure 9.** *An example of an audio signal of speaking the two-second-long phrase "Very good night": (a) time domain, (b) magnitude, (c) phase.*

| Key Difference | Audio | Music |
|---|---|---|
| Units of analysis | Phonemes (finite) | Notes (finite) |
| Temporal structure | • Short sample (40–200 ms) • More steady state than dynamic • Timing unstrained but variable • Slow amplitude modulation rate for sentences (~4 Hz) | • Longer sample (600–1200 ms) • Mix of steady state (strings, winds) and transient (percussion) • Strong periodicity |
| Spectral structure | • Largely harmonic (vowels, voiced consonants) • Tend to group in formants • Some inharmonic stops | • Largely harmonic and some inharmonic (percussion) |
| Syntactic / semantic structure | • Symbolic • Productive • Can be combined in grammar | • Symbolic • Productive • Combined in a grammar |

**Table 3.**
*The main differences between audio and music signals.*

**Figure 10.** *A 2-second long music signal: (a) time domain, (b) spectrum, (c) phase.*

**Figure 11.** *The spectrum of an average of 500 specimens: (a) audio, (b) music.*

**Figure 12.** *Evolutionary spectrum of an average of 500 specimens: (a) audio, (b) music.*

**Tonality.** By a tone, we mean a single harmonic of a pure periodic sinusoid. Regardless of the type of instrument or music, the musical signal is composed of multiple tones; however, this is not the case for the voice signal [47, 52, 55–57].

**Bandwidth.** Normally, the audio signal has 90% of its power concentrated at frequencies lower than 4 kHz and is limited to 8 kHz; however, a music signal can extend its power to the upper limit of the ear's response, which is 20 kHz [52, 58].

**Alternating sequence.** Audio exhibits an alternating sequence of noise-like segments, while music alternates in a more tonal shape. In other words, an audio signal's energy is distributed across its spectrum more randomly than a music signal's.

**Power distribution.** Normally, the power distribution of an audio signal is concentrated at frequencies lower than 4 kHz and collapses rapidly above this frequency. On the other hand, the power spectrum of music has no specific shape [59].
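One way to turn this power-distribution difference into a feature is to measure what fraction of a signal's spectral power lies below 4 kHz. A sketch (function name, cutoff, and the synthetic test signals are illustrative assumptions):

```python
import numpy as np

def low_band_power_fraction(x, fs, cutoff_hz=4000.0):
    """Fraction of total spectral power at frequencies below cutoff_hz.

    Speech-like signals score near 1.0; wide-band music scores lower.
    """
    spec = np.abs(np.fft.rfft(x)) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # bin frequencies in Hz
    return spec[freqs < cutoff_hz].sum() / spec.sum()

fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 1000 * t)               # all power below 4 kHz
music_like = speech_like + np.sin(2 * np.pi * 6000 * t)  # adds power above 4 kHz
print(round(low_band_power_fraction(speech_like, fs), 2))  # 1.0
print(round(low_band_power_fraction(music_like, fs), 2))   # 0.5
```

In practice this fraction would be computed per analysis frame and fed to a classifier together with the other features below.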

**Dominant frequency.** For a single talker, the dominant frequency can be determined accurately and uniquely; for a single musical instrument, however, only the average dominant frequency can be determined. With multiple musical instruments, the situation is even worse.

**Fundamental frequency.** For a single talker, the fundamental frequency can be accurately estimated. However, this is not the case for a single musical instrument.

**Excitation patterns.** The excitation signals (pitch) of audio usually exist only over a span of three octaves, while the fundamental music tones can span up to six octaves [60].

**Energy sequences.** A reasonable generalization is that audio follows a pattern of high-energy voiced conditions followed by low-energy conditions, a pattern that the envelope of music is less likely to exhibit.
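This high/low energy alternation can be quantified with a short-time energy envelope and its relative variation. A sketch under assumed frame size and synthetic signals (all names are illustrative):

```python
import numpy as np

def short_time_energy(x, fs, frame_ms=20.0):
    """Energy of consecutive non-overlapping frames of frame_ms milliseconds."""
    n = int(fs * frame_ms / 1000.0)
    n_frames = len(x) // n
    frames = np.asarray(x[:n_frames * n]).reshape(n_frames, n)
    return (frames ** 2).sum(axis=1)

# Speech-like: bursts of voicing separated by near-silence -> strongly
# modulated envelope. Music-like: a sustained tone -> flat envelope.
fs = 8000
t = np.arange(fs) / fs
burst = np.sin(2 * np.pi * 200 * t) * (np.sin(2 * np.pi * 4 * t) > 0)  # ~4 Hz on/off
tone = np.sin(2 * np.pi * 200 * t)

for name, sig in [("burst", burst), ("tone", tone)]:
    e = short_time_energy(sig, fs)
    # Coefficient of variation of the envelope: high for speech-like bursts,
    # near zero for a sustained musical tone.
    print(name, round(e.std() / e.mean(), 2))
```

The coefficient of variation of the energy envelope is one simple way to score how "bursty" a signal is.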

**Tonal duration.** The duration of vowels in audio is very regular, following the syllabic rate. Music exhibits a wider variation in tone lengths, not being constrained by the process of articulation. Hence, tonal duration would likely be a good discriminator.

**Consonants.** An audio signal contains many consonants, while music is usually continuous through time [33].

**Zero crossing rate (ZCR).** The ZCR of music is greater than that of audio. We can use this idea to design a discriminator [60].

In the frequency domain, there is strong overlap between audio and music signals, so no ordinary filter can separate them. As mentioned before, an audio signal may cover the spectrum between 0 and 4 kHz, with an average dominant frequency of 1.8747 kHz. However, the lowest fundamental frequency (A1) of a music signal is about 27.5 Hz, and the highest frequency of the tone C8 is around 4186 Hz. The reason behind this is that musical instrument manufacturers try to bound music frequencies to the limits of human hearing in order to achieve strong consonance and strong frequency overlap. Moreover, music may propagate over the audible spectrum to cover more than the audible band of 20 kHz, with an average dominant frequency of 1.9271 kHz [25].

**Table 3** summarizes the main similarities and differences between music and audio signals.

II. **The Frequency-domain algorithms** [32, 33, 35, 59, 66–77, 112, 121]:

1. The Spectrum [31, 111]:

   a. The Spectral Centroid.

   b. The Spectral Flux Variance.

   c. The Spectral Centroid Mean and Variance.

   d. The Spectral Flux Mean and Variance.

   e. The Spectrum Roll-Off.

   f. The Signal Bandwidth.

   g. The Spectrum Amplitude.

   h. The Delta Amplitude.

2. The Cepstrum [122]:

   a. The Cepstral Residual [122–124].

   b. The Variance of the Cepstral Residual [122–124].

   c. The Cepstral Feature [122–124].

   d. The Pitch [94, 107, 108, 117–119, 125, 126].

   e. The Delta Pitch [88, 119].

III. **The Time-Frequency domain algorithms**:

1. The Spectrogram (or Sonogram) [13, 19, 86, 127].

2. The Evolutionary Spectrum and the Evolutionary Bispectrum [81, 128, 129].

**3.1 Time domain algorithms**

*3.1.1 The ZCR algorithm*

The ZCR algorithm can be defined as the number of times the signal crosses the zero axis within a specific window. It is widely used because of its simplicity and robustness [34]. We may define the ZCR as in the following equation:

$$Z_n = \frac{1}{2N}\sum_{m=n-N+1}^{n}\left|\,\operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)]\,\right| \tag{1}$$

where $Z_n$ is the ZCR, $N$ is the number of samples in one window, and sgn is the sign of the signal such that sgn[*x*(*n*)] = 1 when *x*(*n*) > 0 and sgn[*x*(*n*)] = −1 otherwise.
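Eq. (1) can be implemented directly; the sketch below computes the ZCR over consecutive windows (the window size `N` and the test tone are assumed for illustration):

```python
import numpy as np

def zcr(x, N):
    """Zero-crossing rate per Eq. (1), over non-overlapping windows of N samples.

    sgn[x] is +1 for x > 0 and -1 otherwise; each sign change between
    consecutive samples contributes |sgn[x(m)] - sgn[x(m-1)]| = 2, so the
    1/(2N) factor yields the fraction of sample pairs that cross zero.
    """
    s = np.where(np.asarray(x) > 0, 1, -1)   # sgn[x(m)]
    diffs = np.abs(np.diff(s))               # 0 or 2 per sample pair
    n_windows = len(diffs) // N
    return diffs[:n_windows * N].reshape(n_windows, N).sum(axis=1) / (2.0 * N)

# A 100 Hz tone at 8 kHz crosses zero twice per cycle:
# 200 crossings/s over 8000 samples/s gives a ZCR of about 0.025.
fs = 8000
x = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
print(round(zcr(x, N=400).mean(), 3))  # 0.025
```

A speech/music discriminator built on this feature would compare the per-window ZCR (or its statistics) against a learned threshold, music tending toward higher values as noted above.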
