II. **The Frequency-domain algorithms** [32, 33, 35, 59, 112, 66–77, 121]:

1. The Spectrum [31, 111]:

spectrum to cover more than the audible band of 20 kHz, with a dominant

the zero crossing rate (ZCR), and the time-averages of spectral parameters.

the music signals can be divided into the following approaches:

1. The ZCR algorithm [1, 34, 66–77]:

4. The Pulse Metric [31, 59, 80–82].

5. The number of silences [32, 60].

8. The Roll-Off Variance [31, 59].


I. **The Time-domain algorithms**:

2. The STE [60–65, 78].

Saunders [60] proposed another two-level classifier, based on the short-time energy (STE) and the average ZCR features. In addition, Matityaho and Furst [63] developed a neural-network-based model for classifying music signals, designed around the functional behavior of the human cochlea. For audio detection, Hoyt and Wechsler [64] developed a neural-network-based model that used the Fourier transform, Hamming filtering, and a logarithmic function as pre-processing, and then applied a simple threshold algorithm to detect audio, music, wind, traffic, or any interfering sound. To improve performance, they also suggested wavelet-transform features for pre-processing. Their work closely resembles that of Matityaho and Furst [63, 64]. Scheirer and Slaney [65] examined 13 features, some of which were simple modifications of one another, and also tried combining them in several multidimensional classification forms. In these previous works, the most powerful discrimination features were the STE and the ZCR; therefore, both will be discussed thoroughly. Finally, the common classifiers of the audio and

a. The standard deviation of first order difference of the ZCR.

7. The ANN (Artificial Neural Network) [12, 49, 58, 63, 79, 83–120].

c. The total number of zero crossings exceeding a specific threshold.

b. The 3rd central moment of the mean of ZCR.

3. The ZCR and the STE positive derivative [78, 79].

6. The HMM (Hidden Markov Model) [83–85].

**Table 3** summarizes the main similarities and differences between music and

The main classification approaches will be discussed in this section. They can be categorized into three groups: (1) time-domain approaches, (2) frequency-domain approaches, and (3) time-frequency-domain approaches. A two-level music and audio classifier was developed by El-Maleh [61, 62]. He used a combination of long-term features such as the variance, the differential parameters,

frequency of an average = 1.9271 kHz [25].

*Multimedia Information Retrieval*

**3. Audio and music signals classification**

audio signals.

a. The Cepstral Residual [122–124].
b. The Variance of the Cepstral Residual [122–124].
c. The Cepstral feature [122–124].
d. The Pitch [94, 107, 108, 117–119, 125, 126].
e. The Delta Pitch [88, 119].
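To make the time-domain features named in this section more concrete, the following is a minimal Python sketch of the frame-wise STE and ZCR, together with the three ZCR-derived statistics listed under the ZCR algorithm (the standard deviation of the first-order difference, the third central moment, and the number of values exceeding a threshold). It is not taken from the cited works. The function names, the framing scheme, the normalization of the ZCR by frame length, and the reading of "third central moment" as the moment of the frame-wise ZCR about its mean are all assumptions made for illustration.

```python
import math


def frames(signal, frame_len, hop):
    """Split samples into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame (one common STE definition)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def zcr_statistics(zcr, threshold):
    """Summary statistics of a frame-wise ZCR contour (needs at least two frames).

    Assumed interpretations, not definitions from the cited works:
      std_diff     - population standard deviation of the first-order difference
      third_moment - third central moment of the ZCR values about their mean
      count_above  - number of frames whose ZCR exceeds `threshold`
    """
    n = len(zcr)
    mean = sum(zcr) / n
    diff = [b - a for a, b in zip(zcr, zcr[1:])]
    d_mean = sum(diff) / len(diff)
    std_diff = math.sqrt(sum((d - d_mean) ** 2 for d in diff) / len(diff))
    third_moment = sum((z - mean) ** 3 for z in zcr) / n
    count_above = sum(1 for z in zcr if z > threshold)
    return {"std_diff": std_diff,
            "third_moment": third_moment,
            "count_above": count_above}


if __name__ == "__main__":
    # Toy signal: a 50 Hz sine sampled at 1 kHz for one second (illustrative values only).
    signal = [math.sin(2 * math.pi * 50 * t / 1000) for t in range(1000)]
    framed = frames(signal, frame_len=200, hop=100)
    ste = [short_time_energy(f) for f in framed]
    zcr = [zero_crossing_rate(f) for f in framed]
    print(ste[:3], zcr[:3], zcr_statistics(zcr, threshold=0.2))
```

In a classifier of the kind surveyed here, such per-frame values would typically be aggregated over a longer window and passed to a threshold rule or a trained model; the frame length, hop size, and threshold above are placeholders rather than recommended settings.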
