Abstract

Most works on music visualization target composition and the development of arrangements; very few are designed for data analysis. This chapter proposes a generalized model based on numeric analysis of compositions, capable of projecting high-dimensional data into a 3D space to represent pieces of music. A multiresolution extraction with vectors and the Euclidean distance is used to analyze the compositions and visualize their main features, and the relationships between the measures of the compositions are projected. The resulting distances can also serve as an objective function to measure the performance of a music composition system, by comparing the extracted features, as a profile of the music piece, between the original batch and the generated pieces at different dimensional levels. With these metrics, there is a numeric evaluation function that could be minimized for algorithmic music generation.

Keywords: music analysis, arts and humanities, music computing, high-dimensional visualization

### 1. Introduction

Algorithmic composition is not new, because tonal Western music composition follows rules, structures, and guidelines that make it algorithmic. Before the beginning of this computational research area, there were already proposed methods of algorithmic composition; one of the most common was to replicate the ability of music composition with the "Mozart's dice" technique. Another proposed method was a "species counterpoint" system for counterpoint composition [1]. These methods share the same problem with the new computational methods: after more than 50 years, there is still no metric to evaluate the resulting compositions or the system performance. The most common way to obtain a numeric result is to compare the compositions with their references contained in the example music batch.

Nowadays, neural networks are the main research area for trying to replicate the music composition ability or for extracting human creativity features [2]; the lack of metrics is still the main problem for determining the loss function used to optimize the networks, or even for knowing what needs to be measured [3].

A structured rules system, known as a knowledge database, is often avoided due to the time-consuming effort of extracting and structuring the general rules for composing in a given style or genre. Another problem of this rule-based approach is that it may only create a specific type of composition from the extracted guidelines [4]. These are the main reasons why most researchers try not to use a rule-based system.

The model can extract features of a composition in MIDI format, selected by the main user as the reference for the system. These characteristics then help to measure the composition profile with the Euclidean distance in a self-similarity matrix; by comparison with each of the references in the higher dimension, they are projected into a lower one. The model searches for specific features in the composition, such as harmonies in 4/4 measures.


A 3D Spatial Visualization of Measures in Music Compositions

DOI: http://dx.doi.org/10.5772/intechopen.83691


When a composition is being made, the composer has a unique goal: to express something. The user would have this goal in mind when selecting the reference pieces, so the system could extract features and then replicate those characteristics in new music compositions. This is also useful when the selected references are few, which is one of the constraints when using deep learning methods, since they require large datasets [5]. The analysis of the harmony or melody contour has been studied to find the features of music pieces [6], but very few tools have been developed to study them as a numeric value. Other feature extraction methods are Markov models and N-grams [7, 8], but their main goal is generation, with a repetitive result.

The rest of this chapter is organized as follows. In Section 2, the proposed model for feature extraction of the reference piece of music is detailed. In Section 3, results obtained by testing the proposal are presented. In Section 4, conclusions are described.

### 2. Methodology

The MIDI file format encodes events, or messages, that describe notes at specific moments called ticks, the smallest unit of time, along with other event descriptors such as instrument, volume, effects, etc. Each MIDI note can have a value between 0 and 127. We need to identify the collections of events in which the note\_on and note\_off events are described, so we can transform them into a list of notes with channel, note, volume, and duration properties. A typical list of events inside a MIDI file structure is shown in Figure 1.
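As a sketch of this conversion, note\_on and note\_off events can be paired into note records with a small routine; the tuple layout below is a hypothetical stand-in for the parsed MIDI messages (a parsing library would supply the real ones):

```python
# Pair note_on / note_off events into note records. Each event tuple
# (tick, type, channel, note, velocity) is a hypothetical stand-in for a
# parsed MIDI message.
def events_to_notes(events):
    notes = []
    pending = {}  # (channel, note) -> (start_tick, velocity)
    for tick, etype, channel, note, velocity in events:
        key = (channel, note)
        if etype == "note_on" and velocity > 0:
            pending[key] = (tick, velocity)
        elif key in pending:  # note_off, or note_on with velocity 0
            start, volume = pending.pop(key)
            notes.append({"channel": channel, "note": note, "volume": volume,
                          "start": start, "duration": tick - start})
    return notes

events = [(3072, "note_on", 0, 67, 90), (4096, "note_off", 0, 67, 0)]
print(events_to_notes(events))
# [{'channel': 0, 'note': 67, 'volume': 90, 'start': 3072, 'duration': 1024}]
```

The example event pair reproduces the note 67 used later in the text, spanning pulses 3072 to 4096.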

We then take the last tick in all the sequences of events to know the longest length L in pulses. With this information, we can create a matrix with a row for


Figure 1. MIDI events description (left) and conversion into note events (right).



each of the instruments, with a width given by the last tick. In the example above, we got the last tick and the number of channels for the different instruments. The matrix we need to create has a width of 36,864 (the last tick) and 4 rows as height (one for each instrument). The MIDI file specifies the length of a quarter note in pulses; in this example, the length of the quarter note is 1024, so we multiply it by four and get 4096, which will be the length of each measure (ml) in each instrument row of the file, as in Eq. (1), where the length L has the value of 36,864 in this example.

$$\mathbf{S} = \begin{bmatrix} \mathbf{0}\_0 & \cdots & L\_0 \\ \vdots & \cdots & \vdots \\ \mathbf{0}\_n & \cdots & L\_n \end{bmatrix} \to \mathbf{S} = \begin{bmatrix} \mathbf{0}\_0 & \cdots & \mathbf{36864}\_0 \\ \vdots & \cdots & \vdots \\ \mathbf{0}\_4 & \cdots & \mathbf{36864}\_4 \end{bmatrix} \tag{1}$$

After creating the matrix, we fill each of the rows with the note values for the durations specified in the MIDI file. In the example file, the first note is 67; it starts at pulse 3072 and ends at 4096, so we fill those 1024 pulses with the note value (67). Since we know that each measure has an ml of 4096 pulses, we can compute the total number of measures of the file in each of the instruments, as in Eq. (2):

$$\text{total\ measures} = \frac{L}{ml} = \frac{\text{36864}}{4096} = 9\tag{2}$$

The piano roll representation of this file will look like Figure 2, where each color represents a different instrument; the left-most moment is zero and the right-most is 36,864; the first row represents the note with value 0 and the upper one represents note 127; middle C is represented by the key with value 60.

The same information could be represented as music sheet notation from the same MIDI file, as in Figure 3.

The same information can also be represented as a continuous signal, as shown in Figure 4.

Now that we have each of the instruments as a function $f\_i(p)$, where $p$ represents each of the pulses inside the $i$th signal, we can obtain the first derivative of each function with a numeric derivative by subtracting the value at the previous moment $p-1$ (Eq. 3).

Figure 2. Piano roll of a MIDI file.

Figure 3. Music sheet of the song New-Age from the artist Marlon Roudette.

Figure 4. Excerpt from the song P.S. I love you, where the red represents the guitar, which means all those notes are played at the same time.

$$f'\_i(p) = f\_i(p) - f\_i(p-1) \tag{3}$$
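Eq. (3) amounts to a backward difference over each instrument signal; a minimal sketch, with the first pulse set to 0 as an assumption since it has no predecessor:

```python
# Eq. (3): numeric first derivative of an instrument signal by subtracting
# the value at the previous pulse.
def derivative(signal):
    # The first pulse has no predecessor, so its derivative is set to 0.
    return [0] + [signal[p] - signal[p - 1] for p in range(1, len(signal))]

f = [0, 0, 67, 67, 67, 0]   # a note with value 67 switching on and off
print(derivative(f))        # [0, 0, 67, 0, 0, -67]
```

Note onsets and releases appear as positive and negative spikes, which is what the quarter features below pick up.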


where p stands for the pulse of the signal i (channel/instrument), the derivatives would look as follows in Figure 5.

The next step is to extract four representative numbers for each measure $m\_{i,l}$ from each signal. Each measure $m\_{i,l}$ (with 4096 values in this example) is divided by four into quarters $q\_{(i,l,k)}$ of size $s$ equal to 1024. The biggest absolute change in each quarter range is selected to represent that quarter of the measure, forming a vector of dimension four as in Eq. (4).

$$
\overrightarrow{w\_{(i,l)}} = \left[q\_{(i,l,1)}, q\_{(i,l,2)}, q\_{(i,l,3)}, q\_{(i,l,4)}\right] \tag{4}
$$

where $i$ represents the signal number, $l$ the measure number, and each $q\_{(i,l,k)}$ value of the vector is selected as follows in Eq. (5):

$$q\_{(i,l,k)} = \left\{ x : |x| = \max\_{\,l \ast ml + (k-1) \ast s \,\le\, p \,\le\, l \ast ml + k \ast s} \left| f\_i'(p) \right| \right\} \tag{5}$$
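A possible reading of Eqs. (4) and (5), assuming each quarter contributes the derivative value with the largest absolute change (the helper name is hypothetical):

```python
# Sketch of Eqs. (4)-(5): one vector per measure, taking from each quarter of
# the derivative signal the value with the biggest absolute change.
# measure_vector is a hypothetical helper name.
def measure_vector(fprime, l, ml, s, k=4):
    vec = []
    for q in range(k):
        start = l * ml + q * s            # quarter q of measure l
        chunk = fprime[start:start + s]
        vec.append(max(chunk, key=abs))   # biggest absolute change in the range
    return vec

fp = [0] * 4096                            # one 4/4 measure of derivative values
fp[100], fp[1500], fp[2500], fp[3600] = 5, -9, 3, 7
print(measure_vector(fp, l=0, ml=4096, s=1024))  # [5, -9, 3, 7]
```

The sign is kept, so an onset and a release of the same magnitude remain distinguishable in the vector.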


Figure 5. MIDI file signal differences graph.

where $ml$ represents the measure length in pulses, $s$ represents the quarter size in pulses, and $k$ is the desired dimension of the vector. Now that we have all the vectors of each measure of each signal, we can compute a distance matrix of size $n \times n$ for each signal by comparing each pair of measures with the Euclidean distance, as in Eq. (6).

$$D\_i = \begin{bmatrix} dm\_{(i,1,1)} & \cdots & dm\_{(i,1,n)} \\ \vdots & \ddots & \vdots \\ dm\_{(i,n,1)} & \cdots & dm\_{(i,n,n)} \end{bmatrix} \tag{6}$$

where $dm\_{(i,j,l)}$ stands for the Euclidean distance between measure $j$ and measure $l$ from signal $i$, and $n$ is the number of measures in the signals. Another consideration is that we experiment only with time signatures of 4/4; in each case of a different signature, several accommodations should be made.
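Eq. (6) can be sketched as a plain pairwise-distance computation over the measure vectors of one signal (the function name is illustrative):

```python
import math

# Sketch of Eq. (6): the self-similarity (distance) matrix of one signal,
# built from the Euclidean distance between every pair of measure vectors.
def self_similarity(vectors):
    n = len(vectors)
    D = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for l in range(j + 1, n):
            d = math.dist(vectors[j], vectors[l])
            D[j][l] = D[l][j] = d  # symmetric: one triangle holds all the information
    return D

vs = [[0, 0], [3, 4], [0, 0]]
print(self_similarity(vs))
# [[0.0, 5.0, 0.0], [5.0, 0.0, 5.0], [0.0, 5.0, 0.0]]
```

The zero diagonal and the symmetry match the observations about the rendered matrices in the Results section.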

### 3. Results


Self-similarity matrices have been used in waveform analysis to enhance similarity attributes along time in songs, as in time series analysis. In Figure 6, we can see the self-similarity matrices that result from applying the method to a MIDI file. Each measure's features are compared against each of the other measures in the file with the Euclidean distance. The diagonal in each of the matrices is the distance of a measure to itself. Since it is a symmetric matrix, the complete information is contained on either side of the diagonal. The similarity result is normalized to enhance the contrast in the visualization of the rendered image. This normalization helps to increase distances between clusters, resulting in a better representation.

We used the similarity matrix to project the information into a 3D environment. The resulting distances are used as a guide to emulate the vectors in the lower-dimensional space. In Table 1, we have an example of four vectors of dimension four and the resulting self-similarity matrix, which is used to represent them in 3D and 2D projections.

To reduce the dimensionality, we loop through all the vectors and compare their distances to the actual ones in the self-similarity matrix. In the lower dimension, we take $\widehat{V\_i}$, compute its distance to $\widehat{V\_j}$, and compare it to their equivalent distance in the self-similarity matrix of the higher dimension, as in Eq. (7):

$$\left| d\left(\widehat{V\_i}, \widehat{V\_j}\right) - D\_{ij} \right| > t \tag{7}$$

Figure 6. Example of an image of self-similarity matrices of vectors extracted from a MIDI file.


Table 1. Example vectors (left) and their self-similarity matrix (right) given by the Euclidean distance.

| Dim | A | B | C | D |
|---|---|---|---|---|
| 1 | 0.45669 | 0.43307 | 0.43307 | 0.37795 |
| 2 | -0.05512 | -0.05512 | -0.05512 | -0.05512 |
| 3 | -0.09449 | -0.09449 | -0.05512 | -0.07087 |
| 4 | -0.05512 | 0.05512 | -0.05512 | -0.05512 |

|  | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 0.11274202 | 0.04591189 | 0.0822064 |
| B | 0.11274202 | 0 | 0.11705919 | 0.12549493 |
| C | 0.04591189 | 0.11705919 | 0 | 0.05732606 |
| D | 0.0822064 | 0.12549493 | 0.05732606 | 0 |

where $\widehat{V\_i}$ and $\widehat{V\_j}$ are the vectors in the lower dimension and $D\_{ij}$ are the distances in the self-similarity matrix. If the difference is higher than a threshold $t$, we adjust the position of the vector with Eq. (8), with a given increment $\alpha$, until the distances are close to the ones in the self-similarity matrix.

$$
\widehat{V\_i} = \left(\alpha \ast \widehat{V\_i}\right) + \left((1 - \alpha) \ast \widehat{V\_j}\right) \tag{8}
$$
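One way to read Eqs. (7) and (8) is as an iterative adjustment: while a pair of projected vectors violates the threshold, blend them with the increment α. The random initialization and the push-apart case (a factor above 1 when points are too close) are interpretive assumptions, not stated in the text:

```python
import math
import random

# Sketch of Eqs. (7)-(8): place n vectors in a low-dimensional space so that
# their pairwise distances approach the self-similarity matrix D.
def project(D, dims=3, alpha=0.9, t=0.1, iters=500, seed=1):
    rng = random.Random(seed)
    n = len(D)
    # Random initialization (an assumption; the chapter does not specify one).
    V = [[rng.random() for _ in range(dims)] for _ in range(n)]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(V[i], V[j])
                if abs(d - D[i][j]) <= t:  # Eq. (7): difference within threshold
                    continue
                # Eq. (8): blend V_i toward V_j when the pair is too far apart;
                # the push-apart factor (2 - alpha) is an interpretive assumption.
                a = alpha if d > D[i][j] else 2 - alpha
                V[i] = [a * vi + (1 - a) * vj for vi, vj in zip(V[i], V[j])]
    return V

# Two points whose target distance is 1.0.
D = [[0.0, 1.0], [1.0, 0.0]]
V = project(D)
print(math.dist(V[0], V[1]))  # within the threshold t of the target distance 1.0
```

For the measures of a MIDI file, D would be the matrix of Eq. (6) and the result a cloud of 3D points, one per measure, ready to render as in Figures 7 and 8.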


In Figure 7, we present the projection of the four vectors in 3D space (left) and 2D space (right). Vector B is painted blue for ease of visualizing the correspondence of the projections in both dimensions.


Figure 7. Vectors of dimension four: 3D visualization (left) and 2D visualization of the same vectors (right).

Figure 8. 3D visualization of the projections from the vectors in 4D.

In the results of the vectors in 3D and 2D from the table, we can see a close correspondence between both dimensions that preserves the relative distance between objects. This method is a fast tool to visualize the spatial distribution; even if we do not make clusters, we can see the behavior of the vectors. We can use the same principle to visualize the entire file, with all the measures of its self-similarity matrix, as seen in Figure 8.

### 4. Conclusion

This work is related to music composition; as part of the sequence of steps needed to complete such a task, a map of the structure of a musical piece is required. This tool aims at finding similarities with numeric precision, thus obtaining a metric that could be used as an objective function in an optimization task. The method is also a new tool for fast visualization of the measures in a MIDI file, even with high dimensionality, and it could help discover hidden information inside the music piece.





References

[1] Fernández JD, Vico F. AI methods in algorithmic composition: A comprehensive survey. Journal of Artificial Intelligence Research. 2013;48:513-582

[2] Briot JP, Hadjeres G, Pachet F. Deep learning techniques for music generation—A survey. arXiv Preprint. 2017. arXiv:1709.01620

[3] Briot JP, Pachet F. Music generation by deep learning—Challenges and directions. arXiv Preprint. 2018. arXiv:1712.04371. Available at https://arxiv.org/pdf/1712.04371.pdf

[4] Colombo F, Gerstner W. A general model of music composition. arXiv Preprint. 2018. arXiv:1802.05162

[5] Jaques N, Gu S, Turner RE, Eck D. Tuning Recurrent Neural Networks with Reinforcement Learning. 2017

[6] Prince JB. Contributions of pitch contour, tonality, rhythm, and meter to melodic similarity. Journal of Experimental Psychology: Human Perception and Performance. 2014;40(6):2319

[7] Pachet F, Roy P. Markov constraints: Steerable generation of Markov sequences. Constraints. 2011;16(2):148-172

[8] Chordia P, Sastry A, Şentürk S. Predictive tabla modelling using variable-length Markov and hidden Markov models. Journal of New Music Research. 2011;40(2):105-118