Analysis of
Emotions in Musical Expression
Junko Kimura
5/16/2002
The present paper examined the hypothesis that "music players can convey their intentions to listeners successfully through purely instrumental music pieces". In the experiment, three amateur female violinists first improvised 0.3- to 25-second pieces, each expressing one of seven different emotions. Second, seven randomly selected listeners attempted to identify which emotion the players had tried to express.
The results demonstrated that the violinists could successfully convey emotions such as sadness, tenderness and happiness, with success rates above 70%. However, identification was confused in the cases of fear, anger and frustration. In interpreting these results, the framework of musical expressive cues for basic emotions presented by Juslin (2001) and Bresin and Friberg (1999) was verified. These expressive cues represent the emotions well; moreover, they are also applicable to the present violin samples. The findings also indicate that confusion arises when some of the cues are played differently. This shows that proper execution of the expressive cues is critical in conveying basic emotions to listeners.
Music carries many components in its artistic expression. Composers convey their messages through their scores, using annotations so that players can realize what the composers intend. However, no two performers play the same music in the same way, because musicians interpret the scores and articulate their own interpretations. Even the same musician plays an identical piece differently as his or her interpretation changes, emotional state varies, or age advances.
In the philosophical literature, there has been a long-standing dispute between proponents of two theories: the expressive theory and the arousal theory. The former analyzes music's expressiveness as depending on the composer expressing his or her occurrent emotion through the composition. The latter explains this expressiveness as the music's propensity to evoke the corresponding emotion in the listener (Davis, 2001). The expressive theory is empirically incorrect, and the arousal theory seems to explain this phenomenon more accurately. It is true that music arouses corresponding emotions in the listener. However, I doubt that our mood changes as a consequence of the music's expressiveness. In this vein, I agree with Davis: "..., we have a clear sense of the music's expressive character as quite distinct from our ... responses to it" (Davis, 2001: 33). I argue that we have a clear sense of recognizing what the music is delivering, even when the music takes a purely instrumental form.
In order to analyze a performer's emotions, it is useful to compare recordings of the same piece by different players because, as noted above, the emotions of composers and players intertwine in a musical performance (Juslin, 2001); interpreting only one performance is therefore not an appropriate approach. Repp (1992) compared twenty-eight performances of Schumann's Träumerei and found common patterns in the interpretations. Beran and Mazzola (2000) further modeled the tempo in the same Schumann piece. Canazza et al. (1997) identified modifications made by the players in Mozart's Clarinet Concerto (K.622). Furthermore, Juslin (1997) and Bresin and Friberg (1999) focused more on the players' expressiveness. Juslin measured how professional guitarists convey four basic emotions (anger, sadness, happiness and fear) to listeners and, using signal analysis, found three cues: tempo, loudness and articulation. Bresin and Friberg extended Juslin's study, identifying time deviations and final ritardando as additional cues.
As I note in the next chapter, Juslin (2001) presents a core framework of expressive cues in the performer's communication. He tested the validity of these cues by conducting listening experiments. However, musical expressiveness depends on the instrument as well as on the performer's skill. In the present paper, I examine musical expressiveness in violin passages where the performers and the listeners are able to communicate successfully.
Juslin (2001) summarizes the expressive cues of five basic emotions: happiness, anger, fear, sadness and tenderness. He placed these emotional expressions in a two-dimensional emotional space constituted by valence and activity level. Each emotional expression consists of expressive cues, such as tempo, sound level and articulation (Figure 1). Bresin and Friberg's work (1999) follows this framework.
The duration of the passages played by the violinists was between 0.3 and 25 seconds. The violinists integrated the emotions into these short passages to make them as expressive as possible, using a few selected cues from the table; moreover, different violinists selected different cues. If listeners could identify the emotion of one passage better than that of another, one could conclude that the cues of the identified passage conveyed the information more effectively. My examination therefore provides a preliminary test of the dominant cues of each basic emotion in violin playing.

Figure 1 Expressive cues classified by Juslin (2001), p. 315
Three female amateur violinists played 0.3- to 25-second passages expressing various emotions. They played either improvised or previously composed pieces, in the following order, in a quiet music studio.
1. Fear
2. Sadness
3. Anger
4. Tenderness
5. Happiness
6. Frustration
7. Surprise
After each recording, the violinists listened to their rendition and replayed it until they were satisfied with their playing. After all recordings were completed, the violinists filled out a questionnaire regarding their satisfaction with each performance, the specific situations they had imagined, and their mood on that day (Appendix 1).
Seven subjects assessed the effectiveness of the emotional communication. The twenty-one short pieces were recorded as standard PCM sound files (.aiff) and posted on a web page (http://web.mit.edu/~jkimura/www/cayyou.htm). The listeners were 25-45 years old, three males and four females (Table 1), none of whom were musicians or music students. They individually examined the violin samples and identified the emotional expression of each as one of the seven emotion labels listed above. The samples were presented in random order, and the listeners' responses were submitted to me by e-mail.
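The tallying of correct identifications can be sketched as follows. This is a minimal illustration, not the actual analysis code used in the experiment; the sample identifiers and listener responses below are hypothetical.

```python
# Hypothetical sketch: compute per-sample percent-correct scores from
# listener responses. Sample IDs and answers are illustrative only.
from collections import defaultdict

def percent_correct(responses, intended):
    """responses: {listener: {sample_id: chosen_emotion}}
    intended:  {sample_id: intended_emotion}
    Returns {sample_id: percent of listeners who chose the intended emotion}."""
    counts = defaultdict(int)
    n_listeners = len(responses)
    for listener_answers in responses.values():
        for sample_id, chosen in listener_answers.items():
            if chosen == intended[sample_id]:
                counts[sample_id] += 1
    return {s: round(100 * counts[s] / n_listeners) for s in intended}

# Hypothetical example with two listeners and two samples:
intended = {"fear_A": "fear", "sadness_B": "sadness"}
responses = {
    "listener1": {"fear_A": "fear", "sadness_B": "sadness"},
    "listener2": {"fear_A": "frustration", "sadness_B": "sadness"},
}
print(percent_correct(responses, intended))  # {'fear_A': 50, 'sadness_B': 100}
```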
Table 1 Profile of the seven listeners

Listener   Male / female   Nationality
1          Male            Italian
2          Male            Japanese
3          Male            Japanese
4          Female          Japanese
5          Female          French
6          Female          American
7          Female          American
Table 2 shows each player's level of satisfaction with the recordings and the percentage of correct responses by the listeners. Anger, fear and frustration were difficult to identify: most of these samples were correctly identified by fewer than fifty percent of the listeners. On the other hand, more than fifty percent of the happiness, sadness, surprise and tenderness samples were identified. Interestingly, a player's own satisfaction does not always correspond to the percentage of correct answers. For example, although the players were satisfied with their renditions of anger and frustration, the listeners could not identify them correctly.
Table 2 Percent of correct answers to each performance

                          Fear          Sadness        Anger         Tenderness    Happiness      Frustration   Surprise
Player                    A    B    C   A    B    C    A    B    C   A    B    C   A    B    C    A    B    C   A    B    C
Player's satisfaction     3    4    4   2    5    4    3    4    5   4    5    4   4    5    4    5    3    3   2    4    4
% of correct answers      43   29   0   100  86   100  29   29   14  71   100  71  100  100  86   29   29   57  57   86   43
Note:
1) Player A: American, Player B: Portuguese, Player C: American
2) Player's satisfaction was rated on a five-point scale:
   1. Not well  2. Not so well  3. Fair  4. Well  5. Very well
Table 3 shows the confusion matrix for the classification test of each intended emotion. The fear samples performed by players B and C were most often misidentified as sadness and happiness, respectively, and the anger samples by players A and C were most often misidentified as sadness and frustration, respectively.
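A confusion matrix of this kind can be built directly from (intended, response) pairs. The sketch below is a hypothetical illustration of the tabulation, not the actual analysis code; the example pairs are invented.

```python
# Hypothetical sketch: build a row-normalized confusion matrix like
# Table 3 from (intended, response) pairs. Example data is invented.
from collections import Counter

EMOTIONS = ["fear", "sadness", "anger", "tenderness",
            "happiness", "frustration", "surprise"]

def confusion_matrix(pairs):
    """pairs: iterable of (intended, response) tuples.
    Returns {intended: {response: percent}}, each row summing to ~100."""
    pairs = list(pairs)
    totals = Counter(intended for intended, _ in pairs)
    cells = Counter(pairs)
    return {
        i: {r: round(100 * cells[(i, r)] / totals[i]) for r in EMOTIONS}
        for i in EMOTIONS if totals[i]
    }

# Hypothetical: seven listeners judging one intended-fear sample,
# three correct identifications and four confusions with sadness.
pairs = [("fear", "fear")] * 3 + [("fear", "sadness")] * 4
m = confusion_matrix(pairs)
print(m["fear"]["fear"], m["fear"]["sadness"])  # 43 57
```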
In this section, I analyze the cues by which the listeners identified each emotion. The analysis was done only for the emotions where the players and the listeners communicated successfully.
Each sample was analyzed by listening to it and choosing the applicable dominant cues, referring to Figure 1 and Bresin and Friberg (1999). Table 4 summarizes the cues I could recognize in each sample whose intention the listeners identified correctly.
Table 3 Confusion matrix (%) for the classification test of each intended emotion
(Columns give the intended emotion and player; rows give the listeners' responses.)

                 Fear          Sadness        Anger         Tenderness    Happiness      Frustration   Surprise
Response         A    B    C   A    B    C    A    B    C   A    B    C   A    B    C    A    B    C   A    B    C
Fear             43   29   0   0    0    0    0    14   14  14   0    0   0    0    0    29   29   14  14   0    14
Sadness          29   57   14  100  86   100  57   0    0   0    0    14  0    0    0    14   0    0   14   0    0
Anger            0    0    0   0    0    0    29   29   14  0    0    0   0    0    0    14   14   14  0    0    0
Tenderness       14   0    0   0    14   0    14   0    0   71   100  71  0    0    0    0    0    0   0    0    0
Happiness        0    0    43  0    0    0    0    0    0   14   0    0   100  100  86   0    0    0   0    0    29
Frustration      14   14   28  0    0    0    0    29   57  0    0    14  0    0    0    29   29   57  14   14   14
Surprise         0    0    14  0    0    0    0    29   14  0    0    0   0    0    14   14   29   14  57   86   43
Table 4 Expressive cues in the violin samples whose intention was successfully identified by the listeners

Emotion            % of correct answers   Expressive cues in the sample
Fear (A)           43                     Low sound level; fast vibrato; non-legato articulation
Sadness (A,B,C)    100, 86, 100           Slow tempo; legato articulation; long duration (20-25 sec); minor key; no articulation variability
Tenderness (A,B,C) 71, 100, 71            Slow tempo; no tone attacks; legato articulation; soft timbre
Happiness (A,B,C)  100, 100, 86           Fast tempo; small tempo variability; staccato, airy articulation; high sound level; little sound level variability; bright timbre; fast tone attacks; small timing variations
Frustration (C)    57                     Tempo variation; sound level variability; tempo variability; discord
Surprise (A,B,C)   57, 86, 43             Staccato articulation; short duration (0.3-6 sec); large sound level variability; high sound level; tempo variability

Note: (A)(B)(C) indicate the players.
According to Juslin's framework, the duration varies with the intended emotion, and the durations observed here are consistent with this argument. Articulation refers to the staccato or legato character of the piece. Since a short pause follows each staccato note, I assume that the standard deviation of the sound level, normalized by one second, represents the articulation level.
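The articulation measure assumed above can be sketched as follows. This is an illustrative interpretation, not the original analysis code: "normalized by one second" is read here as dividing the standard deviation by the duration in seconds, and the full-scale square wave stands in for a real recording.

```python
# Hypothetical sketch of the assumed articulation measure: the standard
# deviation of the sound level (WAV-style samples in [-1, 1]), divided
# by the duration in seconds. The short pauses after staccato notes
# change the level statistics, which is the motivation for this proxy.
import math

def sound_level_std_per_second(samples, sample_rate):
    """samples: sound levels in [-1, 1]; sample_rate in Hz.
    Returns the standard deviation normalized by duration in seconds."""
    n = len(samples)
    duration = n / sample_rate
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return math.sqrt(var) / duration

# One second of a full-scale square wave: std = 1, duration = 1 s.
rate = 1000  # hypothetical sample rate (Hz)
samples = [1.0, -1.0] * 500
print(sound_level_std_per_second(samples, rate))  # 1.0
```

For real recordings, the samples would be decoded from the .aiff/.wav files described above rather than synthesized.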
Table 5 summarizes the duration and articulation measures. These results are consistent with the subjective analysis discussed in 4.2.1.
Table 5 Summary of the characteristics of each sample

                       Duration (seconds)       Standard deviation of the sound level (normalized)
Emotion       Player   Individual   Average     Individual   Average
Fear          A        14           14          0.0192       0.0192
Sadness       A        19           23          0.0268       0.0188
              B        25                       0.0175
              C        25                       0.0121
Tenderness    A        19           16          0.0159       0.0175
              B        11                       0.0198
              C        16                       0.0167
Happiness     A        9.8          13          0.0247       0.0328
              B        7.8                      0.0278
              C        15                       0.0460
Frustration   C        23           23          0.0230       0.0230
Surprise      A        6            3.6         0.0434       0.0713
              B        0.3                      0.1313
              C        4.5                      0.0825
Note: Since duration and standard deviation were calculated from the number of digital samples, the values are approximate. The sound level is as defined by the WAV format; its range is between -1 and 1.
Fear (B) was confused with sadness. The tempo of this sample was slow, with legato articulation, whereas Juslin's framework indicates the opposite for fear: the sample lacked the fast tempo and staccato articulation.
Fear (C) was confused with happiness. The sound level of this sample was high, which might have been misinterpreted as happiness; fear is instead characterized by a low sound level.
Anger (A) is characterized by a low sound level, which is an expressive cue of sadness. Moreover, anger (A) is in a minor key, which might have led to the confusion.
Anger (C) was identified as frustration; it has a low sound level and regular tone attacks, which are not typical cues of anger. However, these differences alone do not seem to explain the misjudgment, since both are important cues for anger and frustration alike. In this case one cannot conclude that the player performed imprecisely; rather, these cues are simply more likely to be judged as frustration.
Since Juslin and others have not provided cues for frustration, it has not been verified whether these characteristics represent frustration.
The main result of this investigation is that emotions can be conveyed from players to listeners through purely instrumental short pieces. I used short anonymous pieces: the longest lasted 25 seconds. The violinists were skillful enough to play a wide range of music, but they were not familiar with improvising. Even so, they could express their understanding of an emotion word and embed that interpretation in a short piece that general listeners could decode correctly. This is a genuine transfer of information.
Moreover, the expressive cues proposed by Juslin and by Bresin and Friberg could be successfully applied to the violin. However, the findings of the present work also indicate that some emotions are difficult to communicate. There was confusion on the players' side: cues were played differently in the confused cases. Yet one cannot conclude that the players performed inaccurately; rather, the communication between the players and the listeners failed. This failure might have been due to the selection of the seven emotions in the present work. In the two-dimensional valence-activity framework in particular, anger and frustration are located close to each other, which might have led to confusion for both players and listeners.
This research can be extended to the identification of the dominant cues of each emotion. One possible method would be to manipulate the samples with digital filters so as to isolate one cue, and to test whether the listeners could still identify the corresponding intention.
The present work was a preliminary examination of the expressive cues of frustration. Since there was only one successful sample, the investigation was limited; collecting more successful samples is the key to distinguishing frustration from anger.
It would also be interesting to investigate complex emotions, such as pride, which can be regarded as a combination of happiness and anger.
Furthermore, it would be interesting to know how established composers expressed emotions. Composers of nineteenth-century Romantic music in particular, such as Robert Schumann, attempted to integrate a pure message into their pieces (Plantinga, 1984). Although such works do not yield their cues easily, it would be intriguing to test selected pieces.
A possible future application is musical recognition as a communication tool. With the limited information of a short instrumental piece, one can convey feelings; conversely, from a short description of the feelings, one could reconstruct "emotional" music. Similar research on data transmission using facial information is ongoing (http://prius.hc.t.u-tokyo.ac.jp/~jikken/index-e.html). The underlying idea is applicable to future musical communication.
Beran, J. & Mazzola, G.: "Timing Microstructure in Schumann's Träumerei as an Expression of Harmony, Rhythm, and Motivic Structure in Music Performance", Computers and Mathematics with Applications 39 (2000), pp. 99-130
Bresin, R. & Friberg, A.: "Synthesis and decoding of emotionally expressive music performance", IEEE SMC '99 Conference Proceedings, 1999 IEEE International Conference on Systems, Man, and Cybernetics, Vol. 4, pp. 317-322 (1999)
Canazza, Sergio; De Poli, Giovanni; Rinaldin, Stefano & Vidolin, Alvise: "Sonological Analysis of Clarinet Expressivity" in Leman, Marc (ed.) "Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology", Springer (1997)
Davis, Stephen: "Philosophical Perspectives on Music's Expressiveness" in Juslin, Patrik N. & Sloboda, John A. (eds.) "Music and Emotion: Theory and Research", Oxford University Press (2001)
Juslin, Patrik N.: "Emotional Communication in Music Performance: A Functionalist Perspective and Some Data", Music Perception (1997), Vol. 14, No. 4, pp. 384-418
Juslin, Patrik N.: "Communicating Emotion in Music Performance: A Review and Theoretical Framework" in Juslin, Patrik N. & Sloboda, John A. (eds.) "Music and Emotion: Theory and Research", Oxford University Press (2001)
Plantinga, Leon: "Romantic Music", W.W. Norton & Company, Inc. (1984)
Repp, Bruno H.: "Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's Träumerei", J. Acoust. Soc. Am. 92(5), November 1992, pp. 2546-2568
Appendix 1. Questionnaire
Name:
Where are you from?
How do you describe your mood today? (happy/sad etc.)
1. Fear
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
2. Sadness
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
3. Anger
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
4. Tenderness
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
5. Happiness
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
6. Frustration
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
7. Surprise
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
General questions
What kind of music do you usually listen to?
Specifically,
Composers:
Pieces:
Thank you so much for your cooperation!
Junko Kimura
2-1. Waveforms of the successful samples
(1) Fear
Fear (A) is characterized by vibrato.
(2) Sadness
(3) Tenderness
(4) Happiness
Happiness samples are characterized by staccato.
(5) Frustration
The last part (after fifteen seconds) is discord.
(6) Surprise
Surprise samples are characterized by a short passage.
2-2. Spectrograms of the successful samples
(1) Fear
(2) Sadness
(3) Tenderness
(4) Happiness
(5) Frustration
(6) Surprise
The last part of the spectrogram of the frustration sample shows a discord, characterized by mixed signals of various frequencies. On the other hand, the happiness sample shows a concord, characterized by the harmony of the fundamental frequency.