Analysis of
Emotions in Musical Expression
Junko Kimura
5/16/2002
The present paper examined the hypothesis that "music players can convey their intentions to listeners successfully through purely instrumental music pieces". In the experiment, three amateur female violinists first improvised 0.3- to 25-second pieces, each expressing one of seven different emotions. Second, seven randomly selected listeners attempted to identify which emotion the players had tried to express.
The results demonstrated that the violinists could successfully convey emotions such as sadness, tenderness and happiness, with success rates above 70%. However, identification was confused in the cases of fear, anger and frustration. In interpreting these results, the framework of musical expressive cues for basic emotions presented by Juslin (2001) and Bresin and Friberg (1999) was verified. These expressive cues represent the emotions well; moreover, they are also applicable to the present violin samples. The findings also indicate that confusion arises when some of the cues are played differently. This shows that proper execution of the expressive cues is critical in conveying basic emotions to listeners.
Music carries many components in its artistic expression. Composers convey their messages through their scores, using annotations so that players can realize what the composers intend. However, no two performers play the same music in the same way, because musicians interpret the scores and articulate their own interpretations. Even the same musician plays an identical piece differently as his or her interpretation changes, emotional state varies, or age advances.
In the philosophical literature, there has been a long-standing dispute between proponents of two theories: the expressive theory and the arousal theory. The former analyzes music's expressiveness as depending on the composer expressing his or her occurrent emotion through the composition. The latter explains this expressiveness as the music's propensity to evoke the corresponding emotion in the listener (Davis, 2001). The expressive theory is empirically incorrect, and the arousal theory seems to explain this phenomenon more accurately. It is true that music arouses corresponding emotions in the listener. However, I doubt that our mood changes as a consequence of the music's expressiveness. In this vein, I agree with Davis: "..., we have a clear sense of the music's expressive character as quite distinct from our ... responses to it" (Davis, 2001: 33). I argue that we have a clear sense of recognizing what the music is delivering, even when the music takes a purely instrumental form.
In order to analyze a performer's emotions, it is useful to compare recordings of the same piece by different players because, as noted above, the emotions of composers and players intertwine in a musical performance (Juslin, 2001); interpreting only one performance is therefore not an appropriate approach. Repp (1992) compared twenty-eight performances of Schumann's Träumerei and found common patterns in the interpretations. Beran and Mazzola (2000) further modeled the tempo in the same Schumann piece. Canazza et al. (1997) identified modifications made by the players in Mozart's Clarinet Concerto (K.622). Furthermore, Juslin (1997) and Bresin and Friberg (1999) focused more on the players' expressiveness. Juslin measured how professional guitarists convey four basic emotions (anger, sadness, happiness and fear) to listeners and, using signal analysis, found three cues: tempo, loudness and articulation. Bresin and Friberg extended Juslin's study, identifying time deviations and final ritardando as additional cues.
As I note in the next chapter, Juslin (2001) presents a core framework of expressive cues in the performer's communication. He tested the validity of these cues by conducting listening experiments. However, musical expressiveness depends on the instrument as well as on the performer's skill. In the present paper, I examine musical expressiveness in violin passages where the performers and the listeners are able to communicate successfully.
Juslin (2001) summarizes the expressive cues of five basic emotions: happiness, anger, fear, sadness and tenderness. He placed these emotional expressions in a two-dimensional emotional space constituted by valence and activity level. Each emotional expression consists of expressive cues, such as tempo, sound level and articulation (Figure 1). Bresin and Friberg's work (1999) follows this framework.
The duration of the passages played by the violinists was between 0.3 and 25 seconds. The violinists integrated the emotions into these short passages to make them as expressive as possible, using a few selected cues from the table; moreover, different violinists selected different cues. If listeners could identify the emotion of one passage better than that of another, one could conclude that the cues of the identified passage conveyed the information more effectively. My examination therefore provides a preliminary test of the dominant cues of each basic emotion in violin playing.

Figure 1 Expressive cues classified by Juslin (2001), p. 315
Three female amateur violinists played 0.3- to 25-second passages expressing various emotions. They played either improvised or previously composed pieces, in the following order, in a quiet music studio.
1. Fear
2. Sadness
3. Anger
4. Tenderness
5. Happiness
6. Frustration
7. Surprise
After each recording, the violinists listened to their rendition and replayed it until they were satisfied with their playing. After all recordings were completed, the violinists filled out a questionnaire regarding their satisfaction with each performance, the specific situations they had imagined, and their mood on that day (Appendix 1).
Seven subjects assessed the effectiveness of the emotional communication. The twenty-one short pieces were recorded as standard PCM sound files (.aiff) and posted on a web page (http://web.mit.edu/~jkimura/www/cayyou.htm). The listeners were 25-45 years old, three males and four females (Table 1), none of whom were musicians or music students. They individually examined the violin samples and identified the emotional expression of each as one of the seven emotion labels listed above. The samples were presented in random order, and the listeners' responses were submitted to me by e-mail.
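The tallying of correct identifications can be sketched as follows. This is a minimal illustration, not the actual analysis code used in the experiment; the sample identifiers and listener responses below are hypothetical.

```python
# Hypothetical sketch: compute per-sample percent-correct scores from
# listener responses. Sample IDs and answers are illustrative only.
from collections import defaultdict

def percent_correct(responses, intended):
    """responses: {listener: {sample_id: chosen_emotion}}
    intended:  {sample_id: intended_emotion}
    Returns {sample_id: percent of listeners who chose the intended emotion}."""
    counts = defaultdict(int)
    n_listeners = len(responses)
    for listener_answers in responses.values():
        for sample_id, chosen in listener_answers.items():
            if chosen == intended[sample_id]:
                counts[sample_id] += 1
    return {s: round(100 * counts[s] / n_listeners) for s in intended}

# Hypothetical example with two listeners and two samples:
intended = {"fear_A": "fear", "sadness_B": "sadness"}
responses = {
    "listener1": {"fear_A": "fear", "sadness_B": "sadness"},
    "listener2": {"fear_A": "frustration", "sadness_B": "sadness"},
}
print(percent_correct(responses, intended))  # {'fear_A': 50, 'sadness_B': 100}
```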
Table 1 Profile of the seven listeners

Listener   Male / female   Nationality
1          Male            Italian
2          Male            Japanese
3          Male            Japanese
4          Female          Japanese
5          Female          French
6          Female          American
7          Female          American
Table 2 shows each player's level of satisfaction with the recordings and the percentage of correct responses by the listeners. Anger, fear and frustration were difficult to identify: most of these samples were correctly identified by fewer than fifty percent of the listeners. On the other hand, more than fifty percent of the happiness, sadness, surprise and tenderness samples were identified. Interestingly, a player's own satisfaction does not always correspond to the percentage of correct answers. For example, although the players were satisfied with their renditions of anger and frustration, the listeners could not identify them correctly.
Table 2 Percent of correct answers to each performance

                          Fear          Sadness        Anger         Tenderness    Happiness      Frustration   Surprise
Player                    A    B    C   A    B    C    A    B    C   A    B    C   A    B    C    A    B    C   A    B    C
Player's satisfaction     3    4    4   2    5    4    3    4    5   4    5    4   4    5    4    5    3    3   2    4    4
% of correct answers      43   29   0   100  86   100  29   29   14  71   100  71  100  100  86   29   29   57  57   86   43
Note:
1) Player A: American, Player B: Portuguese, Player C: American
2) Player's satisfaction was rated on a five-point scale:
   1. Not well  2. Not so well  3. Fair  4. Well  5. Very well
Table 3 shows the confusion matrix for the classification test of each intended emotion. The fear samples performed by players B and C were most often misidentified as sadness and happiness, respectively, and the anger samples by players A and C were most often misidentified as sadness and frustration, respectively.
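A confusion matrix of this kind can be built directly from (intended, response) pairs. The sketch below is a hypothetical illustration of the tabulation, not the actual analysis code; the example pairs are invented.

```python
# Hypothetical sketch: build a row-normalized confusion matrix like
# Table 3 from (intended, response) pairs. Example data is invented.
from collections import Counter

EMOTIONS = ["fear", "sadness", "anger", "tenderness",
            "happiness", "frustration", "surprise"]

def confusion_matrix(pairs):
    """pairs: iterable of (intended, response) tuples.
    Returns {intended: {response: percent}}, each row summing to ~100."""
    pairs = list(pairs)
    totals = Counter(intended for intended, _ in pairs)
    cells = Counter(pairs)
    return {
        i: {r: round(100 * cells[(i, r)] / totals[i]) for r in EMOTIONS}
        for i in EMOTIONS if totals[i]
    }

# Hypothetical: seven listeners judging one intended-fear sample,
# three correct identifications and four confusions with sadness.
pairs = [("fear", "fear")] * 3 + [("fear", "sadness")] * 4
m = confusion_matrix(pairs)
print(m["fear"]["fear"], m["fear"]["sadness"])  # 43 57
```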
In this section, I analyze the cues by which the listeners identified each emotion. The analysis was done only for the emotions where the players and the listeners communicated successfully.
Each sample was analyzed by listening to it and choosing the applicable dominant cues, referring to Figure 1 and Bresin and Friberg (1999). Table 4 summarizes the cues I could recognize in each sample whose intention the listeners identified correctly.
Table 3 Confusion matrix (%) for the classification test of each intended emotion
(Columns give the intended emotion and player; rows give the listeners' responses.)

                 Fear          Sadness        Anger         Tenderness    Happiness      Frustration   Surprise
Response         A    B    C   A    B    C    A    B    C   A    B    C   A    B    C    A    B    C   A    B    C
Fear             43   29   0   0    0    0    0    14   14  14   0    0   0    0    0    29   29   14  14   0    14
Sadness          29   57   14  100  86   100  57   0    0   0    0    14  0    0    0    14   0    0   14   0    0
Anger            0    0    0   0    0    0    29   29   14  0    0    0   0    0    0    14   14   14  0    0    0
Tenderness       14   0    0   0    14   0    14   0    0   71   100  71  0    0    0    0    0    0   0    0    0
Happiness        0    0    43  0    0    0    0    0    0   14   0    0   100  100  86   0    0    0   0    0    29
Frustration      14   14   28  0    0    0    0    29   57  0    0    14  0    0    0    29   29   57  14   14   14
Surprise         0    0    14  0    0    0    0    29   14  0    0    0   0    0    14   14   29   14  57   86   43
Table 4 Expressive cues in the violin samples whose intention was successfully identified by the listeners

Emotion            % of correct answers   Expressive cues in the sample
Fear (A)           43                     Low sound level; fast vibrato; non-legato articulation
Sadness (A,B,C)    100, 86, 100           Slow tempo; legato articulation; long duration (20-25 sec); minor key; no articulation variability
Tenderness (A,B,C) 71, 100, 71            Slow tempo; no tone attacks; legato articulation; soft timbre
Happiness (A,B,C)  100, 100, 86           Fast tempo; small tempo variability; staccato, airy articulation; high sound level; little sound level variability; bright timbre; fast tone attacks; small timing variations
Frustration (C)    57                     Tempo variation; sound level variability; tempo variability; discord
Surprise (A,B,C)   57, 86, 43             Staccato articulation; short duration (0.3-6 sec); large sound level variability; high sound level; tempo variability

Note: (A)(B)(C) indicate the players.
According to Juslin's framework, the duration varies with the intended emotion, and the durations observed here are consistent with this argument. Articulation refers to the staccato or legato character of the piece. Since a short pause follows each staccato note, I assume that the standard deviation of the sound level, normalized by one second, represents the articulation level.
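The articulation measure assumed above can be sketched as follows. This is an illustrative interpretation, not the original analysis code: "normalized by one second" is read here as dividing the standard deviation by the duration in seconds, and the full-scale square wave stands in for a real recording.

```python
# Hypothetical sketch of the assumed articulation measure: the standard
# deviation of the sound level (WAV-style samples in [-1, 1]), divided
# by the duration in seconds. The short pauses after staccato notes
# change the level statistics, which is the motivation for this proxy.
import math

def sound_level_std_per_second(samples, sample_rate):
    """samples: sound levels in [-1, 1]; sample_rate in Hz.
    Returns the standard deviation normalized by duration in seconds."""
    n = len(samples)
    duration = n / sample_rate
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return math.sqrt(var) / duration

# One second of a full-scale square wave: std = 1, duration = 1 s.
rate = 1000  # hypothetical sample rate (Hz)
samples = [1.0, -1.0] * 500
print(sound_level_std_per_second(samples, rate))  # 1.0
```

For real recordings, the samples would be decoded from the .aiff/.wav files described above rather than synthesized.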
Table 5 summarizes the duration and articulation measures. These results are consistent with the subjective analysis discussed in 4.2.1.
Table 5 Summary of the characteristics of each sample

                       Duration (seconds)       Standard deviation of the sound level (normalized)
Emotion       Player   Individual   Average     Individual   Average
Fear          A        14           14          0.0192       0.0192
Sadness       A        19           23          0.0268       0.0188
              B        25                       0.0175
              C        25                       0.0121
Tenderness    A        19           16          0.0159       0.0175
              B        11                       0.0198
              C        16                       0.0167
Happiness     A        9.8          13          0.0247       0.0328
              B        7.8                      0.0278
              C        15                       0.0460
Frustration   C        23           23          0.0230       0.0230
Surprise      A        6            3.6         0.0434       0.0713
              B        0.3                      0.1313
              C        4.5                      0.0825
Note: Since duration and standard deviation were calculated from the number of digital samples, the values are approximate. The sound level is as defined by the WAV format; its range is between -1 and 1.
Fear (B) was confused with sadness. The tempo of this sample was slow, with legato articulation, whereas Juslin's framework indicates the opposite for fear: the sample lacked the fast tempo and staccato articulation.
Fear (C) was confused with happiness. The sound level of this sample was high, which might have been misinterpreted as happiness; fear is instead characterized by a low sound level.
Anger (A) is characterized by a low sound level, which is an expressive cue of sadness. Moreover, anger (A) is in a minor key, which might have led to the confusion.
Anger (C) was identified as frustration; it has a low sound level and regular tone attacks, which are not typical cues of anger. However, these differences alone do not seem to explain the misjudgment, since both are important cues for anger and frustration alike. In this case one cannot conclude that the player performed imprecisely; rather, these cues are simply more likely to be judged as frustration.
Since Juslin and others have not provided cues for frustration, it has not been verified whether these characteristics represent frustration.
The main result of this investigation is that emotions can be conveyed from players to listeners through purely instrumental short pieces. I used short anonymous pieces: the longest lasted 25 seconds. The violinists were skillful enough to play a wide range of music, but they were not familiar with improvising. Even so, they could express their understanding of an emotion word and embed that interpretation in a short piece that general listeners could decode correctly. This is a genuine transfer of information.
Moreover, the expressive cues proposed by Juslin and by Bresin and Friberg could be successfully applied to the violin. However, the findings of the present work also indicate that some emotions are difficult to communicate. There was confusion on the players' side: cues were played differently in the confused cases. Yet one cannot conclude that the players performed inaccurately; rather, the communication between the players and the listeners failed. This failure might have been due to the selection of the seven emotions in the present work. In the two-dimensional valence-activity framework in particular, anger and frustration are located close to each other, which might have led to confusion for both players and listeners.
This research can be extended to the identification of the dominant cues of each emotion. One possible method would be to manipulate the samples with digital filters so as to isolate one cue, and to test whether the listeners could still identify the corresponding intention.
The present work was a preliminary examination of the expressive cues of frustration. Since there was only one successful sample, the investigation was limited; collecting more successful samples is the key to distinguishing frustration from anger.
It would also be interesting to investigate complex emotions, such as pride, which can be regarded as a combination of happiness and anger.
Furthermore, it would be interesting to know how established composers expressed emotions. Composers of nineteenth-century Romantic music in particular, such as Robert Schumann, attempted to integrate a pure message into their pieces (Plantinga, 1984). Although such works do not yield their cues easily, it would be intriguing to test selected pieces.
A possible future application is musical recognition as a communication tool. With the limited information of a short instrumental piece, one can convey feelings; conversely, from a short description of the feelings, one could reconstruct "emotional" music. Similar research on data transmission using facial information is ongoing (http://prius.hc.t.u-tokyo.ac.jp/~jikken/index-e.html). The underlying idea is applicable to future musical communication.
Beran, J. & Mazzola, G.: "Timing Microstructure in Schumann's Träumerei as an Expression of Harmony, Rhythm, and Motivic Structure in Music Performance", Computers and Mathematics with Applications 39 (2000), pp. 99-130
Bresin, R. & Friberg, A.: "Synthesis and decoding of emotionally expressive music performance", IEEE SMC '99 Conference Proceedings, 1999 IEEE International Conference on Systems, Man, and Cybernetics, Vol. 4, pp. 317-322 (1999)
Canazza, Sergio; De Poli, Giovanni; Rinaldin, Stefano & Vidolin, Alvise: "Sonological Analysis of Clarinet Expressivity" in Leman, Marc (ed.) "Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology", Springer (1997)
Davis, Stephen: "Philosophical Perspectives on Music's Expressiveness" in Juslin, Patrik N. & Sloboda, John A. (eds.) "Music and Emotion: Theory and Research", Oxford University Press (2001)
Juslin, Patrik N.: "Emotional Communication in Music Performance: A Functionalist Perspective and Some Data", Music Perception (1997), Vol. 14, No. 4, pp. 384-418
Juslin, Patrik N.: "Communicating Emotion in Music Performance: A Review and Theoretical Framework" in Juslin, Patrik N. & Sloboda, John A. (eds.) "Music and Emotion: Theory and Research", Oxford University Press (2001)
Plantinga, Leon: "Romantic Music", W.W. Norton & Company, Inc. (1984)
Repp, Bruno H.: "Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's Träumerei", J. Acoust. Soc. Am. 92(5), November 1992, pp. 2546-2568
Appendix 1. Questionnaire
Name:
Where are you from?
How do you describe your mood today? (happy/sad etc.)
1. Fear
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
2. Sadness
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
3. Anger
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
4. Tenderness
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
5. Happiness
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
6. Frustration
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
7. Surprise
a. Situation:
b. Did you express the situation well?
   Not well   Not so well   Fair   Well   Very well
      1            2          3      4        5
c. When you composed the phrase, did you have any existing musical pieces in mind?  Yes / No
c.1. If yes, what piece did you have in mind?
c.2. Did you refer to this piece?
General questions
What kind of music do you usually listen to?
Specifically,
Composers:
Pieces:
Thank you so much for your cooperation!
Junko Kimura
2-1. Waveforms of the successful samples
(1) Fear
Fear (A) is characterized by vibrato.
(2) Sadness
(3) Tenderness
(4) Happiness
Happiness samples are characterized by staccato.
(5) Frustration
The last part (after fifteen seconds) is discord.
(6) Surprise
Surprise samples are characterized by a short passage.
2-2. Spectrograms of the successful samples
(1) Fear
(2) Sadness
(3) Tenderness
(4) Happiness
(5) Frustration
(6) Surprise
The last part of the spectrogram of the frustration sample shows a discord, characterized by mixed signals of various frequencies. On the other hand, the happiness sample shows a concord, characterized by the harmony of the fundamental frequency.