Feature Extraction

Two sets of features were extracted, one for each of the two models we investigated:

Global Features (for GMMs)

 


Pitch

F0 contours were extracted from the first ten seconds of each voicemail message. Five features were computed from each pitch contour:

  • 25th percentile

  • 75th percentile

  • F0 range

  • slope of the regression line through the contour

  • total number of local optima
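A minimal sketch of the five pitch features, assuming the F0 contour is a list of values in Hz at a hypothetical rate of 100 frames per second, with unvoiced frames marked as 0. The names `pitch_features`, `percentile`, `regression_slope` and `count_local_optima` are illustrative only. Linear-interpolation percentiles, an ordinary least-squares regression line, and counting only strict peaks and valleys as local optima are assumed choices, not necessarily what the original work used.

```python
import math

FRAME_RATE_HZ = 100.0  # hypothetical: one F0 value every 10 ms


def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx if sxx else 0.0


def count_local_optima(values):
    """Count strict interior peaks and valleys; flat plateaus are not counted."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if (values[i] - values[i - 1]) * (values[i + 1] - values[i]) < 0
    )


def pitch_features(f0, frame_rate=FRAME_RATE_HZ, max_seconds=10.0):
    """Five global pitch features from an F0 contour (Hz, 0 = unvoiced)."""
    f0 = f0[: int(max_seconds * frame_rate)]  # keep the first ten seconds
    voiced = [(i / frame_rate, v) for i, v in enumerate(f0) if v > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    times = [t for t, _ in voiced]
    values = [v for _, v in voiced]  # voiced frames, concatenated across gaps
    return {
        "p25": percentile(values, 25),
        "p75": percentile(values, 75),
        "f0_range": max(values) - min(values),
        "slope_hz_per_s": regression_slope(times, values),
        "num_local_optima": count_local_optima(values),
    }
```

Because voiced frames are concatenated across unvoiced gaps here, a jump at a gap boundary can register as a local optimum; counting optima within each voiced run instead would be an equally plausible reading.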

 

Loudness

Spectral distributions of perceived loudness were extracted from the speech signal using code provided by Raul Fernandez. Fifteen features were extracted from these loudgrams:

  • Mean loudness

  • 25th and 75th percentiles of perceived loudness

  • 25th and 75th percentiles of RMS

  • Mean loudness in each spectral bin from 4 to 14 Bark
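A sketch of the 15 loudness features, assuming a loudgram is a list of frames in which `frame[b]` holds the specific loudness of the one-Bark band [b, b+1). Under that assumption the 4 to 14 Bark range yields ten band means, which together with the mean loudness and the four percentiles accounts for 15 features. Taking per-frame perceived loudness as the sum over bands, and computing RMS on non-overlapping frames, are also assumptions; `loudness_features` and `frame_rms` are hypothetical helpers, not the interface of Raul Fernandez's code.

```python
import math

# Assumed layout: band index b covers [b, b + 1) Bark, so bands 4..13 span 4 to 14 Bark.
FIRST_BARK, LAST_BARK = 4, 14


def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def frame_rms(samples, frame_len):
    """RMS of consecutive, non-overlapping frames of a waveform."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def loudness_features(loudgram, rms):
    """Return the 15 loudness features as a flat list.

    loudgram: list of frames; frame[b] = specific loudness in band [b, b + 1) Bark
    rms:      per-frame RMS values (e.g. from frame_rms)
    """
    totals = [sum(frame) for frame in loudgram]  # per-frame perceived loudness
    features = [
        sum(totals) / len(totals),  # mean loudness
        percentile(totals, 25),
        percentile(totals, 75),
        percentile(rms, 25),
        percentile(rms, 75),
    ]
    # Mean loudness of each one-Bark band between 4 and 14 Bark (10 values).
    for b in range(FIRST_BARK, LAST_BARK):
        features.append(sum(frame[b] for frame in loudgram) / len(loudgram))
    return features
```

Returning a flat list rather than a dictionary keeps the features in a fixed order, which is convenient when they are stacked into a vector for a GMM.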

Rhythm (speaking rate)

  • mean duration of voiced segments

  • standard deviation of duration of voiced segments

  • number of voiced segments normalized by message length (10 s or less)
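The three rhythm features can be sketched from a per-frame voicing decision (1 = voiced, 0 = unvoiced), assuming a hypothetical rate of 100 frames per second and treating a voiced segment as a maximal run of voiced frames. The helper names are illustrative, and using the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumed choice.

```python
import statistics

FRAME_RATE_HZ = 100.0  # hypothetical: one voicing decision every 10 ms


def voiced_segments(voicing):
    """(start, end) frame indices of maximal voiced runs; end is exclusive."""
    segments, start = [], None
    for i, v in enumerate(voicing):
        if v and start is None:
            start = i
        elif not v and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:  # message ends while still voiced
        segments.append((start, len(voicing)))
    return segments


def rhythm_features(voicing, frame_rate=FRAME_RATE_HZ, max_seconds=10.0):
    """Speaking-rate features from a per-frame voicing mask (1 = voiced)."""
    voicing = voicing[: int(max_seconds * frame_rate)]  # 10 s or less
    durations = [(e - s) / frame_rate for s, e in voiced_segments(voicing)]
    if not durations:
        raise ValueError("no voiced segments")
    message_seconds = len(voicing) / frame_rate
    return {
        "mean_duration_s": statistics.mean(durations),
        "std_duration_s": statistics.pstdev(durations),  # population std
        "segments_per_s": len(durations) / message_seconds,
    }
```

Dividing the segment count by the actual (possibly shorter than 10 s) message length makes the rate comparable across messages of different durations.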

 

 

 

 

 

 

 

Segmental Features (for HMMs)

We trained Hidden Markov Models on observation sequences in which each observation holds the features extracted from one voiced segment. The pitch and loudness features listed above were extracted from each voiced segment, together with its duration:

 


Pitch
  • 25th percentile

  • 75th percentile

  • F0 range

  • slope of the regression line through the contour

  • total number of local optima

 

Loudness

  • Mean loudness

  • 25th and 75th percentiles of perceived loudness

  • 25th and 75th percentiles of RMS

  • Mean loudness in each spectral bin from 4 to 14 Bark

Duration

  • length of the voiced segment
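For the HMMs, the per-segment features form an observation sequence with one vector per voiced segment, in temporal order. The sketch below assumes an F0 contour in Hz with unvoiced frames marked as 0 and a hypothetical rate of 100 frames per second; `observation_sequence` and `voiced_runs` are illustrative names. For brevity it computes only the two percentiles, the F0 range, and the segment duration; the remaining pitch and loudness features would be appended to each vector in the same way.

```python
import math

FRAME_RATE_HZ = 100.0  # hypothetical: one F0 value every 10 ms


def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def voiced_runs(f0):
    """Yield (start, end) indices of runs where F0 > 0; end is exclusive."""
    start = None
    for i, v in enumerate(f0):
        if v > 0 and start is None:
            start = i
        elif v <= 0 and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(f0)


def observation_sequence(f0, frame_rate=FRAME_RATE_HZ):
    """One observation vector per voiced segment, in temporal order (subset of features)."""
    sequence = []
    for start, end in voiced_runs(f0):
        seg = f0[start:end]
        sequence.append([
            percentile(seg, 25),
            percentile(seg, 75),
            max(seg) - min(seg),         # F0 range within the segment
            (end - start) / frame_rate,  # duration of the voiced segment (s)
        ])
    return sequence
```

Each inner list is one observation; a message with no voiced frames yields an empty sequence, which would need to be skipped before HMM training.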