Feature Extraction

Two sets of features were extracted, one for each of the two models we investigated:

Global Features (For GMMs)

pitch

loudness

Pitch

F0 contours were extracted from the first ten seconds of each voicemail message. From each pitch contour we extracted 5 features:

25th percentile
75th percentile
f0 range
slope of the regression line through contour
total number of local optima
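
The five pitch features listed above can be sketched as follows. This is a minimal illustration rather than the code used in the project: the function name `pitch_features`, the `frame_rate` parameter, the convention that unvoiced frames carry an F0 of 0, the definition of range as maximum minus minimum, and the counting of local optima as sign changes of the frame-to-frame difference are all assumptions.

```python
import numpy as np

def pitch_features(f0, frame_rate=100.0):
    """Return [p25, p75, range, slope, n_optima] for one F0 contour.

    f0         : F0 values in Hz, one per frame; 0 marks an unvoiced frame
                 (assumed convention, not stated in the text).
    frame_rate : frames per second (hypothetical default).
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    values = f0[voiced]
    # Keep the true time stamps of the voiced frames for the regression.
    times = np.flatnonzero(voiced) / frame_rate

    p25, p75 = np.percentile(values, [25, 75])
    f0_range = values.max() - values.min()

    # Slope of the least-squares line through the contour, in Hz per second.
    slope = np.polyfit(times, values, 1)[0] if len(values) > 1 else 0.0

    # A local optimum is counted wherever the contour changes direction.
    direction = np.sign(np.diff(values))
    direction = direction[direction != 0]  # ignore flat steps
    n_optima = int(np.sum(direction[1:] != direction[:-1]))

    return np.array([p25, p75, f0_range, slope, n_optima])
```

For a ten-second message at the assumed 100 frames per second, `f0` would hold up to 1000 values.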

Loudness

Spectral perceived-loudness distributions (loudgrams) were extracted from the speech signal using code provided by Raul Fernandez. We extracted 15 features from these loudgrams:

Mean loudness
25th and 75th percentiles of perceived loudness
25th and 75th percentiles of RMS
Mean loudness in spectral bins of 4-14 Bark
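
As a rough sketch of how the loudness features could be assembled, assume the loudgram is available as an array with one row per frame and one column per Bark band, and that a per-frame RMS track is computed separately. The array layout, the function name `loudness_features`, the use of summed band loudness as the per-frame total, and the band selection are all assumptions. The original loudgram code is not reproduced here. Selecting ten bands (columns 4 through 13) is one reading of "4-14 Bark" that gives the stated total of 15 features.

```python
import numpy as np

def loudness_features(loudgram, rms, band_lo=4, band_hi=14):
    """Return 5 global loudness statistics followed by per-band means.

    loudgram : array of shape (n_frames, n_bark_bands), perceived loudness
               per frame and Bark band (assumed layout).
    rms      : array of shape (n_frames,), RMS value per frame.
    band_lo, band_hi : half-open column range of the bands to average
               (hypothetical choice; the exact binning is not specified).
    """
    loudgram = np.asarray(loudgram, dtype=float)
    rms = np.asarray(rms, dtype=float)

    # Total perceived loudness per frame: sum over all Bark bands (assumption).
    total = loudgram.sum(axis=1)
    mean_loudness = total.mean()
    loud_p25, loud_p75 = np.percentile(total, [25, 75])
    rms_p25, rms_p75 = np.percentile(rms, [25, 75])

    # Mean loudness over time within each selected Bark band.
    band_means = loudgram[:, band_lo:band_hi].mean(axis=0)

    stats = [mean_loudness, loud_p25, loud_p75, rms_p25, rms_p75]
    return np.concatenate((stats, band_means))
```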

Rhythm (Speaking rate)

mean duration of voiced segments
standard deviation of duration of voiced segments
number of voiced segments normalized by message length (10 s or less)
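
The three rhythm features can be illustrated with a short sketch that works from a per-frame voicing decision. The helper names `voiced_segments` and `rhythm_features`, the boolean voicing mask as input, and the `frame_rate` default are assumptions, and the standard deviation used is the population form.

```python
import numpy as np

def voiced_segments(voiced):
    """Return (start, end) frame indices, end exclusive, of each voiced run."""
    v = np.asarray(voiced, dtype=bool).astype(int)
    # Pad with zeros so runs touching either edge are also detected.
    edges = np.diff(np.concatenate(([0], v, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts, ends))

def rhythm_features(voiced, frame_rate=100.0):
    """Return [mean duration, std of duration, segments per second]."""
    segments = voiced_segments(voiced)
    if not segments:
        return np.array([0.0, 0.0, 0.0])
    durations = np.array([(end - start) / frame_rate for start, end in segments])
    # Message length in seconds; at most 10 s for the data described here.
    message_length = len(voiced) / frame_rate
    return np.array([durations.mean(),
                     durations.std(),
                     len(durations) / message_length])
```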

Segmental Features (For HMMs)

We trained Hidden Markov Models on observation sequences in which each observation holds the features extracted from one individual voiced segment. The same features as above were extracted from

pitch

loudness

Pitch

25th percentile
75th percentile
f0 range
slope of the regression line through contour
total number of local optima

Loudness

Mean loudness
25th and 75th percentiles of perceived loudness
25th and 75th percentiles of RMS
Mean loudness in spectral bins of 4-14 Bark

Duration

Length of voiced segment
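
To make the idea of an observation sequence concrete, the sketch below turns one F0 contour into a matrix with one row per voiced segment. It covers only the pitch and duration features; the loudness features would be appended to each row in the same way. The function name `segment_observations`, the convention that an F0 of 0 marks an unvoiced frame, and the `frame_rate` default are assumptions, not details taken from the project.

```python
import numpy as np

def segment_observations(f0, frame_rate=100.0):
    """Return an array of shape (n_segments, 6), one row per voiced segment.

    Columns: p25, p75, F0 range, regression slope (Hz/s),
             number of local optima, segment duration (s).
    """
    f0 = np.asarray(f0, dtype=float)
    v = (f0 > 0).astype(int)  # assumed: 0 marks an unvoiced frame
    edges = np.diff(np.concatenate(([0], v, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)

    rows = []
    for start, end in zip(starts, ends):
        seg = f0[start:end]
        t = np.arange(len(seg)) / frame_rate
        p25, p75 = np.percentile(seg, [25, 75])
        slope = np.polyfit(t, seg, 1)[0] if len(seg) > 1 else 0.0
        # Direction changes of the contour, ignoring flat steps.
        direction = np.sign(np.diff(seg))
        direction = direction[direction != 0]
        n_optima = int(np.sum(direction[1:] != direction[:-1]))
        rows.append([p25, p75, seg.max() - seg.min(), slope,
                     n_optima, len(seg) / frame_rate])
    # reshape keeps a consistent (0, 6) shape when no segment is voiced.
    return np.array(rows, dtype=float).reshape(-1, 6)
```

Each message then yields a sequence whose length equals its number of voiced segments, which is the kind of variable-length input an HMM is designed to handle.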