Data & Labeling
|
Sources of training data:
Only the first ten seconds of each voicemail message were used for labeling and training. This choice follows studies reporting that listeners decide whether to skip a message or keep listening after hearing "the first few seconds" of it (All Talk and All Action, Whittaker, Hirschberg et al.). Emotionally significant segments were also extracted from the Call Home and Oasis corpora, which proved to be very good sources of examples for the formal vs. informal distinction. Both of us labeled 361 messages independently. We then compared the labels and kept as training data only those messages on which our labels agreed. |
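The agreement filter described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the function name `keep_agreed`, the message IDs, and the example labels are all hypothetical, and it assumes each annotator's labels are stored as a mapping from message ID to a single label.

```python
def keep_agreed(labels_a, labels_b):
    """Return {message_id: label} for messages that both annotators
    labeled and to which they assigned the same label."""
    return {
        msg_id: label
        for msg_id, label in labels_a.items()
        if msg_id in labels_b and labels_b[msg_id] == label
    }


# Hypothetical example labels (not the project's data).
annotator_a = {"msg001": "formal", "msg002": "informal", "msg003": "formal"}
annotator_b = {"msg001": "formal", "msg002": "formal", "msg003": "formal"}

# Only messages with matching labels are kept for training.
training_labels = keep_agreed(annotator_a, annotator_b)

# Raw agreement: fraction of annotator A's messages that survived the filter.
agreement_rate = len(training_labels) / len(annotator_a)

print(training_labels)
print(f"raw agreement: {agreement_rate:.2f}")
```

Dropping disagreements trades training-set size for label reliability. Reporting the raw agreement rate alongside the retained set shows how much data the filter removed.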