Improving speaker detection in multi-speaker utterances through automatic purification of training data

David C. Smith, Dan Richman

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution, peer-reviewed)

Abstract

This article is concerned with the automatic purification of data used to train statistical models for automatic speaker detection. We assume that the data available for training a model of a particular speaker of interest (SOI) is contaminated by utterances from at least one other speaker. Our approach consists of three steps: (1) build a Gaussian mixture model (GMM) for the SOI by training on the contaminated training files; (2) score consecutive segments of these training files with this GMM; (3) build a new, purified GMM from the highest-scoring segments. We apply our method to a set of SOIs from the Switchboard-I corpus (using summed conversation sides) and show that the purified GMMs are significantly more accurate than the contaminated GMMs at detecting the presence of the SOIs in test data known to contain multi-speaker utterances. The evaluation is text-independent, and no assumptions are made about the identity of, or the relationship between, the non-SOI speakers in the training and test data.
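The three-step procedure described in the abstract can be illustrated with the short Python sketch below. It is a minimal toy under stated assumptions, not the authors' implementation. Frames are one-dimensional scalars rather than real acoustic feature vectors. The GMM has a hypothetical two components fitted by plain EM. Segments are non-overlapping runs of a hypothetical 50 frames, each scored by its mean per-frame log-likelihood with no background model or score normalization. The hypothetical `keep_frac` parameter keeps the top half of the segments. All function names (`train_gmm`, `segment_score`, `purify`, and so on) are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

# Toy assumption: a 1-D GMM stored as a list of (weight, mean, variance) components.
GMM = List[Tuple[float, float, float]]


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(gmm: GMM, x: float) -> List[float]:
    """log(w_j) + log N(x; mu_j, var_j) for every component j."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]


def frame_loglik(gmm: GMM, x: float) -> float:
    """log p(x | GMM), computed with log-sum-exp for numerical stability."""
    terms = component_logs(gmm, x)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def train_gmm(frames: Sequence[float], k: int = 2, iters: int = 30,
              var_floor: float = 1e-3) -> GMM:
    """Fit a k-component GMM with plain EM, initialised deterministically."""
    xs = sorted(frames)
    n = len(xs)
    mu = sum(xs) / n
    gvar = max(sum((x - mu) ** 2 for x in xs) / n, var_floor)
    # Means start at evenly spaced quantiles; equal weights; global variance.
    gmm = [(1.0 / k, xs[(2 * j + 1) * n // (2 * k)], gvar) for j in range(k)]
    for _ in range(iters):
        # E-step: responsibilities P(component j | frame x).
        resp = []
        for x in xs:
            terms = component_logs(gmm, x)
            top = max(terms)
            probs = [math.exp(t - top) for t in terms]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and (floored) variances.
        new_gmm = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mean = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nj
            new_gmm.append((nj / n, mean, max(var, var_floor)))
        gmm = new_gmm
    return gmm


def segment_score(gmm: GMM, seg: Sequence[float]) -> float:
    """Mean per-frame log-likelihood of a segment under the GMM."""
    return sum(frame_loglik(gmm, x) for x in seg) / len(seg)


def purify(files: Sequence[Sequence[float]], seg_len: int = 50,
           keep_frac: float = 0.5, k: int = 2):
    # Step 1: contaminated GMM trained on all frames of all training files.
    contaminated = train_gmm([x for f in files for x in f], k)
    # Step 2: split each file into consecutive, non-overlapping segments and score them.
    segments = [list(f[i:i + seg_len]) for f in files
                for i in range(0, len(f) - seg_len + 1, seg_len)]
    ranked = sorted(segments, key=lambda s: segment_score(contaminated, s),
                    reverse=True)
    # Step 3: retrain only on the highest-scoring fraction of segments.
    kept = ranked[:max(1, int(keep_frac * len(ranked)))]
    purified = train_gmm([x for s in kept for x in s], k)
    return contaminated, purified, kept


# Toy demo: each file holds 300 SOI frames ~ N(0, 1) followed by 100 frames
# from another speaker ~ N(6, 1), i.e. 25% contamination per file.
rng = random.Random(0)
files = [[rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(6, 1) for _ in range(100)]
         for _ in range(4)]
contaminated, purified, kept = purify(files)
print("contaminated model mean:", sum(w * m for w, m, _ in contaminated))
print("purified model mean:", sum(w * m for w, m, _ in purified))
```

In this toy setup the SOI supplies most of the training frames, so the contaminated GMM gives the SOI's mode the larger mixture weight. As a result, segments spoken by the SOI score higher than the intruding speaker's segments, and the purified model is retrained only on SOI-dominated material. With real speech, the segment length and the fraction of segments kept would be tuning choices, and the ranking could instead use a normalized score.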

Original language: English
Title of host publication: Proceedings of the IASTED International Conference on Circuits, Signals, and Systems
Editor: M.H. Rashid
Pages: 269-274
Number of pages: 6
State: Published, 2003
Externally published: Yes
Event: Proceedings of the IASTED International Conference on Circuits, Signals and Systems, Cancun, Mexico
Duration: May 19, 2003 to May 21, 2003

Publication series

Name: Proceedings of the IASTED International Conference on Circuits, Signals, and Systems


Keywords

  • Gaussian mixture models
  • ROC curves
  • Score normalization
  • Speaker detection
  • Speech processing
  • Training data purification
