Computationally efficient speech activity detection algorithms offer significant economic and labor-saving benefits by automating an extremely tedious manual process. In many applications, it is desirable to extract intervals of speech that are bounded by segments of other signal types (fax/modem, music, static, dial tones, etc.). In the past, algorithms have been developed that successfully discriminate between speech and one specific other signal type. Frequently, these algorithms fail when that specific non-speech signal is replaced by a different non-speech signal. Little work has been done on combining such discriminators to solve the general speech vs. non-speech discrimination problem; typically, several signal-specific discriminators are blindly combined, with predictably poor results. Moreover, when a large number of discriminators are involved, dimension reduction is usually achieved with Principal Components, which optimally compresses signal variance into the fewest dimensions. Unfortunately, these new coordinates are not necessarily optimal for discrimination. In this paper we apply graphical tools to determine a set of discriminators that produces excellent speech vs. non-speech clustering, thereby eliminating the guesswork in selecting good feature vectors. This cluster structure provides the basis for a general multivariate speech vs. non-speech discriminator, which compares very favorably with the TALKATIVE speech extraction algorithm.
Number of pages: 11
Journal: Proceedings of SPIE - The International Society for Optical Engineering
State: Published - 1998
Event: Advanced Signal Processing Algorithms, Architectures, and Implementations VIII - San Diego, CA, United States
Duration: Jul 22 1998 → Jul 24 1998