I-vectors for image classification

David C. Smith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent state-of-the-art work on speaker recognition and verification uses a simple factor analysis to derive a low-dimensional "total variability space" which simultaneously captures speaker and channel variability. This approach simplified earlier work that used joint factor analysis to model speaker and channel differences separately. Here we adapt this "i-vector" method to image classification by replacing speakers with image categories, voice cuts with images, and cepstral features with SURF local descriptors; the role of channel variability is played by differences in image backgrounds or lighting conditions. A universal Gaussian mixture model (UGMM) is trained (unsupervised) on SURF descriptors extracted from a varied and extensive image corpus. Individual images are modeled by additively perturbing the supervector of stacked means of this UGMM by the product of a low-rank total variability matrix (TVM) and a normally distributed hidden random vector, X. The TVM is learned by applying an EM algorithm to maximize the sum of log-likelihoods of descriptors extracted from training images, where the likelihoods are computed with respect to the GMM obtained by perturbing the UGMM means via the TVM as above and leaving the UGMM covariances unchanged. Finally, the low-dimensional i-vector representation of an image is the expected value of the posterior distribution of X conditioned on the image's descriptors, and is computed via straightforward matrix manipulations involving the TVM and image-specific Baum-Welch statistics. We compare classification rates found with (i) i-vectors, (ii) PCA, (iii) Discriminant Attribute Projection (the last two trained on Gaussian MAP-adapted supervector image representations), and (iv) replacing the TVM with the matrix of dominant PCA eigenvectors before i-vector extraction.
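
The final extraction step in the abstract can be illustrated with a minimal sketch. It assumes the closed form commonly used in the speaker-recognition i-vector literature, w = (I + T' S^-1 N T)^-1 T' S^-1 F, where N and F are the zeroth-order and centered first-order Baum-Welch statistics and S holds the UGMM covariances. The abstract does not confirm that this is the exact computation used in the paper. Diagonal covariances, the function names baum_welch_stats and extract_ivector, and the random stand-ins for the UGMM, the TVM, and the SURF descriptors are all assumptions made for illustration; a real TVM would come from the EM training described above.

```python
import numpy as np


def baum_welch_stats(X, weights, means, variances):
    """Baum-Welch statistics of descriptors X under a diagonal-covariance GMM.

    X: (n, d) descriptors; weights: (C,); means, variances: (C, d).
    Returns N (C,), the zeroth-order statistics, and F (C, d), the
    first-order statistics centered on the GMM means.
    """
    diff = X[:, None, :] - means[None, :, :]                      # (n, C, d)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                             # (n, C)
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)               # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                       # responsibilities
    N = post.sum(axis=0)
    F = post.T @ X - N[:, None] * means
    return N, F


def extract_ivector(N, F, T, variances):
    """Posterior mean of the hidden vector given the statistics (assumed closed form).

    T: (C*d, R) total variability matrix, with rows ordered component by component
    to match the stacked-means supervector.
    """
    C, d = F.shape
    R = T.shape[1]
    inv_var = (1.0 / variances).reshape(C * d)    # diagonal of the inverse covariance
    N_expanded = np.repeat(N, d)                  # each N_c repeated across its d dims
    precision = np.eye(R) + T.T @ (T * (inv_var * N_expanded)[:, None])
    rhs = T.T @ (inv_var * F.reshape(C * d))
    return np.linalg.solve(precision, rhs)


# Synthetic usage: random placeholders, not trained models or real SURF descriptors.
rng = np.random.default_rng(0)
C, d, R, n = 8, 64, 10, 200                       # hypothetical sizes
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, d))
variances = np.ones((C, d))
T = 0.1 * rng.normal(size=(C * d, R))
X = rng.normal(size=(n, d))

N, F = baum_welch_stats(X, weights, means, variances)
ivector = extract_ivector(N, F, T, variances)
print(ivector.shape)
```

Solving the R-by-R linear system, rather than explicitly inverting the precision matrix, keeps the cost per image small when the total variability space is low-dimensional.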

Original language: English
Title of host publication: Applications of Digital Image Processing XXXVII
Editors: Andrew G. Tescher
Publisher: SPIE
ISBN (Electronic): 9781628412444
State: Published - 2014
Externally published: Yes
Event: Applications of Digital Image Processing XXXVII - San Diego, United States
Duration: Aug 18 2014 to Aug 21 2014

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 9217
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: Applications of Digital Image Processing XXXVII
Country/Territory: United States
City: San Diego
Period: 08/18/14 to 08/21/14

Keywords

  • Baum-Welch statistics
  • Dimension reduction
  • Factor analysis
  • Fisher vectors
  • Gaussian MAP-adapted supervectors
  • Gaussian mixture model
  • I-vectors
  • Image classification
  • Total variability matrix
