Recent state-of-the-art work on speaker recognition and verification uses a simple factor analysis to derive a low-dimensional "total variability space" that simultaneously captures speaker and channel variability. This approach simplified earlier work that used joint factor analysis to model speaker and channel differences separately. Here we adapt this "i-vector" method to image classification by replacing speakers with image categories, voice cuts with images, and cepstral features with SURF local descriptors; the role of channel variability is played by differences in image backgrounds and lighting conditions. A universal Gaussian mixture model (UGMM) is trained, unsupervised, on SURF descriptors extracted from a varied and extensive image corpus. An individual image is modeled by additively perturbing the supervector of stacked UGMM means by the product of a low-rank total variability matrix (TVM) and a normally distributed hidden random vector X. The TVM is learned by an EM algorithm that maximizes the sum of log-likelihoods of descriptors extracted from training images, where each likelihood is computed with respect to the GMM obtained by perturbing the UGMM means via the TVM as above, leaving the UGMM covariances unchanged. Finally, the low-dimensional i-vector representation of an image is the expected value of the posterior distribution of X conditioned on the image's descriptors, and is computed via straightforward matrix manipulations involving the TVM and image-specific Baum-Welch statistics. We compare classification rates obtained with (i) i-vectors, (ii) PCA, and (iii) Discriminant Attribute Projection (the last two trained on Gaussian MAP-adapted supervector image representations), as well as (iv) replacing the TVM with the matrix of dominant PCA eigenvectors before i-vector extraction.
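The posterior-mean computation described above has a well-known closed form in the speaker-recognition literature: with diagonal UGMM covariances stacked in Sigma, zeroth-order statistics N and centered first-order statistics F accumulated for one image, the i-vector is w = (I + T' Sigma^-1 N T)^-1 T' Sigma^-1 F. The sketch below illustrates this final extraction step under those assumptions; all function and variable names are illustrative, not from the paper.

```python
import numpy as np

def extract_ivector(T, Sigma, N, F):
    """Posterior mean of the hidden vector X given one image's descriptors.

    Assumed shapes (C components, D-dim descriptors, rank-R TVM):
    T     : (C*D, R) total variability matrix
    Sigma : (C*D,)   stacked diagonal UGMM covariances (left unchanged)
    N     : (C,)     zeroth-order Baum-Welch stats (soft counts per component)
    F     : (C*D,)   centered first-order stats (posterior-weighted descriptor
                     sums minus N_c times the corresponding UGMM mean)
    """
    CD, R = T.shape
    D = CD // N.shape[0]
    N_full = np.repeat(N, D)                          # counts at supervector size
    TtSinv = T.T / Sigma                              # T' Sigma^-1, shape (R, C*D)
    L = np.eye(R) + TtSinv @ (N_full[:, None] * T)    # posterior precision of X
    return np.linalg.solve(L, TtSinv @ F)             # i-vector, shape (R,)
```

The same posterior precision L also yields the posterior covariance L^-1 needed in the M-step when the TVM itself is re-estimated by EM, so in a full implementation this routine would typically be shared between training and extraction.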