TY - GEN

T1 - An incremental data-stream sketch using sparse random projections

AU - Menon, Aditya Krishna

AU - Pham, Gia Vinh Anh

AU - Chawla, Sanjay

AU - Viglas, Anastasios

PY - 2007

Y1 - 2007

N2 - We propose the use of random projections with a sparse matrix to maintain a sketch of a collection of high-dimensional data-streams that are updated asynchronously. This sketch allows us to estimate L2 (Euclidean) distances and dot-products with high accuracy. We verify the validity of this sketch by applying it to an online clustering problem, where we compare our results to the offline algorithm and an existing L2 sketch, and observe comparable results in terms of accuracy, and a reduced runtime cost.

AB - We propose the use of random projections with a sparse matrix to maintain a sketch of a collection of high-dimensional data-streams that are updated asynchronously. This sketch allows us to estimate L2 (Euclidean) distances and dot-products with high accuracy. We verify the validity of this sketch by applying it to an online clustering problem, where we compare our results to the offline algorithm and an existing L2 sketch, and observe comparable results in terms of accuracy, and a reduced runtime cost.

UR - http://www.scopus.com/inward/record.url?scp=70449094532&partnerID=8YFLogxK

U2 - 10.1137/1.9781611972771.62

DO - 10.1137/1.9781611972771.62

M3 - Conference contribution

AN - SCOPUS:70449094532

SN - 9780898716306

T3 - Proceedings of the 7th SIAM International Conference on Data Mining

SP - 563

EP - 568

BT - Proceedings of the 7th SIAM International Conference on Data Mining

PB - Society for Industrial and Applied Mathematics Publications

Y2 - 26 April 2007 through 28 April 2007

ER -