It is well known that supervised learning problems with ℓ1 (Lasso) and ℓ2 (Tikhonov or Ridge) regularizers yield very different solutions. For example, the ℓ1 solution vector is typically sparser and can potentially be used for both prediction and feature selection. However, for a given data set it is often hard to determine which form of regularization is more applicable. In this paper we use mathematical properties of the two regularization methods, followed by detailed experimentation, to understand their impact based on four characteristics: non-stationarity of the data generating process, level of noise in the data sensing mechanism, degree of correlation between dependent and independent variables, and the shape of the data set. The practical outcome of our research is a guide for practitioners of large-scale data mining and machine learning tools in their day-to-day practice.