To aggregate or not to aggregate: that is the question
; Paquet, Eric
; Viktor, Herna L.
NRC Institute for Information Technology
ACM SIGMIS International Conference on Knowledge Discovery and Information Retrieval (KDIR), Paris, France, October 26-29, 2011
Data pre-processing; Aggregation; Gaussian distribution; Lévy distribution
3D Imaging, Modeling and Visualization
Visual Information Technology
Consider a scenario where one aims to learn models from data characterized by very large fluctuations that are attributable to neither noise nor outliers. This may be the case, for instance, when examining supermarket ketchup sales, predicting earthquakes, or conducting financial data analysis. In such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we argue that, in many cases, the incorrect assumption of Gaussian behaviour leads to misleading and incorrect data mining results. We illustrate this argument on synthetic data and show some results on stock market data.
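
The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' experiment. It assumes the standard Cauchy distribution as a convenient stand-in for the Lévy family (the Cauchy law is the stable distribution with index 1), and the function names and parameters below are hypothetical. The sketch averages blocks of samples and measures the spread of the block means by their interquartile range: for Gaussian data the spread shrinks roughly as one over the square root of the block size, whereas for Cauchy data the block mean follows the same distribution as a single sample, so aggregation does not reduce the fluctuations.

```python
import math
import random
import statistics


def cauchy_sample(rng):
    # Inverse-CDF sampling of a standard Cauchy variate (assumed stand-in
    # for a heavy-tailed Levy stable law; stability index alpha = 1).
    return math.tan(math.pi * (rng.random() - 0.5))


def gaussian_sample(rng):
    # Standard normal variate for comparison.
    return rng.gauss(0.0, 1.0)


def interquartile_range(values):
    # Robust measure of spread; remains meaningful for heavy tails,
    # where the variance is infinite.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def spread_of_block_means(draw, block_size, n_blocks, rng):
    # Aggregate the data by averaging blocks of `block_size` samples,
    # then report the spread of the resulting block means.
    means = [
        sum(draw(rng) for _ in range(block_size)) / block_size
        for _ in range(n_blocks)
    ]
    return interquartile_range(means)


if __name__ == "__main__":
    rng = random.Random(0)
    for block_size in (1, 100):
        g = spread_of_block_means(gaussian_sample, block_size, 2000, rng)
        c = spread_of_block_means(cauchy_sample, block_size, 2000, rng)
        print(f"block size {block_size:>3}: Gaussian IQR {g:.3f}, Cauchy IQR {c:.3f}")
```

Running the script prints the two spreads for unaggregated data and for blocks of 100 samples. The interquartile range is used instead of the standard deviation because the latter does not exist for the Cauchy distribution.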