To aggregate or not to aggregate: That is the question

Proceedings title: KDIR 2011 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
Conference: International Conference on Knowledge Discovery and Information Retrieval (KDIR 2011), 26-29 October 2011, Paris
Pages: 354-357 (4 pages)
Subject: Central limit theorem; Data preprocessing; Financial data analysis; Lévy distribution; Stock market; Synthetic data; Agglomeration; Commerce; Data handling; Gaussian distribution; Information retrieval; Aggregates
Abstract: Consider a scenario where one aims to learn models from data characterized by very large fluctuations that are attributable neither to noise nor to outliers. This may be the case, for instance, when examining supermarket ketchup sales, predicting earthquakes, or conducting financial data analysis. In such situations, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we argue that, in many cases, the incorrect Gaussian assumption leads to misleading and incorrect data mining results. We illustrate this argument on synthetic data, and show some results on stock market data.
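The abstract's central claim is that aggregation, which tames Gaussian-like fluctuations, does nothing for heavy-tailed (Lévy-type) data. A minimal sketch of that contrast, using a standard Cauchy distribution as a stand-in for a heavy-tailed Lévy-stable law (the paper's own distributions and data are not reproduced here, so the choice of Cauchy and the block sizes below are illustrative assumptions):

```python
import math
import random
import statistics

random.seed(0)

def gaussian_sample(n):
    # Light-tailed reference case: the classical CLT applies.
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def cauchy_sample(n):
    # Standard Cauchy (a Lévy-stable law with alpha = 1): heavy tails,
    # infinite variance, so the classical CLT does NOT apply.
    return [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

def block_means(data, block):
    # Aggregate the raw series into means over consecutive blocks.
    return [statistics.fmean(data[i:i + block])
            for i in range(0, len(data), block)]

n, block = 100_000, 1_000
g_means = block_means(gaussian_sample(n), block)
c_means = block_means(cauchy_sample(n), block)

# Gaussian block means concentrate around 0 (spread shrinks like
# 1/sqrt(block)); Cauchy block means follow the SAME distribution as a
# single draw, so aggregation does not reduce the fluctuations at all.
print("Gaussian block-mean max |value|:", max(map(abs, g_means)))
print("Cauchy   block-mean max |value|:", max(map(abs, c_means)))
```

Running this shows the Gaussian block means collapsing toward zero while the Cauchy block means remain as wildly spread as the raw data, which is the "to aggregate or not to aggregate" question in miniature.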
Publication date: 2011
Affiliation: National Research Council Canada (NRC-CNRC)
Peer reviewed: Yes
NPARC number: 21271667
Record identifier: b92ef2eb-ebf8-4e6c-b7eb-622473e9b04e
Record created: 2014-03-24
Record modified: 2016-05-09