To Aggregate or Not to Aggregate: That Is the Question

 
 
Affiliation:
NRC Institute for Information Technology
Language:
English
Type:
Conference publication
Conference:
ACM SIGMIS International Conference on Knowledge Discovery and Information Retrieval (KDIR), Paris, France, October 26-29, 2011
Proceedings
Title:
Proceedings
Description:
1 CD-ROM
Date:
2011
NPArC #:
18608249
Keywords:
Data pre-processing; Aggregation; Gaussian distribution; Lévy distribution
Program(s):
3D Imaging, Modeling and Visualization
Group(s):
Visual Information Technology
Abstract:
Consider a scenario where one aims to learn models from data characterized by very large fluctuations that are attributable neither to noise nor to outliers. This may be the case, for instance, when examining supermarket ketchup sales, predicting earthquakes, or conducting financial data analysis. In such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we argue that, in many cases, incorrectly assuming a Gaussian distribution leads to misleading and incorrect data mining results. We illustrate this argument on synthetic data and show some results on stock market data.
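
The contrast described in the abstract can be illustrated with a small synthetic experiment. The sketch below is not the paper's own procedure, only an assumption-laden illustration: it averages non-overlapping blocks of Gaussian draws and of standard Lévy draws (generated as 1 / Z**2 with Z standard normal), then compares the interquartile range of the block means with that of the raw draws. The function names, block size, number of blocks, and seed are all hypothetical choices made for this example.

```python
import random
import statistics


def gauss_draw(rng):
    # One draw from a standard Gaussian distribution.
    return rng.gauss(0.0, 1.0)


def levy_draw(rng):
    # If Z ~ N(0, 1), then 1 / Z**2 follows a standard Lévy distribution.
    z = rng.gauss(0.0, 1.0)
    while z == 0.0:  # guard against division by zero
        z = rng.gauss(0.0, 1.0)
    return 1.0 / (z * z)


def block_means(draw, block_size, n_blocks, rng):
    # Aggregate raw draws into non-overlapping blocks and average each block.
    return [
        sum(draw(rng) for _ in range(block_size)) / block_size
        for _ in range(n_blocks)
    ]


def iqr(values):
    # Interquartile range: a spread measure that stays finite
    # even when the variance of the distribution does not exist.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def spread_ratio(draw, block_size=100, n_blocks=2000, seed=0):
    # Ratio of the spread of aggregated block means to the spread of raw draws.
    rng = random.Random(seed)
    raw = block_means(draw, 1, n_blocks, rng)
    aggregated = block_means(draw, block_size, n_blocks, rng)
    return iqr(aggregated) / iqr(raw)


if __name__ == "__main__":
    print("Gaussian spread ratio:", spread_ratio(gauss_draw))
    print("Levy spread ratio:", spread_ratio(levy_draw))
```

Under these assumptions, the Gaussian ratio should fall well below 1, since averaging smooths the fluctuations, while the Lévy ratio should rise well above 1, since block means of such heavy-tailed data are dominated by their largest draws. The interquartile range is used rather than the standard deviation because the Lévy distribution has no finite variance.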
 