Learning with aggregation and correlation in the presence of large fluctuations

Proceedings title: Proceedings of New Frontiers in Mining Complex Patterns (NFMCP 2012)
Conference: NFMCP 2012: International Workshop on New Frontiers in Mining Complex Patterns, in conjunction with ECML/PKDD 2012 (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases), 24-28 September 2012, Bristol, UK
Pages: 12-23; Number of pages: 12
Subject: aggregation in relational learning; correlation-based analysis and covariance; Lévy distribution; stable distribution
Abstract: Consider a scenario in which one aims to learn models from dynamic, evolving data characterized by very large fluctuations that are attributable neither to noise nor to outliers. This may be the case, for instance, when predicting the potential future damage caused by earthquakes or oil spills, or when conducting financial data analysis. It follows that, in such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we present an analysis of data aggregation and correlation in such scenarios. To this end, we introduce the Lévy, or stable, distribution, which is a generalization of the Gaussian distribution. Our theoretical conclusions are illustrated with various simulations, as well as with a benchmark financial database. We show which specific strategies should be adopted for aggregation, depending on the stability exponent of the Lévy distribution. First, our results show scenarios in which it may be impossible to determine the mean and the standard deviation of an aggregate. Second, we discuss the case in which an aggregate may have to be characterized by its largest fluctuations. Third, we illustrate that the correlation between two attributes may be underestimated if a Gaussian distribution is erroneously assumed.
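The abstract's point that an aggregate of heavy-tailed values may have to be characterized by its largest fluctuations can be illustrated with a short simulation. The following Python sketch is not the authors' method: it assumes standard symmetric α-stable samples drawn with the Chambers-Mallows-Stuck method, and the helper names `symmetric_stable` and `largest_share`, the sample size, and the seed are hypothetical choices made for illustration. Recall that for a stability exponent α < 2 the variance is infinite, and for α ≤ 1 the mean is undefined as well; α = 2 recovers the Gaussian case.

```python
import math
import random


def symmetric_stable(alpha: float, rng: random.Random) -> float:
    """Draw one standard symmetric alpha-stable variate (skewness beta = 0).

    Chambers-Mallows-Stuck method; alpha must lie in (0, 2].
    alpha = 2 gives a Gaussian with variance 2; alpha = 1 gives a Cauchy variate.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must be in (0, 2]")
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform random angle
    w = rng.expovariate(1.0)                    # unit-rate exponential
    if alpha == 1.0:
        return math.tan(v)
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def largest_share(alpha: float, n: int = 10_000, seed: int = 0) -> float:
    """Fraction of the aggregate sum of |X_i| contributed by the single largest |X_i|.

    Hypothetical helper: values near 0 mean no single fluctuation matters;
    values near 1 mean the aggregate is essentially its largest fluctuation.
    """
    rng = random.Random(seed)
    magnitudes = [abs(symmetric_stable(alpha, rng)) for _ in range(n)]
    return max(magnitudes) / sum(magnitudes)


if __name__ == "__main__":
    for alpha in (2.0, 1.5, 1.0, 0.5):
        print(f"alpha = {alpha}: largest term is {largest_share(alpha):.4f} of the aggregate")
```

For α = 2 the largest draw should account for only a tiny fraction of the aggregate, whereas for α below 1 a single draw typically accounts for a large share of it, so summarizing such an aggregate by a mean and standard deviation is not meaningful. For real analyses, a tested sampler such as SciPy's `scipy.stats.levy_stable` may be preferable to this hand-rolled one.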
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: Yes
NPARC number: 21257781
Record identifier: b9860f27-54db-41a7-a16f-662aab2c6528
Record created: 2013-02-28
Record modified: 2016-05-09