Learning in the presence of large fluctuations : a study of aggregation and correlation

DOI: http://doi.org/10.1007/978-3-642-37382-4_4
Type: Book Chapter
Proceedings title: New Frontiers in Mining Complex Patterns : First International Workshop, NFMCP 2012, Held in Conjunction with ECML/PKDD 2012, Bristol, UK, September 24, 2012, Revised Selected Papers
Series title: Lecture Notes In Computer Science; Volume 7765
Conference: 1st International Workshop on New Frontiers in Mining Complex Patterns (NFMCP 2012), held in conjunction with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2012), September 24, 2012, Bristol, United Kingdom
Volume: 7765 LNAI
Pages: 49–63; # of pages: 15
Subject: Analysis of data; Average values; Central Limit Theorem; Correlation-based Analysis and Covariance; Financial Data Analysis; Relational learning; Set of rules; Stable distributions; Database systems; Oil spills; Gaussian distribution
Abstract: Consider a scenario where one aims to learn models from data characterized by very large fluctuations that are attributable to neither noise nor outliers. This may be the case, for instance, when predicting the potential future damages of earthquakes or oil spills, or when conducting financial data analysis. It follows that, in such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we present an analysis of data aggregation and correlation in such scenarios. To this end, we introduce the Lévy, or stable, distribution, which is a generalization of the Gaussian distribution. Our theoretical conclusions are illustrated with various simulations, as well as against a benchmarking financial database. We show which specific strategies should be adopted for aggregation, depending on the stability exponent of the Lévy distribution. First, our results indicate that the correlation between two attributes may be underestimated if a Gaussian distribution is erroneously assumed. Second, we show that, in the scenario where we aim to learn a set of rules to estimate the level of stability of a stock market, the Lévy distribution produces superior results. Third, we illustrate that, in a multi-relational database mining setting, aggregation using average values may be highly unsuitable. © 2013 Springer-Verlag.
Publication date
Publisher: Springer Berlin Heidelberg
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: Yes
NPARC number: 21270707
Record identifier: 9fed2d6f-2322-4574-badd-32de49521bcb
Record created: 2014-02-17
Record modified: 2016-06-29