McDiarmid drift detection methods for evolving data streams

Type: Article
Journal title: Statistics
Article number: arXiv:1710.02030
Pages: 12
Abstract: Increasingly, Internet of Things (IoT) domains, such as sensor networks, smart cities, and social networks, generate vast amounts of data. Such data are not only unbounded and rapidly evolving; their content also changes dynamically over time, often in unforeseen ways. These variations are due to so-called concept drifts, caused by changes in the underlying data generation mechanisms. In a classification setting, concept drift causes previously learned models to become inaccurate, unsafe, and even unusable. Accordingly, concept drifts need to be detected and handled as soon as possible. In medical applications and military zones, for example, changes in behavior should be detected in near real time to avoid potential loss of life. To this end, we introduce the McDiarmid Drift Detection Method (MDDM), which uses McDiarmid's inequality to detect concept drift. The MDDM approach slides a window over prediction results and associates each window entry with a weight. Higher weights are assigned to the most recent entries to emphasize their importance. As instances are processed, the detection algorithm compares the weighted mean of the elements inside the sliding window with the maximum weighted mean observed so far. A significant difference between the two weighted means, upper-bounded by McDiarmid's inequality, implies a concept drift. Our extensive experiments on synthetic and real-world data streams show that our method outperforms the state of the art. Specifically, MDDM yields shorter detection delays and lower false negative rates, while maintaining high classification accuracy.
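The detection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name, the linearly increasing weighting scheme, and the default values for `window_size`, `step`, and `delta` are all assumptions made for the example, since the abstract does not specify them. The threshold follows from McDiarmid's inequality applied to a normalized weighted mean of values in [0, 1], which gives epsilon = sqrt(sum(v_i^2) * ln(1/delta) / 2) for normalized weights v_i.

```python
import math
from collections import deque


class McDiarmidDriftSketch:
    """Illustrative sliding-window drift detector (hypothetical name and defaults)."""

    def __init__(self, window_size=100, step=0.01, delta=1e-6):
        # Assumed weighting scheme: weights grow linearly from the oldest
        # entry (1.0) to the newest, then are normalized to sum to 1.
        raw = [1.0 + i * step for i in range(window_size)]
        total = sum(raw)
        self.weights = [w / total for w in raw]
        # McDiarmid bound for a weighted mean of independent values in [0, 1]:
        # P(deviation >= eps) <= exp(-2 * eps^2 / sum(v_i^2)) = delta.
        self.epsilon = math.sqrt(
            sum(v * v for v in self.weights) * math.log(1.0 / delta) / 2.0
        )
        self.window = deque(maxlen=window_size)
        self.max_mean = 0.0

    def update(self, correct):
        """Record one prediction result; return True if drift is signalled."""
        self.window.append(1.0 if correct else 0.0)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        # The deque is ordered oldest to newest, matching the weight order.
        mean = sum(v * x for v, x in zip(self.weights, self.window))
        self.max_mean = max(self.max_mean, mean)
        if self.max_mean - mean >= self.epsilon:
            # Reset after signalling so detection restarts on fresh data.
            self.window.clear()
            self.max_mean = 0.0
            return True
        return False


if __name__ == "__main__":
    detector = McDiarmidDriftSketch()
    # Hypothetical stream: 200 correct predictions, then only errors.
    stream = [True] * 200 + [False] * 100
    for i, outcome in enumerate(stream):
        if detector.update(outcome):
            print("drift signalled at index", i)
            break
```

Because recent entries carry more weight, a run of errors at the newest end of the window lowers the weighted mean faster than it would lower an unweighted mean, which is the intuition behind the shorter detection delays reported in the abstract.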
Publisher: Cornell University Library
Link: https://arxiv.org/abs/1710.02030
Language: English
Affiliation: Digital Technologies; National Research Council Canada
Peer reviewed: Yes
NPARC number: 23002482
Record identifier: 36517237-8c63-4c9f-92ea-fac757323772
Record created: 2017-11-15
Record modified: 2017-11-15