INEB
INEB
TitleMonitoring incremental histogram distribution for change detection in data streams
Publication TypeBook
Year of Publication2010
AuthorsSebastião, R, Gama, J, Rodrigues, PP, Bernardes, J
Series TitleLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)Lect. Notes Comput. Sci.
Volume5840 LNCS
Number of Pages25 - 42
CityLas Vegas, NV
ISBN Number03029743 (ISSN); 3642125182 (ISBN); 9783642125188 (ISBN)
KeywordsAdaptive Cumulative Windows, Change detection, Data communication systems, Data reduction, Data stream, Data streams, Graphic methods, Learning histograms, Learning systems, Machine learning, Machine-learning, Monitoring, Monitoring data distribution, Sensors, signal detection, Statistical methods
AbstractHistograms are a common technique for density estimation and they have been widely used as a tool in exploratory data analysis. Learning histograms from static and stationary data is a well known topic. Nevertheless, very few works discuss this problem when we have a continuous flow of data generated from dynamic environments. The scope of this paper is to detect changes from high-speed time-changing data streams. To address this problem, we construct histograms able to process examples once at the rate they arrive. The main goal of this work is continuously maintain a histogram consistent with the current status of the nature. We study strategies to detect changes in the distribution generating examples, and adapt the histogram to the most recent data by forgetting outdated data. We use the Partition Incremental Discretization algorithm that was designed to learn histograms from high-speed data streams. We present a method to detect whenever a change in the distribution generating examples occurs. The base idea consists of monitoring distributions from two different time windows: the reference window, reflecting the distribution observed in the past; and the current window which receives the most recent data. The current window is cumulative and can have a fixed or an adaptive step depending on the distance between distributions. We compared both distributions using Kullback-Leibler divergence, defining a threshold for change detection decision based on the asymmetry of this measure. We evaluated our algorithm with controlled artificial data sets and compare the proposed approach with nonparametric tests. We also present results with real word data sets from industrial and medical domains. Those results suggest that an adaptive window's step exhibit high probability in change detection and faster detection rates, with few false positives alarms. © 2010 Springer-Verlag Berlin Heidelberg.
URLhttp://www.scopus.com/inward/record.url?eid=2-s2.0-77957913416&partnerID=40&md5=ec0810650dda3c604d494473a9ffd649