Outline of the Approach
In the DFG (Deutsche Forschungsgemeinschaft) project Kl 648/1 we were concerned with the
question of how to learn rules about qualitative / semi-quantitative
dependencies in multivariate time series (keywords: knowledge
discovery from time series, rule discovery in time series, temporal
pattern discovery). The approach can be sketched as follows. First,
the time series are segmented and thereby transformed into
sequences of labeled intervals. The labels denote qualitative aspects
of the signal in the respective intervals. Then, from the sequence of
labeled intervals, we discover temporal patterns that occur more often
than a certain threshold. The temporal patterns are sets of intervals
where Allen's interval logic is used to capture their temporal
relationships. From these patterns, rules can be derived that have temporal
patterns in both premise and conclusion. Rules may be specialized with
respect to numerical attributes like the length of the intervals or
the slope of the signal within the interval. Finally, we obtain rules
like ``when signal A decreases while signal B increases with a slope
greater than 2, then signal C will decrease''. Since humans use a
similar syntax when discussing such aspects, the proposed methodology
may support a human in learning from temporal data. New insights into
the examined time series may then motivate extracting different
features from the time series, after which the process is restarted.
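To make the first step concrete, here is a minimal Python sketch of turning a sampled signal into a sequence of labeled intervals. It assumes the simplest possible qualitative label, the sign of the first difference (increasing, decreasing, constant), and a hypothetical noise tolerance `eps`; the names `segment` and `LabeledInterval` are ours. The project itself uses scale-space filtering for this abstraction, so the sketch only illustrates the data flow from samples to labeled intervals.

```python
from typing import List, NamedTuple


class LabeledInterval(NamedTuple):
    start: int   # index of the first sample covered by the interval
    end: int     # index of the last sample covered (inclusive)
    label: str   # qualitative description, e.g. "increasing"


def segment(series: List[float], eps: float = 0.0) -> List[LabeledInterval]:
    """Label every step by the sign of its first difference and merge
    consecutive steps with the same label into one interval.
    eps is a hypothetical tolerance below which a step counts as constant."""
    def label_of(delta: float) -> str:
        if delta > eps:
            return "increasing"
        if delta < -eps:
            return "decreasing"
        return "constant"

    intervals: List[LabeledInterval] = []
    for i in range(len(series) - 1):
        lab = label_of(series[i + 1] - series[i])
        if intervals and intervals[-1].label == lab:
            # Same label as the previous step: extend the current interval.
            last = intervals[-1]
            intervals[-1] = LabeledInterval(last.start, i + 1, lab)
        else:
            # Label changes: open a new interval at the shared sample.
            intervals.append(LabeledInterval(i, i + 1, lab))
    return intervals
```

For `segment([1, 2, 3, 2, 1, 1])` the result is an increasing interval over samples 0 to 2, a decreasing one over 2 to 4 and a constant one over 4 to 5. Adjacent intervals share an endpoint, i.e. in Allen's terms each one meets its successor.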
The approach can also be applied to sequential data other than time
series (for instance biological sequences, medical profiles,
etc.). For sequential learning the time series abstraction step has to
be replaced by an appropriate procedure that yields a sequence of
intervals. Very often no such method is necessary because the data is
already in this format (e.g. diseases of a patient, insurance
contracts, periods in which a certain DNA sequence occurs, etc.).
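Once the data is a sequence of labeled intervals, the temporal relationship between any two of them can be classified with Allen's thirteen interval relations. The following Python sketch assumes intervals are given as `(start, end)` pairs with `start < end` on a common time axis (for instance, days in a hypothetical patient record); the function name and the relation strings are illustrative choices, not taken from the project.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds between interval a and interval b."""
    a1, a2 = a
    b1, b2 = b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    # From here on the interiors of a and b overlap.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: proper overlap with all four endpoints distinct.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

With a hypothetical infection lasting from day 0 to day 10 and a fever from day 2 to day 5, `allen_relation((2, 5), (0, 10))` returns `"during"`, and swapping the arguments yields the inverse relation `"contains"`.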
The figure above depicts the approach graphically. The arrows
indicate the processing steps necessary to reach the next
representation. Terms with question marks indicate important points
that need special attention in the respective step.
The approach utilizes techniques from artificial intelligence,
machine learning, data mining and signal processing. This project was
funded by the DFG (Deutsche Forschungsgemeinschaft) under grant Kl-648.
To probe further...
You can find my publications related to temporal patterns below,
some of them are available online (gzipped PostScript (.ps.gz) and
Portable Document Format (.pdf)).
- Overview: For a brief overview of the approach, see
- F. Höppner:
Learning Dependencies in Multivariate Time Series.
Proc. of the ECAI'02 Workshop on Knowledge Discovery in (Spatio-)
Temporal Data, Lyon, France, pp. 25-31, July 2002.
[ .ps.gz ]
[ .pdf ]
- F. Höppner:
Lernen lokaler Zusammenhänge in multivariaten Zeitreihen.
Tagungsband zum 5. Göttinger Symposium Soft Computing,
Göttingen, pp. 113-125, June 2002.
[ .ps.gz ]
[ .pdf ]
- Pattern Space: To capture the interval relationships we use
Allen's interval logic. A pattern thus consists of a set of intervals,
their labels, and their interval relationships (like before, meets,
overlaps, etc.). Sometimes, the true patterns in the data cannot be
represented by a single pattern of our pattern space; for instance, ``B
starts some time after A'' may manifest itself as pattern ``A before B'', ``A
meets B'' or even ``A overlaps B''. This paper describes an approach
to overcome such difficulties:
Feature Selection: Which labels do we want to consider?
Since humans are used to refining contexts hierarchically, we start
with increasing/decreasing or concave/convex labels, which are then
(qualitatively or quantitatively) refined during the process. For
quantitative constraints see
Noise Handling: How do we want to distinguish between
(possibly non-Gaussian) noise and features of the observed system
during time series abstraction? We use scale-space filtering and
scale-space lifetime to extract robust and perceptually important
features.
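The following Python sketch illustrates the scale-space idea in a simplified form; it is our own approximation, not the project's implementation. The signal is smoothed with Gaussian kernels of increasing width `sigma`, local extrema are counted at each scale, and the lifetime of a description is approximated by the number of scales at which a given extremum count is observed. Tracking individual extrema across scales, as a full scale-space analysis would, is omitted. All function names are hypothetical.

```python
import math
from typing import Dict, List


def gaussian_smooth(xs: List[float], sigma: float) -> List[float]:
    """Convolve xs with a Gaussian kernel truncated at 3*sigma. Near the
    borders the weights are renormalised instead of padding with zeros."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    radius = max(1, int(3 * sigma))
    out = []
    for i in range(len(xs)):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(xs), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma * sigma))
            num += w * xs[j]
            den += w
        out.append(num / den)
    return out


def count_extrema(xs: List[float]) -> int:
    """Count local minima and maxima as sign changes of the first
    difference (flat steps are ignored)."""
    signs = [1 if b > a else -1 for a, b in zip(xs, xs[1:]) if b != a]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def extrema_lifetimes(xs: List[float], sigmas: List[float]) -> Dict[int, int]:
    """Map each observed extremum count to the number of scales showing it.
    Counts that persist over many scales indicate a stable description;
    extrema caused by noise disappear at the smaller scales."""
    lifetimes: Dict[int, int] = {}
    for sigma in sigmas:
        n = count_extrema(gaussian_smooth(xs, sigma))
        lifetimes[n] = lifetimes.get(n, 0) + 1
    return lifetimes
```

On a sine wave with alternating noise, e.g. `[math.sin(2 * math.pi * i / 50) + (0.3 if i % 2 else -0.3) for i in range(100)]`, nearly every sample is an extremum at `sigma = 0.3`, whereas at `sigma = 4` far fewer remain. Renormalising the kernel at the borders avoids the artificial drop towards zero that zero padding would introduce at both ends of the series.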
Feature Ambiguity: It is often not a priori clear which
aspects of a time series (at which scale) are of interest for the
patterns we want to discover. Therefore, we use a multiscale
description to reflect the ambiguity in the labels (a decreasing
segment may become an increasing segment if we zoom
out).
Efficiency: Techniques from association rule mining are
adopted to find all patterns that occur more often than a certain
threshold. A number of pruning techniques are used to make the process
as efficient as possible. See the following references (the last one
is the most detailed one):
- F. Höppner:
Learning Temporal Rules from State Sequences.
IJCAI Workshop on Learning from Temporal and Spatial Data,
Seattle, USA, pp. 25-31, 2001.
[ .ps.gz ]
[ .pdf ]
- F. Höppner:
Discovery of Temporal Patterns - Learning Rules about the Qualitative
Behaviour of Time Series.
Proc. of the 5th European Conference on Principles and Practice of Knowledge
Discovery in Databases, Lecture Notes in Artificial Intelligence 2168,
Springer, Freiburg, Germany, pp. 192-203, Sept. 2001.
[ .ps.gz ]
[ .pdf ]
© Springer
- F. Höppner, F. Klawonn:
Learning Rules about the Development of Variables over Time.
In: C.T. Leondes (editor): Intelligent Systems -
Techniques and Applications, vol. IV, CRC Press, pp. 201-228, 2002.
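The frequency-based search with pruning mentioned under Efficiency can be illustrated by a minimal two-level, Apriori-style Python sketch. Several simplifications are our own assumptions rather than details from the publications above: the support of a pattern is the number of interval sequences in which it occurs (instead of, e.g., a sliding-window count), patterns contain at most two intervals, and a rule simply relates a single interval label (premise) to a two-interval pattern containing it (conclusion), with confidence = support(pattern) / support(label). All names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]   # (label, start, end) with start < end
Pattern = Tuple[str, str, str]    # (label_a, Allen relation, label_b)
Rule = Tuple[str, str, str, float]

# Allen relation of two intervals whose interiors overlap, indexed by
# (sign(a.start - b.start), sign(a.end - b.end)).
_OVERLAPPING = {
    (-1, -1): "overlaps", (-1, 0): "finished-by", (-1, 1): "contains",
    (0, -1): "starts", (0, 0): "equals", (0, 1): "started-by",
    (1, -1): "during", (1, 0): "finishes", (1, 1): "overlapped-by",
}


def _sign(x: int) -> int:
    return (x > 0) - (x < 0)


def relation(a: Interval, b: Interval) -> str:
    """Allen relation of interval a with respect to interval b."""
    (_, a1, a2), (_, b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    return _OVERLAPPING[(_sign(a1 - b1), _sign(a2 - b2))]


def frequent_patterns(
    sequences: List[List[Interval]], min_support: int
) -> Tuple[Dict[str, int], Dict[Pattern, int]]:
    """Support = number of sequences in which a label or pattern occurs."""
    # Level 1: frequent single labels (each counted once per sequence).
    label_support = Counter(
        lab for seq in sequences for lab in {iv[0] for iv in seq}
    )
    frequent = {lab: c for lab, c in label_support.items() if c >= min_support}
    # Level 2: a pattern containing an infrequent label cannot be frequent
    # (support is anti-monotone), so such intervals are pruned beforehand.
    pattern_support: Counter = Counter()
    for seq in sequences:
        kept = sorted((iv for iv in seq if iv[0] in frequent),
                      key=lambda iv: (iv[1], iv[2]))
        # Sorting by (start, end) fixes a canonical order for each pair;
        # the set counts every pattern at most once per sequence.
        pattern_support.update(
            {(a[0], relation(a, b), b[0]) for a, b in combinations(kept, 2)}
        )
    patterns = {p: c for p, c in pattern_support.items() if c >= min_support}
    return frequent, patterns


def derive_rules(label_support: Dict[str, int],
                 pattern_support: Dict[Pattern, int],
                 min_conf: float) -> List[Rule]:
    """Rules 'an a-interval occurs => it stands in relation rel to a b-interval'."""
    rules: List[Rule] = []
    for (la, rel, lb), supp in pattern_support.items():
        conf = supp / label_support[la]
        if conf >= min_conf:
            rules.append((la, rel, lb, conf))
    return rules
```

For the three hypothetical sequences `[("A", 0, 5), ("B", 3, 8)]`, `[("A", 0, 4), ("B", 2, 9), ("C", 10, 12)]` and `[("A", 1, 3), ("C", 5, 6)]` with `min_support=2`, the frequent patterns are `("A", "overlaps", "B")` and `("A", "before", "C")`, each with support 2, while `("B", "before", "C")` occurs only once and is discarded. Both remaining rules have confidence 2/3.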
Generalization: Some kinds of patterns cannot be expressed
by single elements of our pattern space. In this case, we can try to
find a set of elements that approximates the true relationship in the
data. A possible approach is described in