Discovering sequential relationships among time series is important in many application domains. In data mining applications, it is often necessary to search a series database for time series that match a pre-specified query series. This primitive is needed, for example, for prediction and clustering. Clustering of time series data contributes to the problem of inducing and forming categories (classes) of events. For example, the problem of finding trends, seasons and cycles in a single time series may be approached by finding similar parts (segments) of the series itself. Likewise, coherences between time series can be identified by finding similar ordered sub-sequences shared by the series. In a cross-series analysis, clustering helps to find indications that one series has an economic impact on another. Time series express economic phenomena only through their numeric content. A phenomenon in which one economic variable has an impact on another can hardly be detected from the textual information attached to the series; the information on the impact is contained solely in significant sequences of the numeric values.
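The search primitive mentioned above, finding the part of a series that most resembles a query series, can be illustrated with a minimal sketch. It assumes a sliding window compared by Euclidean distance after z-normalisation, which is one common choice and not necessarily the measure used in the prototype; the class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not part of the IRAIA/MTSD code base.
final class SubsequenceSearch {

    /** Returns a z-normalised copy of series[start .. start + len). */
    static double[] zNormalise(double[] series, int start, int len) {
        double sum = 0.0;
        for (int i = start; i < start + len; i++) {
            sum += series[i];
        }
        double mean = sum / len;
        double squares = 0.0;
        for (int i = start; i < start + len; i++) {
            double d = series[i] - mean;
            squares += d * d;
        }
        double std = Math.sqrt(squares / len);
        double[] out = new double[len];
        for (int i = 0; i < len; i++) {
            // A constant window has zero spread; map it to all zeros.
            out[i] = (std == 0.0) ? 0.0 : (series[start + i] - mean) / std;
        }
        return out;
    }

    /**
     * Start index of the window of series that is closest in shape to query,
     * or -1 if the query is empty or longer than the series.
     */
    static int bestMatch(double[] series, double[] query) {
        int m = query.length;
        if (m == 0 || series.length < m) {
            return -1;
        }
        double[] q = zNormalise(query, 0, m);
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int s = 0; s + m <= series.length; s++) {
            double[] w = zNormalise(series, s, m);
            double dist = 0.0; // squared Euclidean distance between shapes
            for (int i = 0; i < m; i++) {
                double d = w[i] - q[i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = s;
            }
        }
        return best;
    }
}
```

Because both the window and the query are normalised, the comparison is on shape rather than on level or scale, so a steadily rising query matches a steadily rising segment regardless of its absolute values. Applying the same routine to windows taken from the series itself gives the self-similarity search described for trends, seasons and cycles.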
The tool developed by Forth helps to discover this coherence information buried in numeric data. The main concern of the work was to specify ways to integrate time series coherences with coherent text-reference collections, both stored in the IRAIA database server. The final prototype (components for time series similarity assessment and clustering) is built in the Java environment in order to achieve a high degree of interoperability. The IRAIA consortium decided that the system is to reside on the server side of the IRAIA system. A scenario for using the MTSD module is as follows:
- The MTSD module runs over the (potentially) recalled time series, and their coherence with (similarity to) other time series in the IRAIA database is assessed;
- Experts in the field evaluate the final outcome, and the validated associations between similar time series are recorded in the database (clusters of similar/coherent collections of time series may also be recorded).
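The first step of this scenario can be sketched as follows. The sketch is only an illustration under stated assumptions: it uses the absolute Pearson correlation as a stand-in similarity measure and a fixed threshold to select candidate associations, neither of which is confirmed as the MTSD module's actual method, and all class, method and parameter names are hypothetical. The indices it returns would then be handed to the experts for validation before anything is recorded.

```java
// Hypothetical sketch; names and the similarity measure are assumptions,
// not taken from the MTSD module.
final class CoherenceCandidates {

    /** Pearson correlation of two equal-length series; 0 if either is constant. */
    static double pearson(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        }
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            return 0.0; // correlation is undefined for a constant series
        }
        return cov / Math.sqrt(varA * varB);
    }

    /**
     * Indices of the database series whose absolute correlation with the
     * recalled series reaches the threshold. These are candidates only;
     * expert validation is a separate, manual step.
     */
    static int[] candidates(double[] recalled, double[][] database, double threshold) {
        int[] hits = new int[database.length];
        int count = 0;
        for (int i = 0; i < database.length; i++) {
            if (Math.abs(pearson(recalled, database[i])) >= threshold) {
                hits[count++] = i;
            }
        }
        int[] out = new int[count];
        System.arraycopy(hits, 0, out, 0, count);
        return out;
    }
}
```

Taking the absolute value treats strongly inverse movements as coherent too, which is a design choice made here for illustration; a stricter reading of "similar" would keep only positive correlations.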