Sequential data are everywhere, from DNA sequences to astronomical light curves, and from aircraft engine monitoring data to the prices of stock options. Recent advances in data storage, networking, and sensing technologies have allowed organizations to gather overwhelming amounts of sequential data at unprecedented speeds.
This wealth of information enables analysts to identify patterns, find abnormalities, and extract knowledge. Common practice in many domains, however, is to use custom data analysis solutions, usually built in higher-level programming languages such as R or Python. Such techniques, while acceptable for small-scale data processing, are unfit for larger-scale data management and exploration. This is because they disregard prior database research: they do not take advantage of indexes, physical data independence, query optimization, or data processing methods designed for scalability. In these domains, database systems are used merely for storing and retrieving data, not as the sophisticated query processing systems they are.
Current relational storage layers cannot serve the access patterns that analysts of sequential data are interested in without scanning large amounts of unnecessary data or incurring large processing overhead, which makes complex analytics inefficient.
In order to exploit this new opportunity, we plan to develop specialized data series storage and retrieval systems, which will allow analysts – across different fields – to efficiently manipulate the sequences of interest.
The proposed research project, named NESTOR (Next gEneration Sequence sTORage), has the potential for great economic and social impact in Europe, as multiple scientific and industrial fields currently need the right tools to handle their massive collections of data series.