Industrial and scientific institutions increasingly need to deal with massive data flows streaming in from a multitude of sources. Processing, analyzing and learning from such data often requires immense computational power, with processing distributed across several Big Data platforms and/or High-Performance Computing (HPC) infrastructures. Sophisticated analytics tools are also required, capable of extracting insights on the fly from many voluminous, correlated, high-velocity data streams. Such tools would allow a data analyst to explore both the data and the insights extracted from it interactively, with very fast response times to the desired analytics tasks. To enable proactive decision-making, predictive analytics tools that can forecast future events of interest are also needed. Finally, these analytics capabilities should be accessible to all data analysts, who often lack the programming skills needed to code, optimize and debug data processing operations over Big Data.
In INFORE, we addressed all these challenges through several ambitious objectives. First, we designed novel data summarization and approximate query processing techniques, as well as real-time machine learning and data mining tools that support the interactive construction of highly accurate models from extreme-scale data streams and massive data volumes. We also developed novel distributed complex event forecasting techniques, which not only detect critical events in a timely manner as they occur but also forecast their future occurrences. INFORE allows users to easily compose data analytics pipelines through its flexible, pluggable and extensible architecture, supported by corresponding software stacks. This architecture allows data analysts who are not programmers to specify processing workflows and data analytics tasks, often without writing any code. The framework consists of a family of data processing operators that can be interconnected graphically to build a wide range of complex data processing tasks, while an optimizer module guides all optimization and runtime adaptation decisions. INFORE's approach was rigorously tested and evaluated through controlled experiments and reviews by domain experts, using real-life data from the financial, the maritime and the life sciences domains, most notably in the INFORE22 sea trial.