INtelligeNt ApplicatiOns oVer Large ScAle DaTa StrEams

Periodic Reporting for period 1 - INNOVATE (INtelligeNt ApplicatiOns oVer Large ScAle DaTa StrEams)

Reporting period: 2018-04-01 to 2020-03-31

Predictive analytics is a key research subject for query-driven applications, because analytics provide the basis for intelligent decision making. Given the huge volumes of data involved, analytics can be executed on top of data partitions: each partition holds only a part of the data, and a dedicated processor handles the queries addressed to it. Continuous queries over multiple data partitions require intelligent mechanisms to (1) assign queries to distributed data nodes at scale and (2) efficiently aggregate the multipart final query response in limited time and with maximum performance (i.e. the Quality of Result, QoR). Query Controllers (QCs) serve the incoming queries, connecting large-scale data systems with the real world. Query Processors (QPs) are placed in front of data partitions and realize a ‘response mechanism’. A QC can have access to multiple QPs, and a QP can be ‘connected’ to multiple QCs, forming a ‘grid’ ecosystem. INNOVATE introduces an intelligent decision-making mechanism along three axes: (i) top-down, an intelligent mechanism to assign queries to QPs; (ii) bottom-up, a decision-making mechanism to manage the collected data efficiently and to aggregate partial results into responses for applications; (iii) horizontal, query optimization schemes for a ‘swarm’ of QCs.
The objectives of this programme are:
O1. Design & implement Query and QP Models.
O2. Design & implement Learners.
O3. Create a Pool of Learners and Implement an Ensemble Learning Scheme.
O4. Design & implement the Queries Assignment Process.
O5. Design & implement the Multiple Controllers Management Plane.
O6. Develop a holistic approach to the research training and career development of the Fellow.
O7. Disseminate and Exploit INNOVATE outcomes.
"INNOVATE delivers a model for the description of: (i) queries; (ii) QCs – entities responsible for the allocation of queries; (iii) QPs – entities placed in nodes (e.g. edge nodes) where the data are collected. We propose the use of specific characteristics for modelling queries and QPs. We adopt characteristics that can be matched each other: The complexity class of a query with the load of QPs; The deadline for execution with the speed of processing; The constraints of a query with the data present in a node. Based on the matching process, we try to reveal the most efficient allocations. We propose models for the definition of the load of each QP, the matching between the constraints and the data present in nodes and the definition of the complexity class of a query.
An additional goal is to create a pool of learners that forms the basis of an ensemble learning scheme. The adopted learners are: (i) C4.5 decision tree; (ii) Random tree; (iii) Naive Bayes model; (iv) Bayesian Network; (v) Multinomial Naive Bayes model; (vi) Random Forest; (vii) Logistic Model Tree; (viii) REPTree model; (ix) JRip algorithm; (x) Multilayer Perceptron. We build and provide an ensemble learning scheme based on this pool. We then propose a meta-ensemble scheme defined over multiple ensemble learning models, i.e. the AdaBoost model, the Stacking model and the Bagging model. These ensemble schemes are ‘combined’ into our meta-ensemble learning scheme based on the One-Over-All (OVA) methodology.
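The report does not spell out how the three ensembles are combined, so the following Python sketch shows only one plausible reading of a one-against-the-rest combination. It assumes that each ensemble (AdaBoost, Stacking, Bagging) has already produced, for every candidate QP, a confidence that the query belongs to that QP rather than to any other. The function name and the input layout are hypothetical, and the ensembles themselves are not implemented here.

```python
def ova_combine(ensemble_scores):
    """Combine per-QP 'this QP vs. the rest' confidences from several ensembles.

    ensemble_scores maps an ensemble name to a dict {qp_name: confidence}.
    Returns the winning QP and the averaged confidence per QP.
    """
    collected = {}
    for scores in ensemble_scores.values():
        for qp, confidence in scores.items():
            collected.setdefault(qp, []).append(confidence)
    # Average the confidences that the ensembles report for each QP.
    averaged = {qp: sum(vals) / len(vals) for qp, vals in collected.items()}
    # The query is assigned to the QP with the strongest averaged confidence.
    winner = max(averaged, key=averaged.get)
    return winner, averaged


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the project.
    scores = {
        "adaboost": {"QP1": 0.7, "QP2": 0.2, "QP3": 0.1},
        "stacking": {"QP1": 0.4, "QP2": 0.5, "QP3": 0.1},
        "bagging": {"QP1": 0.6, "QP2": 0.3, "QP3": 0.2},
    }
    print(ova_combine(scores)[0])
```

Averaging is the simplest combiner. A weighted vote that reflects each ensemble's validation accuracy would be an equally reasonable choice.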
We also adopt Fuzzy Logic (FL) and propose a Type-2 FL System that takes as inputs the aforementioned characteristics of queries and nodes/QPs and outputs the so-called Efficiency of Allocation (EoA). The EoA expresses how certain (or uncertain) we are that an allocation is optimal. We enhance the proposed system with an additional input that represents the opinion of an ‘expert’ about the specific allocation, i.e. a Support Vector Machine (SVM) model.
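To make the idea of an interval Type-2 fuzzy system more concrete, here is a heavily simplified Python sketch. Everything specific in it is an assumption: the three inputs (QP load, deadline slack and an ‘expert’ score standing in for the SVM output, all scaled to [0, 1]), the membership breakpoints, the two rules, and the closed-form defuzzification that averages the lower and upper firing strengths instead of running the iterative Karnik-Mendel type reduction.

```python
def ramp_up(x, a, b):
    """Membership that is 0 below a, 1 above b and linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def high(x, a=0.3, b=0.7, blur=0.1):
    """Interval type-2 'high' set: returns (lower, upper) membership bounds."""
    upper = ramp_up(x, a - blur, b - blur)
    lower = ramp_up(x, a + blur, b + blur)
    return lower, upper


def low(x):
    """Complement of 'high'; the bounds swap roles under negation."""
    lo, up = high(x)
    return 1.0 - up, 1.0 - lo


def efficiency_of_allocation(load, slack, expert):
    """Hypothetical EoA in [0, 1] from three inputs scaled to [0, 1]."""
    # Rule 1: low load AND high slack AND favourable expert -> EoA = 1.
    r1 = [min(low(load)[i], high(slack)[i], high(expert)[i]) for i in (0, 1)]
    # Rule 2: high load OR low slack OR unfavourable expert -> EoA = 0.
    r2 = [max(high(load)[i], low(slack)[i], low(expert)[i]) for i in (0, 1)]
    # Simplified reduction: average each rule's firing interval.
    w1 = (r1[0] + r1[1]) / 2.0
    w2 = (r2[0] + r2[1]) / 2.0
    total = w1 + w2
    if total == 0:
        return 0.5  # no rule fires: stay neutral
    return (w1 * 1.0 + w2 * 0.0) / total


if __name__ == "__main__":
    print(efficiency_of_allocation(load=0.2, slack=0.8, expert=0.9))
    print(efficiency_of_allocation(load=0.8, slack=0.2, expert=0.1))
```

The gap between the lower and upper membership bounds (the `blur` parameter) is what distinguishes this from a Type-1 system: it models uncertainty about where the membership function itself lies.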
INNOVATE results are reported in the following publications:
1. Y. Kathidjiotis, K. Kolomvatsos, C. Anagnostopoulos, ‘Predictive intelligence of reliable analytics in distributed computing environments’, Springer Applied Intelligence, doi: 10.1007/s10489-020-01712-5, 2020
2. K. Kolomvatsos, C. Anagnostopoulos, ‘A probabilistic Model for Assigning Queries at the Edge’, Springer Computing, 102, 865–892, 2020
3. K. Kolomvatsos, C. Anagnostopoulos, ‘Multi-criteria Optimal Task Allocation at the Edge’, Elsevier Future Generation Computer Systems, 2019
4. K. Kolomvatsos, ‘A Distributed, Proactive Intelligent Scheme for Securing Quality in Large Scale Data Processing’, Springer Computing, 2019.
5. K. Kolomvatsos, ‘An Efficient Scheme for Applying Updates in Pervasive Computing Applications’, Journal of Parallel and Distributed Computing, Elsevier, 128, 2019, pp. 1-14.
6. A. Karanika, P. Oikonomou, K. Kolomvatsos, C. Anagnostopoulos, ‘An Ensemble Interpretable Machine Learning Scheme for Securing Data Quality at the Edge’, in International IFIP CD-MAKE 2020.
7. A. Karanika, P. Oikonomou, K. Kolomvatsos, T. Loukopoulos, ‘A Demand-driven, Proactive Tasks Management Model at the Edge’, IEEE FUZZ-IEEE, 2020.
8. A. Karanika, M. Soula, C. Anagnostopoulos, K. Kolomvatsos, G. Stamoulis, ‘Optimized Analytics Query Allocation at the Edge of the Network’, 12th ICIDCS, Naples, Italy, 2019.
9. E. Aleksandrova, C. Anagnostopoulos, K. Kolomvatsos, ‘Machine Learning Model Updates in Edge Computing: An Optimal Stopping Theory Approach’, 18th IEEE International Symposium on Parallel and Distributed Computing, 2019
11. K. Kolomvatsos, C. Anagnostopoulos, ‘In-Network Edge Intelligence for Optimal Task Allocation’, 30th ICTAI, Volos, Greece, 2018
12. K. Kolomvatsos, C. Anagnostopoulos, ‘An Edge-Centric Ensemble Scheme for Queries Assignment’, 8th International Workshop on Combinations of Intelligent Methods and Applications, Volos, Greece, 2018
13. H. Ivanov, C. Anagnostopoulos, K. Kolomvatsos, ‘In-Network Machine Learning Predictive Analytics: A Swarm Intelligence Approach’, in Convergence of Artificial Intelligence and the Internet of Things, Springer, 2020.
14. K. Kolomvatsos, C. Anagnostopoulos, ‘Edge-Centric Queries Stream Management based on an Ensemble Model’, Springer "Smart Innovation, Systems and Technologies" series volume, 2020.
The INNOVATE outcomes can be used by any company that administers a set of distributed data partitions and wants an intelligent mechanism for efficiently allocating queries to the appropriate partitions. In such scenarios, the outcomes can assist and improve data management and the provision of analytics. Potential end users include companies offering analytics services on top of large-scale data and companies active in the edge computing domain that offer services to end users or external applications.
INNOVATE could support an analytics platform that exploits a rich and diverse collection of data sources for different ends. The data will comprise information originating in various domains such as health, education, transportation and environmental monitoring. INNOVATE will help to understand the most important factors in the efficient management of queries that support end-user applications. Real-time insights generated from intelligent data analytics could be the basis for timely, high-quality services that increase the satisfaction of end users (and thus society) and improve their quality of life.