Periodic Reporting for period 1 - ADMITTED (Advanced Data Methods for Improved Tiltrotor Test and Design)
Reporting period: 2019-02-01 to 2020-01-31
This can be done through precise analysis of the recorded data, in order to identify potentially hazardous behaviours and to develop procedures that mitigate problems before they actually occur.
Because of the enormous amount of data collected during each flight, two problems must be addressed: defining the appropriate infrastructure in terms of HW platform, and selecting the most promising software to identify the segments of data that contain useful information. The objective is to extract, from a huge amount of data, useful information that can support the aircraft manufacturer in its work.
Traditional data-mining methods are effective on uniform data sets such as flight tracking data or weather. Integrating heterogeneous data sets introduces complexity in data standardization, normalization, and scalability. The variability of the underlying data warehouse can be leveraged, using big data infrastructure for scalability, to identify trends and create actionable information.
The massive availability of data requires a complex, high-performance architecture to support deep analysis. Furthermore, the data can be so large that the platform itself shall provide intelligent support, that is, it shall assist and guide the end user in the analysis. This can only be achieved by adopting novel approaches to analyse large amounts of data and extract useful information (data mining, machine learning, AI).
The platform shall also be able to support all analyses without requiring huge investment: this can be achieved by adopting COTS components for the HW platform and state-of-the-art solutions for the SW platform: big data management (Hadoop) and analytics (Spark R).
Furthermore, the platform and the related SW and algorithms shall be exploitable in different contexts as well, for example both on premises (as proposed in the current document) and in the cloud, according to different exploitation paths. In fact, although designed for a specific context (NGCTR), the HW architecture, the SW stack and a large part of the algorithms can also be exploited in different contexts and domains (e.g. fixed-wing aircraft).
Within the scope of WP1, the first important result achieved is the selection, installation and connection of the enabling infrastructure, which will be used by the project during its five years. The deliverable is classified as OTHER and consists of the configured cluster itself; this document provides a report on the activities executed and the results of the task.
The enabling infrastructure, based on a complex yet flexible HW/SW combination, is composed of:
From a HW perspective:
• A bare-metal infrastructure of 4 nodes optimised for storage and CPU-intensive computations
• A hyperconverged Nutanix architecture built upon the aforementioned 4 nodes and configured to expose to the user a 6-node cluster capable of holding 80 TB of data
• A powerful single-node workstation equipped with two best-in-class Nvidia V100 GPUs for deep learning applications
From a SW/application perspective:
• A 6-node Cloudera cluster configured with all the Hadoop-related services (e.g. Spark, Hive, Kudu, Impala, Hue) needed to run ADMITTED ETL and data analysis jobs
• A standalone workstation equipped and configured with a complete Python distribution and all the main Deep Learning and Data Science toolboxes and libraries
Hardware and software have been physically installed at the Leonardo premises in Cascina Costa, Varese (Italy), under the Implementation Agreement signed by the coordinator and the Topic Manager.
The cluster configuration has been finalised by the project coordinator. Together with the Topic Manager, the cluster has been connected to the network and, after iterations with the IT department, made accessible to data scientists through a secure connection.
A second important result is the creation of the Query Catalogue, which defines the data containers, classes and algorithms to be used in the implementation of the most popular queries.
A building-block approach has been defined and structured in this deliverable in order to standardize data access and query building by data scientists. The advantage of this approach is that it provides a common basis for any Topic Manager function that needs to extract values from data. Data scientists will be able to re-use the building blocks, becoming more efficient by not re-inventing the wheel and more effective because the building blocks will be bug-proof and will provide an easy-to-maintain, high-level algorithm.
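As an illustration only, the building-block idea can be sketched in a few lines of Python. This is a minimal sketch and not the Query Catalogue itself: the names `Block`, `select`, `where` and `run_query`, as well as the record fields (`flight_id`, `t`, `rotor_rpm`), are hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# One record is one sample of flight data; the field names are hypothetical.
Record = dict


@dataclass(frozen=True)
class Block:
    """A reusable query step that transforms a stream of records."""
    name: str
    apply: Callable[[Iterable[Record]], Iterator[Record]]


def select(*fields: str) -> Block:
    """Building block that keeps only the requested fields."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            yield {field: record[field] for field in fields}
    return Block("select", run)


def where(field: str, predicate: Callable[[float], bool]) -> Block:
    """Building block that keeps records whose field satisfies the predicate."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        return (record for record in records if predicate(record[field]))
    return Block("where", run)


def run_query(records: Iterable[Record], *blocks: Block) -> list:
    """Chain the blocks in order and collect the final records."""
    stream = iter(records)
    for block in blocks:
        stream = block.apply(stream)
    return list(stream)


# Made-up samples, used only to exercise the blocks.
samples = [
    {"flight_id": "F001", "t": 0.0, "rotor_rpm": 410.0},
    {"flight_id": "F001", "t": 1.0, "rotor_rpm": 455.0},
    {"flight_id": "F002", "t": 0.0, "rotor_rpm": 470.0},
]
result = run_query(
    samples,
    where("rotor_rpm", lambda value: value > 450.0),
    select("flight_id", "t"),
)
```

Because every block has the same shape (records in, records out), a query is just an ordered list of blocks, and each block can be tested once and re-used in many queries.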
Flight data are collected and injected into the computation cluster after each flight. Data ingestion is done through specific ETL algorithms that automatically read information from downloaded files. Online communication with the aircraft for live data transfer is not part of this proposal, because adequate infrastructure is not available. However, the data ingestion algorithms will be implemented so that they can support a future online implementation.
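One way to keep file-based ingestion compatible with a future online source is to separate the per-record transformation from the place the rows come from. The following is a minimal sketch under stated assumptions: the CSV layout (`time_s`, `parameter`, `value`), the function names and the flight identifier are hypothetical, and a real job would write to the cluster rather than build a list.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO


def parse_row(flight_id: str, row: dict) -> dict:
    """Transform one raw row into a typed record (hypothetical schema)."""
    return {
        "flight_id": flight_id,
        "t": float(row["time_s"]),
        "parameter": row["parameter"],
        "value": float(row["value"]),
    }


def ingest(flight_id: str, source: Iterable[dict]) -> Iterator[dict]:
    """Ingest rows from any iterable source, such as a file or a live stream."""
    for row in source:
        try:
            yield parse_row(flight_id, row)
        except (KeyError, TypeError, ValueError):
            # Malformed rows are skipped in this sketch; a real job would log them.
            continue


def ingest_file(flight_id: str, handle: TextIO) -> Iterator[dict]:
    """File-based entry point: read a downloaded CSV file row by row."""
    return ingest(flight_id, csv.DictReader(handle))


# In-memory stand-in for a downloaded file, with one malformed value.
raw = (
    "time_s,parameter,value\n"
    "0.0,rotor_rpm,410.0\n"
    "0.5,rotor_rpm,bad\n"
    "1.0,rotor_rpm,455.0\n"
)
records = list(ingest_file("F001", io.StringIO(raw)))
```

Since `ingest` accepts any iterable of rows, an online source would only need to supply the same row dictionaries; `parse_row` would stay unchanged.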
Data analysis algorithms
A set of "traditional" data analysis algorithms will be provided as a basic catalogue (data filtering, FFT, correlation, …). Beyond these, a set of algorithms operating on statistics and machine learning is defined specifically for flight data analysis. The adoption of a powerful HW platform will make the analysis of huge amounts of data practical. Data correlation and comparison among flights from different aircraft are possible only if properly supported by the underlying platform and by algorithms that know how to exploit it.
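For illustration, the three "traditional" operations named above (filtering, FFT, correlation) can be sketched in plain Python. This is a minimal sketch on a synthetic signal, not the project's catalogue: the function names are hypothetical, the filter is a simple moving average, and on the cluster these operations would normally be run through the analytics stack rather than hand-written.

```python
import cmath
import math


def moving_average(x, window):
    """Simple low-pass filter: mean over a sliding window (valid part only)."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic signal: exactly 4 cycles of a sine wave over 32 samples.
n = 32
signal = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]

smoothed = moving_average(signal, window=4)
spectrum = fft(signal)
# Index of the strongest frequency bin in the first half of the spectrum.
dominant_bin = max(range(n // 2), key=lambda k: abs(spectrum[k]))
# A scaled and shifted copy of the signal is perfectly correlated with it.
r = pearson(signal, [2.0 * v + 1.0 for v in signal])
```

Here `dominant_bin` recovers the 4 cycles placed in the synthetic signal, and `r` is 1 up to floating-point rounding.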
1.3.3 Aircraft design phase & Flight campaign optimization
With current implementations only a partial comparison is possible: only a few sets of parameters can be combined during analysis. This may lead to erroneous conclusions, or it may take a considerable amount of time to obtain the desired values.
Contribution beyond state of the art. The proposed implementation will not only be able to manage large amounts of data, as stated in previous topics, but will also provide considerable help in understanding the data thanks to the adoption of machine learning techniques. When properly trained, it can support the data scientist in better understanding the characteristics of the new aircraft (NGCTR). The results from flight data of previously developed aircraft can be used to support the design of new aircraft (such as the Next-Generation Civil Tiltrotor). In addition, engineering data of the aircraft under design can be compared with flight data.