ViDaR: R-enabled large-scale data analytics in ViDa

Project information

ViDaR

Grant agreement ID: 768910

DOI

10.3030/768910

Project closed

EC signature date 3 November 2017

Start date 1 January 2018

End date 30 June 2019

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 150 000,00

EU contribution

€ 150 000,00

Coordinated by

ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE
Switzerland

Periodic Reporting for period 1 - ViDaR (ViDaR: R-enabled large-scale data analytics in ViDa)

Reporting period: 2018-01-01 to 2019-06-30

Data and domain scientists typically rely on data analytics suites such as R to load, query, explore, and mine massive amounts of data. These tools offer powerful and intuitive graphical user interfaces and APIs that streamline scientific discovery. Although convenient, they manage data with primitive approaches, which costs them performance and scalability.
The main objective of the ViDaR project is to avoid this compromise by combining R with the results of our ERC Grant ViDa, which provides the technology to explore extremely large and complex datasets by directly accessing raw data, in any form or shape, as first-class citizens.

ViDaR offloads existing R code to ViDa by overloading and specializing existing APIs. With ViDaR, the data scientist does not have to convert and flatten input files to make them compatible with the R processing model. Instead, they can rely on ViDa’s first-class support for in-situ query processing on raw data. They also benefit from the techniques for parallel and efficient query execution developed by the database research community, as these are integrated into ViDa.

ViDaR also enables analysis of proprietary data formats and of formats that were not known when the ViDaR package was developed. This is achieved by extending ViDa to dynamically load input plugins, packaged as libraries, that specify how ViDa should digest the corresponding format. A ViDaR user can therefore use a plugin that is not publicly available, for example due to licensing issues, or one that was not taken into consideration when ViDaR was developed.

Lastly, ViDaR can use remote ViDa instances, so the user can benefit from a powerful, possibly hardware-accelerated, remote cluster while writing code for their desktop. The use of the remote server and of the hardware acceleration is mostly hidden from the data scientist: they only need to write sequential R code, which is translated to ViDa’s language and then executed on the remote accelerator-enabled servers.
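
The offloading and plugin ideas described above can be illustrated with a minimal sketch. It is written in Python rather than R purely for illustration, and it does not reproduce the actual ViDaR or ViDa API: `LazyTable`, `READERS`, `register_format`, and `read_csv_text` are hypothetical names, the input is in-memory text rather than a raw file, and a small in-process loop stands in for the ViDa engine, which in the real system would receive the translated plan, possibly on a remote server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical plugin registry: format name -> function turning raw text into rows.
READERS: Dict[str, Callable[[str], List[dict]]] = {}


def register_format(name: str, reader: Callable[[str], List[dict]]) -> None:
    """Register a reader at run time, loosely mirroring a dynamically loaded input plugin."""
    READERS[name] = reader


def read_csv_text(text: str) -> List[dict]:
    """Parse simple comma-separated text (no quoting) into a list of row dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]


register_format("csv", read_csv_text)


@dataclass(frozen=True)
class LazyTable:
    """Records operations instead of running them (an assumed design, not ViDaR's)."""
    raw: str                                   # raw input, left unconverted until collect()
    fmt: str                                   # key into the READERS registry
    plan: Tuple[Tuple[str, object], ...] = ()  # recorded (operation, argument) pairs

    def filter(self, predicate: Callable[[dict], bool]) -> "LazyTable":
        # Overloaded "filter": extends the plan, touches no data.
        return LazyTable(self.raw, self.fmt, self.plan + (("filter", predicate),))

    def select(self, *columns: str) -> "LazyTable":
        # Overloaded "select": extends the plan, touches no data.
        return LazyTable(self.raw, self.fmt, self.plan + (("select", columns),))

    def describe(self) -> List[str]:
        """Return the names of the recorded operations, in order."""
        return [op for op, _ in self.plan]

    def collect(self) -> List[dict]:
        """Stand-in engine: read the raw input via its plugin, then run the plan."""
        rows = READERS[self.fmt](self.raw)
        for op, arg in self.plan:
            if op == "filter":
                rows = [r for r in rows if arg(r)]
            elif op == "select":
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


data = "id,score\n1,10\n2,25\n3,40"
query = LazyTable(data, "csv").filter(lambda r: int(r["score"]) > 20).select("id")
print(query.describe())
print(query.collect())
```

In this sketch, calling `filter` or `select` only records the operation. Nothing is read until `collect` runs, which is the point where a real system could hand the whole plan to an external engine instead of evaluating it locally. A new format becomes usable by calling `register_format` with another reader, without changing `LazyTable`.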
