Community Research and Development Information Service - CORDIS


QRIS Report Summary

Project ID: 40607
Funded under: FP6-MOBILITY
Country: Greece

Final Activity Report Summary - QRIS (Query pRocessing and Integration for Semistructured data: advancing the frontier)

The amount of information available digitally is increasing. Various studies indicate that the volume of information that is digitally produced and stored doubles every 18 months. A very significant, and increasing, portion of this information is weakly structured or semi-structured data, often in the form of XML, and is made available by autonomous, widely distributed systems such as databases, web services, web-based forms and reports, and library systems.

The project developed techniques and systems for the efficient processing of such rich, semi-structured information in a variety of environments. To find and retrieve pieces of such information from large databases encoded in XML, the project developed techniques that optimize and efficiently execute complex requests for such data. Requests are broken down into a sequence of operations, and different ways to perform the breakdown are analysed. Efficient algorithms were devised to pick the best sequence of operations and to exploit the available facilities of the underlying data storage system. Equivalent but very different ways to perform the breakdown were discovered through a set of rewriting rules for sequences of operations, based on the mathematical properties of the sequences and of each individual operation.
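The idea of comparing equivalent operation sequences and picking the cheapest one can be illustrated with a minimal Python sketch. It is not the project's optimizer: the `Op` class, its cost and selectivity figures, and the single rewriting rule assumed here (independent filtering operations may be applied in any order) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Op:
    """A hypothetical filtering operation in a query plan."""
    name: str
    cost_per_item: float  # assumed cost of applying the operation to one item
    selectivity: float    # assumed fraction of items that pass the operation


def plan_cost(plan, input_size):
    """Estimate the total cost of running the operations in the given order."""
    total = 0.0
    size = float(input_size)
    for op in plan:
        total += size * op.cost_per_item
        size *= op.selectivity  # later operations see fewer items
    return total


def best_plan(ops, input_size):
    """Enumerate every ordering allowed by the (assumed) commutativity rule
    and return the cheapest one under the toy cost model."""
    return min(permutations(ops), key=lambda plan: plan_cost(plan, input_size))


if __name__ == "__main__":
    ops = [
        Op("tag_match", cost_per_item=1.0, selectivity=0.5),
        Op("value_filter", cost_per_item=5.0, selectivity=0.1),
        Op("keyword_filter", cost_per_item=2.0, selectivity=0.2),
    ]
    plan = best_plan(ops, input_size=1000)
    print([op.name for op in plan], plan_cost(plan, 1000))
```

Exhaustive enumeration is only workable for a handful of operations; it stands in here for the efficient selection algorithms that the report mentions but does not describe.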

Different algorithms for performing the operations were also devised and experimentally evaluated. The efficiency of the systems was demonstrated with a full implementation and extensive experimentation. For the increasingly common case where the information is distributed among different, autonomous sources, novel techniques were developed that allow the automatic discovery of plans to retrieve and combine the appropriate pieces of information from each source. These techniques take into account semantic knowledge about the data in each source, in the form of XML schemas. Moreover, the project developed a novel theory for ranked queries over XML data, based on an abstract scoring algebra for the keyword-based filtering operations of XML queries.

The project also developed techniques that enable the integration of such rich semi-structured information and allow the automatic retrieval and combination of data from multiple autonomous sources. The contents and capabilities of the sources can be defined in an appropriate language that also incorporates information about the schema of the data. The developed techniques analyse the source descriptions as well as the user request, and create a plan for the retrieval of the appropriate pieces of information from each source. The techniques are applicable to the automated combination (i.e., orchestration) of data-centric web services.
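As a rough illustration of planning over source descriptions, the following Python sketch treats each description as nothing more than the set of fields a source can provide and greedily assigns the requested fields to sources. The function name, the dictionary format and the greedy strategy are assumptions made for this example; the description language and planning techniques developed in the project are not specified in this summary.

```python
def plan_retrieval(requested, sources):
    """Greedily choose which source supplies which requested field.

    `sources` maps a hypothetical source name to the set of fields it
    can provide. Returns a list of (source, fields_to_fetch) steps.
    """
    remaining = set(requested)
    plan = []
    while remaining:
        # Pick the source that covers the most still-missing fields.
        name, fields = max(sources.items(),
                           key=lambda kv: len(remaining & kv[1]))
        covered = remaining & fields
        if not covered:
            raise ValueError(f"no source provides: {sorted(remaining)}")
        plan.append((name, sorted(covered)))
        remaining -= covered
    return plan


if __name__ == "__main__":
    sources = {
        "library": {"title", "author"},
        "reviews": {"title", "rating"},
        "prices": {"price"},
    }
    print(plan_retrieval({"title", "author", "price"}, sources))
```

A real planner would also have to respect access restrictions of each source and decide how the retrieved pieces are joined; the sketch only covers the selection step.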

Finally, the project made contributions to the theory of peer-to-peer data management and to the technology of processing sensor information inside a sensor-based peer-to-peer system. Specifically, the impact of altruism on the participation and level of activity of a peer-to-peer network was modelled and analysed in detail for the first time. Also, novel techniques were developed that improve the robustness of networks of communicating sensors and increase their ability to process information accurately in the presence of spurious measurements.
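To show why spurious measurements matter, the sketch below applies a generic outlier filter based on the median absolute deviation before averaging a set of sensor readings. This is a standard textbook technique used here only as an illustration, not the method developed in the project; the function name and the threshold value are assumptions.

```python
from statistics import median


def robust_average(readings, threshold=3.0):
    """Average the readings after discarding likely spurious values.

    A reading is dropped when its distance from the median exceeds
    `threshold` times the median absolute deviation (MAD).
    """
    med = median(readings)
    mad = median(abs(r - med) for r in readings)
    if mad == 0:
        # At least half of the readings equal the median; use it directly.
        return med
    kept = [r for r in readings if abs(r - med) / mad <= threshold]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    readings = [20.1, 19.9, 20.0, 20.2, 85.0]  # 85.0 is a spurious value
    print(sum(readings) / len(readings))  # plain mean, pulled up by 85.0
    print(robust_average(readings))       # close to the true level
```

With the sample readings, the plain mean is dragged far above 20 by the single faulty value, while the filtered average stays near the level reported by the other four sensors.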


