CORDIS - EU research results

Domain Specific Systems for Information Extraction and Retrieval

Periodic Reporting for period 2 - DoSSIER (Domain Specific Systems for Information Extraction and Retrieval)

Reporting period: 2021-11-01 to 2024-04-30

In today’s information age, the productivity and competitiveness of individuals and firms depend mainly upon their capacity to generate, process, and efficiently use knowledge-based information. To excel in such an economy, we need talented and skilled researchers who can create the new methods and models required to understand the wealth of data, in order to produce and subsequently find and retrieve relevant, valuable, and credible information.

The needs of the casual web searcher are well met today. For general information needs, Wikipedia is an excellent source of knowledge, and most search systems will find a satisfactory link there or elsewhere on the Web. However, for search that is part of a complex, professional task, the responses of web search engines are generally unsatisfactory. Professional search is defined by the fact that the searchers carry out the searches on a professional, paid basis, rather than voluntarily out of personal interest. Professional search is also highly domain-specific, with searchers and solution providers specialising in domains such as law, medicine, or patents. Searchers in the patent and legal domains are usually obliged to find complete information on the topic being searched within a limited time: relevant information that is not found could lead to litigation at a later stage. Medical doctors with difficult and unusual cases need to find specific and potentially rare information that allows them to provide optimal treatment to their patients, thereby speeding up healing and lowering costs to the medical system.

Due to this specialisation of professional search systems by domain, there is a visible and concrete risk that practice develops independently in each domain, and therefore inefficiently. DoSSIER (Domain Specific Systems for Information Extraction and Retrieval) acts as a centripetal force, leveraging fundamental theories of information and producing new methods for user observation and system contextualisation, as well as novel applications for interacting with information and extracting knowledge from it. It synthesises similarities and differences across domains, tasks, and contexts. This centripetal force creates a unique and much-needed training experience for the next generation of information scientists and data practitioners, who will need to work with and between information, users, and systems, and to understand how to identify the specificities of a domain and leverage commonalities between domains.

Even though search is such a key activity, organisations often lack the expertise to handle search as part of a coherent information process. Outside of organisations created with a professional search task among their core components (e.g. patent offices), a majority of companies and public offices are struggling with the data they have or could have access to, partly because there is a growing talent mismatch. This mismatch can be addressed in two complementary ways:
1. Creation of new training programmes for the next-generation workforce.
2. Design and development of applications and methods that support the incumbent workforce in adapting to new work processes.

DoSSIER pursued both of these paths comprehensively, bringing in cross-domain theoretical models, unique data processing and analytics methods, and concrete, application-based user studies and use cases.

DoSSIER achieved the following objectives:
1. Make a ground-breaking impact on professional search processes and systems. Overall, the work in the project over its full duration led to 76 publications in conferences and journals and 12 code repositories on the DoSSIER project GitHub page.
2. Train a new generation of scientifically-principled, creative, entrepreneurial and innovative researchers with the academic and industrial experience necessary to make a significant impact on professional search in Europe, and hence on the European economy. Fifteen Early Stage Researchers carried out research towards their doctoral degrees while undergoing a carefully designed training programme including exposure to multiple technical areas, scientific methodology, and entrepreneurship.
3. Foster excellence by structuring research and doctoral training to lead to a professional certification of the competences: research knowledge and intellectual abilities; research personal effectiveness; research governance and organization (including ethics and sustainability); and researcher engagement, influence and impact. All Early Stage Researchers in the DoSSIER project completed the PGCertRPD professional development degree at the University of Strathclyde, thereby certifying the additional competences and skills they obtained.
The fifteen Early Stage Researchers (ESRs) were hired. Each ESR project began by focussing on creating an overview of the state-of-the-art. Based on the state-of-the-art, each ESR then identified and refined the research questions to be answered during their ESR project. Extensive work on the ESR projects as well as collaborations between the ESR projects led to the main scientific outcomes of DoSSIER. The main exploitation of DoSSIER has been through making available open-source software, publicly available data, and prototypes and demonstrations. ESRs were encouraged to make their code available as open source under a permissive open-source licence as far as possible.

The network-wide on-site training events planned for the first two years of the project were not possible due to the pandemic, so a series of online webinars was organised instead. In the second half of the project, travel became possible again. The following training events were organised:
October 2021: One-week face-to-face workshop for ESRs and advisors in Gumpoldskirchen, Austria.
December 2021: Cruise Workshop, a hackathon held by the ESRs in Milan, Italy, to develop prototypes.
July 2022: PatentSemTech workshop, a workshop on innovation search, in particular patent search, co-located with the SIGIR conference in Madrid, Spain.
September 2022: DoSSIER internal summer school, held in Olympiada, Greece.
April 2023: Legal Information Retrieval Workshop, co-located with the ECIR conference in Dublin, Ireland.
July 2023: The subsequent edition of the PatentSemTech workshop, co-located with the SIGIR conference in Taipei, Taiwan.
August 2023: The European Summer School on Information Retrieval, organised by the DoSSIER project in Vienna, Austria. 40 DoSSIER-external participants attended.
DoSSIER grouped its research activities into three general areas: Applications, Methods, and Models. These fed into each other to generate new hypotheses, identify new experimental procedures, and bring about a better understanding of knowledge and information needs, and of the processes by which the two interact. The results of this research provide vital know-how and tools to the professional search industry. In order to understand and model recurrent patterns and search tasks within and across domains, DoSSIER concentrated its work on three complementary domains with intensive professional search activities: healthcare, science & technology innovation, and law. These domains directly address two of the H2020 Societal Challenges (Health, demographic change and wellbeing; and Europe in a changing world).
DoSSIER Metaphor