Domain Specific Systems for Information Extraction and Retrieval

Periodic Reporting for period 1 - DoSSIER (Domain Specific Systems for Information Extraction and Retrieval)

Reporting period: 2019-11-01 to 2021-10-31

In today’s information age, the productivity and competitiveness of individuals and firms depend mainly upon their capacity to generate, process, and efficiently use knowledge-based information. To excel in such an economy, we need talented and skilled researchers who can create the new methods and models required to understand the wealth of data, in order to produce and subsequently find and retrieve relevant, valuable, and credible information.

The needs of the casual web searcher are well met today. For general information needs, Wikipedia is an excellent source of knowledge, and most search systems will find a satisfactory link there or elsewhere on the Web. However, for search that is part of a complex, professional task, the web search engines’ responses are generally unsatisfactory. Professional search is defined by the fact that the searchers undertake the searches on a professional, paid basis, as opposed to voluntarily, out of personal interest. Academic researchers are beginning to put more focus on professional search, as evidenced by the First International Workshop on Professional Search, held in 2018 at SIGIR, the major Information Retrieval conference. Professional search is characteristically very domain-specific, with searchers and solution providers specialising in domains such as law, medicine, or patents. Searchers in the patent and legal domains are usually obliged to find complete information for the topic being searched in a limited time, since information that is relevant but not found could lead to litigation at a later stage. Medical doctors with difficult and unusual medical cases need to find specific and potentially rare information that allows them to provide optimal treatment to their patients, thereby speeding up healing and lowering costs to the medical system.

Because professional search systems are specialised by domain, there is a visible and concrete risk that practice develops independently, and therefore inefficiently, in each domain. DoSSIER (Domain Specific Systems for Information Extraction and Retrieval) acts as a centripetal force, leveraging fundamental theories of information and producing new methods for user observation and system contextualisation, as well as novel applications for interacting with and extracting knowledge from information. It synthesises similarities and differences across domains, tasks, and contexts. This centripetal force creates a unique and much-needed training experience for the next generation of information scientists and data practitioners, who will need to work with and between information, users, and systems, to identify the specificities of a domain, and to leverage commonalities between domains.

Even though search is such a key activity, organisations often lack the expertise to handle search as part of a coherent information process. Outside of organisations created with a professional search task among their core components (e.g. patent offices), a majority of companies and public offices are struggling with the data they have or could have access to, partly because there is a growing talent mismatch. This is to be addressed in two complementary ways:
1. creation of new training programmes for the next-generation workforce;
2. design and development of applications and methods to help the incumbent workforce adapt to new work processes.

DoSSIER pursues both paths comprehensively, bringing in cross-domain theoretical models, unique data processing and analytics methods, and concrete, application-based user studies and use cases.

DoSSIER will achieve the following objectives:
1. Make a ground-breaking impact on professional search processes and systems.
2. Train a new generation of scientifically-principled, creative, entrepreneurial and innovative researchers with the academic and industrial experience necessary to make a significant impact on professional search in Europe, and hence on the European economy.
3. Foster excellence by structuring research and doctoral training to lead to a professional certification of the competences: research knowledge and intellectual abilities; research personal effectiveness; research governance and organisation (including ethics and sustainability); and researcher engagement, influence and impact.

The fifteen Early Stage Researchers (ESRs) were hired. Each ESR project began with an overview of the state of the art, produced as the first scientific deliverables:
· D1.1 - Knowledge Task Survey
· D2.1 - Contextual Search Survey
· D3.1 - State of the Art Models Survey and D3.2 - Models Tutorial

Based on the state of the art, each ESR then identified and refined the research questions to be answered during their ESR project. Some ESR projects have already led to first publications.

The network-wide on-site training events planned in the Description of Action (DoA) could not take place because of the pandemic; instead, we organised a series of online webinars and a workshop:
November 2020: Induction Workshop: Project introduction; Doing a PhD – what does it mean?
January 2021: Basics of IR Evaluations
February 2021: Basics of Human Computer Interaction
March 2021: Doing Science 1
March 2021: Doing Science 2

A highlight was the organisation of a one-week face-to-face workshop for ESRs and advisors in Gumpoldskirchen, Austria, in October 2021. This workshop was very successful in fostering collaboration between ESRs. The ESRs held a first hackathon day at the workshop, in which they planned the software integration needed to create project demonstrators; further hackathon workshops for this purpose are planned in 2022.

DoSSIER groups its research activities into three general areas (Applications, Methods, and Models). These feed into each other to generate new hypotheses, identify new experimental procedures, and bring about a better understanding of knowledge and information needs, and the processes by which the two interact. The results of this research will provide vital know-how and tools to the professional search industry.

In order to understand and model recurrent patterns and search tasks within and across domains, DoSSIER concentrates its work on three complementary domains with intensive professional search activities: healthcare, science & technology innovation, and law. These domains directly address two of the H2020 Societal Challenges (Health, demographic change and wellbeing; and Europe in a changing world).
