Periodic Reporting for period 1 - DoSSIER (Domain Specific Systems for Information Extraction and Retrieval)
Reporting period: 2019-11-01 to 2021-10-31
The needs of the casual web searcher are well met today. For general information needs, Wikipedia is an excellent source of knowledge, and most search systems today will find a satisfactory link there or elsewhere on the Web. However, for search that is part of a complex, professional task, the web search engines’ responses are generally unsatisfactory. Professional search is defined by the fact that the searchers undertake the searches on a professional, paid basis, rather than voluntarily out of personal interest. Academic researchers are beginning to put more focus on professional search, as evidenced by the First International Workshop on Professional Search, held in 2018 at SIGIR, the major Information Retrieval conference. Professional search is characteristically very domain-specific, with searchers and solution providers specialising in domains such as law, medicine, or patents. Searchers in the patent and legal domains are usually obliged to find complete information for the topic being searched in a limited time: information that is relevant but not found could lead to litigation at a later stage. Medical doctors with difficult and unusual medical cases need to find specific and potentially rare information that allows them to provide optimal treatment to their patients, thereby speeding up healing and lowering costs to the medical system.
Due to this specialisation of professional search systems by domain, there is a concrete risk that practice develops independently, and therefore inefficiently, in each domain. DoSSIER (Domain Specific Systems for Information Extraction and Retrieval) acts as a centripetal force, leveraging fundamental theories of information and producing new methods for user observation and system contextualisation, as well as novel applications for interacting with and extracting knowledge from information. It synthesises similarities and differences across domains, tasks, and contexts. This centripetal force creates a unique and much-needed training experience for the next generation of information scientists and data practitioners, who will need to be able to work with and between information, users, and systems, and to understand how to identify the specificities of a domain and leverage commonalities between domains.
Even though search is such a key activity, organisations often lack the expertise to handle search as part of a coherent information process. Outside of organisations created with a professional search task among their core components (e.g. patent offices), a majority of companies and public offices are struggling with the data they have or could have access to, partly because there is a growing talent mismatch. This is to be addressed in two complementary ways:
1. creation of new training programmes for the next-generation workforce;
2. design and development of applications and methods to help the incumbent workforce adapt to new work processes.
DoSSIER pursues both of these paths, bringing in cross-domain theoretical models, unique data processing and analytics methods, and concrete, application-based user studies and use cases.
DoSSIER will achieve the following objectives:
1. Make a ground-breaking impact on professional search processes and systems.
2. Train a new generation of scientifically-principled, creative, entrepreneurial and innovative researchers with the academic and industrial experience necessary to make a significant impact on professional search in Europe, and hence on the European economy.
3. Foster excellence by structuring research and doctoral training to lead to a professional certification of the following competences: research knowledge and intellectual abilities; research personal effectiveness; research governance and organisation (including ethics and sustainability); and researcher engagement, influence and impact.
· D1.1 - Knowledge Task Survey
· D2.1 - Contextual Search Survey
· D3.1 - State of the Art Models Survey and D3.2 - Models Tutorial
Based on the state of the art, each Early Stage Researcher (ESR) then identified and refined the research questions to be answered during their ESR project. Some ESR projects have already led to first publications.
The network-wide on-site trainings planned in the Description of Action (DoA) were not possible due to the pandemic. Instead, we organised a sequence of online webinars and a workshop:
November 2020: Induction Workshop: Project introduction; Doing a PhD – what does it mean?
January 2021: Basics of IR Evaluations
February 2021: Basics of Human Computer Interaction
March 2021: Doing Science 1
March 2021: Doing Science 2
A highlight was the organisation of a one-week face-to-face workshop for ESRs and advisors in Gumpoldskirchen, Austria, in October 2021. This workshop was very successful in fostering collaboration between ESRs. The ESRs held a first hackathon day at the workshop, during which they planned the software integration needed to create project demonstrators. Further hackathon workshops for this purpose are planned in 2022.
In order to understand and model recurrent patterns and search tasks within and across domains, DoSSIER concentrates its work on three complementary domains with intensive professional search activities: healthcare, science & technology innovation, and law. These domains directly address two of the H2020 Societal Challenges (Health, demographic change and wellbeing; and Europe in a changing world).