Periodic Reporting for period 2 - DoSSIER (Domain Specific Systems for Information Extraction and Retrieval)
Reporting period: 2021-11-01 to 2024-04-30
The needs of the casual web searcher are well met today. For general information needs, Wikipedia is an excellent source of knowledge, and most search systems will find a satisfactory link there or elsewhere on the Web. However, for search that is part of a complex, professional task, the responses of web search engines are generally unsatisfactory. Professional search is defined by the fact that the searchers carry out the searches as paid, professional work rather than voluntarily out of personal interest. Professional search is also highly domain-specific, with searchers and solution providers specialising in domains such as law, medicine, or patents. Searchers in the patent and legal domains are usually obliged to find complete information on the topic being searched within a limited time, because relevant information that is not found could lead to litigation at a later stage. Medical doctors with difficult and unusual cases need to find specific and potentially rare information that allows them to provide optimal treatment to their patients, thereby speeding up healing and lowering costs to the medical system.
Due to this specialisation of professional search systems by domain, there is a visible and concrete risk that practice develops independently in each domain and therefore inefficiently. DoSSIER (Domain Specific Systems for Information Extraction and Retrieval) acts as a centripetal force, leveraging fundamental theories of information and producing new methods for user observation and system contextualisation, as well as novel applications for interacting with information and extracting knowledge from it. It synthesises similarities and differences across domains, tasks, and contexts. This centripetal force creates a unique and much-needed training experience for the next generation of information scientists and data practitioners, who will need to work with and between information, users, and systems, to identify the specificities of a domain, and to leverage commonalities between domains.
Even though search is such a key activity, organisations often lack the expertise to handle search as part of a coherent information process. Outside of organisations created with a professional search task among their core components (e.g. patent offices), a majority of companies and public offices are struggling with the data they have or could have access to, partly because there is a growing talent mismatch. This can be addressed in two complementary ways:
1. creation of new training programmes for the next-generation workforce;
2. design and development of applications and methods that help the incumbent workforce adapt to new work processes.
DoSSIER pursued both paths comprehensively, bringing in cross-domain theoretical models, unique data processing and analytics methods, and concrete, application-based user studies and use cases.
DoSSIER achieved the following objectives:
1. Make a ground-breaking impact on professional search processes and systems. Over its complete duration, the project produced 76 publications in conferences and journals and 12 code repositories on the DoSSIER project GitHub page.
2. Train a new generation of scientifically-principled, creative, entrepreneurial and innovative researchers with the academic and industrial experience necessary to make a significant impact on professional search in Europe, and hence on the European economy. Fifteen Early Stage Researchers carried out research towards their doctoral degrees while undergoing a carefully designed training programme including exposure to multiple technical areas, scientific methodology, and entrepreneurship.
3. Foster excellence by structuring research and doctoral training to lead to a professional certification of the competences: research knowledge and intellectual abilities; research personal effectiveness; research governance and organisation (including ethics and sustainability); and researcher engagement, influence and impact. All Early Stage Researchers in the DoSSIER project completed the PGCertRPD professional development degree at the University of Strathclyde, thereby certifying the additional competences and skills they obtained.
The network-wide on-site training events planned for the first two years of the project were not possible due to the pandemic, so a sequence of online webinars was organised instead. In the second half of the project, travel became possible again. The following training events were organised:
October 2021: One-week face-to-face workshop for ESRs and advisors in Gumpoldskirchen, Austria
December 2021: Cruise Workshop, a hackathon held by the ESRs in Milan, Italy to develop prototypes
July 2022: PatentSemTech workshop, a workshop on innovation search, in particular patent search, co-located with the SIGIR conference in Madrid, Spain
September 2022: DoSSIER internal summer school, held in Olympiada, Greece
April 2023: Legal Information Retrieval Workshop, co-located with the ECIR conference in Dublin, Ireland
July 2023: The subsequent edition of the PatentSemTech workshop, co-located with the SIGIR conference in Taipei, Taiwan
August 2023: The European Summer School on Information Retrieval, organised by the DoSSIER project in Vienna, Austria, and attended by 40 participants from outside DoSSIER