Final Report Summary - ERACEP (Emerging research areas and their coverage by ERC-supported projects)
The ERACEP approach is based on the combination of two perspectives. The first perspective concerns the identification of emerging research areas. The second perspective takes the view of ERC funding and explores how ERC-funded research themes map to the identified emerging areas. For the translation of these two perspectives into a research agenda, new methodological approaches had to be developed.
For detecting emerging research, a bibliometric approach was developed consisting of two building blocks. The first building block comprises the identification of dynamic research fields based on publication activities using the Sharpe ratio as growth measure. The second building block designed for the detection of emerging topics within dynamic fields consists of three bibliometric components: a cluster analysis on the fields using a hybrid approach combining bibliographic coupling and textual links, a fine-grained representation of clusters based on core documents, and a diachronic analysis of the evolution of links among clusters and topics over time.
Results of the bibliometric analyses were validated via expert interviews.
The fine-mapping methodology was applied to six selected dynamic fields and resulted in the detection of ten emerging topics: brain-computer interface, kinematics, biodiesel, fuel cells, radiation, nanopollution, prenatal diagnosis, environmental health issues, acquired immunodeficiency syndrome (AIDS) in Africa, state and region.
Looking at international activities in these topics, we observe substantial differences with respect to national contributions and international collaborations amongst the distinct dynamic fields and emerging topics.
For mapping ERC-funded research to the emerging research topics, a full-text-matching approach was developed which enables the mapping of grant applications to the data set of publications specifying the identified thematic clusters. This matching methodology has several advantages, as the text-based link can be quantified and calculated as soon as an application is submitted. It uses the full body of the text and does not involve parsing the text to extract references that have to be matched with individual papers or journals. The results of our matching approach, which was carried out for a selected number of thematic fields and emerging topics, indicate that ERC funding is indeed able to address emerging topics.
The ability to identify emerging research topics can be used by the ERC for assisting internal discussions about the structure and scope of different evaluation panels and for supporting the identification of appropriate panel members with the right expertise in emerging topics. The matching approach of ERACEP can be used in an ex ante and in an ex post way during the evaluation procedures. In the ex ante mode, the mapping of applications to distinct topics could facilitate the assignment of applications to evaluation panels after submission. Applications could also be matched with a set of key papers, which in turn would make it easier to identify the most appropriate external experts who might be included in the evaluation procedures. Finally, a pre-selection of applications would be possible in the sense that applications can be mapped to thematic clusters labelled as 'emerging', 'established' or even 'vanishing'. Ex post, results of the mapping exercise can be used for a reflexive assessment of ERC procedures.
Project context and objectives:
The main objectives of ERACEP are the identification of topically emerging research areas and the analysis of the extent to which activities supported by the ERC cover and contribute to these research areas. The ERACEP approach is based on the combination of two perspectives. The first perspective concerns the identification of topically emerging research areas, assuming that excellent, truly creative research is particularly concerned with such emerging areas. The second perspective takes the view of ERC funding and explores how ERC-funded research themes map to the identified emerging areas.
For the translation of these two perspectives into a research agenda, new methodological approaches had to be developed. For detecting emerging research topics, a number of mainly expert-based foresight methods have been suggested and applied in different contexts. Frequently, standard future topics are identified by such methods. ERACEP adopted a bibliometric approach for identifying emerging topics starting with an open investigation in all fields of science, social sciences, arts and humanities. A bibliometric approach has the specific advantage that it refers to objective data in contrast to expert-based approaches which inevitably imply some subjectivity. The open investigation implemented within ERACEP reflects as much as possible the open, flexible, bottom-up approach of the ERC funding schemes.
The implementation of the second perspective of ERACEP, the matching of ERC-funded activities to emerging research topics, also required the development of new methods, since no suitable and validated methodology was available at the start of ERACEP.
Accordingly, due to the inherent nature of its main objectives, ERACEP has an explorative and experimental character. This also implies that, as research within ERACEP advanced, the work plan had to be adapted to accommodate new empirical findings. An important modification in this line was the decision to narrow down the scope of the analysis by focusing in more depth on the evolution of certain clusters of emerging topics instead of a broad mapping distributed over all areas of science.
The identification of emerging research topics
The bibliometric approach for the detection of emerging research topics developed within ERACEP consists of two building blocks. The first building block comprises the identification of dynamic research fields based on publication activities. The rationale behind this approach is the notion that dynamic growth - in terms of publications - of a specific field reflects on the one hand an increasing interest of scientists in this field, so that more research groups are doing research in the respective area. On the other hand, dynamics could also indicate that a constant number of scientists are increasing their research activities. The second building block designed for the detection of emerging topics within dynamic fields consists of three bibliometric components: a cluster analysis on the fields, a fine-grained representation of clusters based on core documents, and a diachronic analysis of the evolution of links amongst clusters and topics over time.
For the identification of the (most) dynamic fields, a Sharpe ratio was calculated for all the subject categories included in the Science Citation Index Expanded, the Social Sciences Citation Index, and the Arts and Humanities Citation Index of Thomson Reuters. The Sharpe ratio considers the development of a specific field in relation to the growth of all fields. Thereby, it adjusts for differences in absolute field size. As a result of this dynamic analysis, thirteen fields were identified from the sciences, four from the social sciences, and three from the arts and humanities.
Amongst the thirteen most dynamic fields from the sciences, five can be categorised within the area of medicine: nursing, orthopaedics, obstetrics and gynaecology, medical ethics, and oncology. While orthopaedics, obstetrics and gynaecology, and oncology are closely related and concerned with the clinical and therapeutic dimension of medicine, nursing and medical ethics represent its healthcare and social dimensions. Three of the dynamic fields belong to the engineering domain. They cover rather small and specialised fields such as robotics and the characterisation and testing of materials, as well as the larger field of biomedical engineering. Two fields represent the biological sciences: biotechnology and applied microbiology, and behavioural sciences. The three remaining fields from the sciences can be considered stand-alone fields: operations research and management sciences, environmental sciences, and energy and fuels.
Of the four most dynamic fields in the social sciences, two also belong to the medical domain: public, environmental and occupational health, and experimental psychology. The two remaining fields are transportation and geography. Within the arts and humanities, three quite different dynamic fields were detected: archaeology, religion, and architecture.
Overall, the dynamic analysis using the Sharpe ratio proved to be a powerful indicator of growth and an appropriate criterion for the selection of the most dynamic fields, resulting in a portfolio of quite diverse dynamic fields covering all categories of science.
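To illustrate the idea of a Sharpe ratio as a growth measure, the following is a minimal sketch only. The report does not give the exact formula used in ERACEP, so the definition below (mean yearly growth of a field in excess of the growth of all fields, divided by the standard deviation of that excess) is an assumption borrowed from the usual financial definition. The function names and the publication counts are hypothetical.

```python
from statistics import mean, stdev

def growth_rates(counts):
    """Year-on-year relative growth of a series of yearly publication counts."""
    return [(later - earlier) / earlier for earlier, later in zip(counts, counts[1:])]

def sharpe_ratio(field_counts, all_counts):
    """Mean excess growth of a field over all fields, divided by its standard deviation.

    Assumed definition for illustration; not taken from the ERACEP report.
    """
    excess = [f - b for f, b in zip(growth_rates(field_counts), growth_rates(all_counts))]
    spread = stdev(excess)
    if spread == 0:
        raise ValueError("excess growth is constant; the ratio is undefined")
    return mean(excess) / spread

# Hypothetical yearly publication counts (not ERACEP data).
all_fields = [1000, 1050, 1100, 1160, 1220]
small_growing_field = [20, 24, 29, 34, 41]
large_stable_field = [400, 418, 440, 462, 488]

print(sharpe_ratio(small_growing_field, all_fields))
print(sharpe_ratio(large_stable_field, all_fields))
```

Because the measure works on relative growth rates rather than absolute counts, the small but fast-growing field scores higher than the large field that merely keeps pace with the overall growth.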
For the first component of the detection of the emerging topics within the retained sample of dynamic fields, the cluster analysis, a hybrid approach was applied which combines two traditional cluster techniques: a citation-based approach and an approach based on textual links. The citation-based component uses bibliographic coupling, which has clear advantages compared to co-citation analyses, in particular for the identification of emerging topics. The most important advantage is that all papers have references which can be used for bibliographic coupling, so that no response time is needed for citing literature to accumulate, which is crucial for a citation-based approach. The textual component is based on term frequencies, where terms extracted from titles, abstracts, and publication keywords are analysed. Both components are combined into a hybrid measure of similarity between documents which forms the basis for clustering. For the labelling and representation of clusters, core documents are used. These are defined as documents which are strongly linked to - i.e. referred to by - many other documents, and thus represent the most interconnected part of the network, thereby also representing the main (or central) content of the network. The third component of the methodology comprised a diachronic analysis of the links between clusters over time. Cluster analyses were carried out for two non-overlapping time periods. To determine links between the structures of the two periods, citation links from core documents in one period (2004-2008) to all publications in the clusters of the other period (1999-2003) were used.
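The hybrid similarity described above can be sketched as follows. This is an illustration under stated assumptions, not the ERACEP implementation: the report does not specify how the two components are combined, so the sketch uses a weighted average of two cosine similarities, one over shared cited references (bibliographic coupling) and one over raw term frequencies. The `weight` parameter, the function names and the three example documents are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def coupling_similarity(refs_a, refs_b):
    """Bibliographic coupling: cosine over binary cited-reference vectors."""
    return cosine(dict.fromkeys(refs_a, 1.0), dict.fromkeys(refs_b, 1.0))

def text_similarity(terms_a, terms_b):
    """Textual link: cosine over raw term frequencies."""
    return cosine(Counter(terms_a), Counter(terms_b))

def hybrid_similarity(doc_a, doc_b, weight=0.5):
    """Weighted average of the two components (assumed combination rule)."""
    citation_part = coupling_similarity(doc_a["refs"], doc_b["refs"])
    text_part = text_similarity(doc_a["terms"], doc_b["terms"])
    return weight * citation_part + (1.0 - weight) * text_part

# Hypothetical documents: cited-reference identifiers plus extracted terms.
doc1 = {"refs": {"r1", "r2", "r3"}, "terms": ["fuel", "cell", "membrane"]}
doc2 = {"refs": {"r2", "r3", "r4"}, "terms": ["fuel", "cell", "catalyst"]}
doc3 = {"refs": {"r9"}, "terms": ["prenatal", "ultrasound"]}

print(hybrid_similarity(doc1, doc2))
print(hybrid_similarity(doc1, doc3))
```

A matrix of such pairwise similarities could then be passed to any standard clustering routine. Note that the coupling component is available as soon as a paper is published, since it depends only on the paper's own reference list.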
By this fine-mapping analysis, three cases of cluster evolution were identified which can indicate new emerging topics. The first case is existing clusters with an exceptional growth over time. The second case is completely new clusters with roots in other clusters of a previous period. The third type of emerging topics represents existing clusters with a topic shift.
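The three cases can be expressed as a simple rule set. The sketch below is hypothetical: the report does not state how growth or topic shift were quantified, so the inputs (cluster sizes and a term overlap between the core documents of a cluster and its predecessor) and both cut-off values are assumptions chosen only for illustration.

```python
def classify_evolution(later_size, predecessor_size, term_overlap,
                       growth_cutoff=2.0, shift_cutoff=0.5):
    """Return the set of evolution signals for a cluster of the later period.

    predecessor_size is None when no single earlier cluster can be regarded
    as the predecessor. All cut-offs are illustrative assumptions.
    """
    if predecessor_size is None:
        return {"new cluster"}
    signals = set()
    if later_size / predecessor_size >= growth_cutoff:
        signals.add("exceptional growth")
    if term_overlap < shift_cutoff:
        signals.add("topic shift")
    return signals

# Hypothetical clusters (sizes in number of papers, overlap between 0 and 1).
print(classify_evolution(later_size=300, predecessor_size=100, term_overlap=0.3))
print(classify_evolution(later_size=50, predecessor_size=None, term_overlap=0.0))
print(classify_evolution(later_size=110, predecessor_size=100, term_overlap=0.9))
```

The function returns a set rather than a single label because the cases are not mutually exclusive: a cluster can show both exceptional growth and a topic shift.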
The fine-mapping methodology was applied to 6 selected dynamic fields and resulted in the detection of 10 emerging topics. Within the dynamic field of biomedical engineering, two emerging topics are detected. The first topic is labelled 'Brain-computer interface' and represents the first type of emerging topics, namely a topic which is already present in the first period but is characterised by an extensive growth over time. The second topic is a completely new topic labelled 'Kinematics'. It emerged with links to the clusters imaging, cartilage and bone cement and is concerned mainly with joint and muscle kinematics during motion and corresponding models.
Within the dynamic field of energy and fuels, again two emerging topics are detected. The first one is labelled biodiesel and related to the topic about diesel and other petrol-derived fuels from the first period. The topic is characterised by an extensive growth and also by a topic shift towards non-petrol fuels. The second emerging topic is a new type of topic labelled fuel cells. The seed for this topic lies with the topics solar cells and renewable energy from the first period.
When analysing emerging topics within the dynamic field of environmental sciences, it is important to consider that the whole field is characterised by an extensive growth over the last decade. The first of the two detected emerging topics is labelled radiation. It represents a completely new topic with no close links to any of the topics from the first period. The second topic is named nanopollution and emerged from the topic on water from the first period. This topic is not only characterised by a topic shift towards nanomaterials but also by a considerable growth.
Within the dynamic field of obstetrics and gynaecology, one emerging topic labelled as prenatal diagnosis was detected. The topic is already present in the first period but experienced a clear topic shift over time. In particular, it is influenced by technological advancements in three-dimensional ultrasound imaging, which allow a focus on new diagnostic approaches using, for example, fetal bone images.
Two emerging topics could be identified within the dynamic field of public, environmental and occupational health. The first topic, 'Environmental issues', deals with specific health issues related to the environment or location where people live. The second topic is labelled AIDS in Africa. It emerged from human immunodeficiency virus (HIV) research present in several topics in the first period considered, but the focus of this topic is completely different. It is mainly devoted to social, socio-political and regional aspects of HIV infection, with a regional focus on AIDS in Africa.
Finally, within the dynamic field of geography, one emerging topic labelled 'State and region' was detected. It arose via a combination of the topics on nationality and region from the first period. However, it is not just a merger of the two topics but a real shift in content can be observed. The focus now is on the region as defining identity in a globalised world.
An important element of the ERACEP approach was not only to rely on advanced bibliometric and text-mining methods for the identification of emerging topics, but also to combine this view with a qualitative expert-based view. Accordingly, qualitative assessments of the fields and topics were carried out via interviews with experts from the respective disciplines. All in all, a considerable share of the emerging topics identified by the different bibliometric analyses could be confirmed by this expert validation. However, examples were also identified where the bibliometric methodology alone does not seem sufficient for identifying the most important emerging topics within certain disciplines. We observe a notable correlation between the validation of fields and the ISI subject categories. Most discrepancies are present in the arts and humanities, where experts had the most difficulty confirming the bibliometric results. A main reason for this deviation is that in fields such as architecture and religion, unlike in the natural sciences, a substantial number of articles are published in non-English journals that could not be included in the bibliometric analyses. In addition, many regions in Europe have traditionally quite different research focuses in the dynamic areas identified via bibliometric analyses. This reflects, for example, geographical and cultural influences on fields such as architecture and certainly archaeology. Taken together, the bibliometric cluster analysis has proven to be a powerful methodology for identifying emerging fields of science. However, it also became clear that there is a need to complement the quantitative statistical analysis by qualitative expert assessment in order to obtain robust and validated results.
ERACEP created a large set of publications reflecting emerging research topics worldwide. The availability of such a dataset makes it possible to carry out additional country-specific analyses which had not been considered feasible at the beginning of ERACEP. In particular, the following questions related to an international positioning of research activities were tackled:
- Which countries are the most active players in the emerging topic?
- Do papers from European scientists differ from American papers?
- To what extent do countries active in these topics collaborate with each other?
The following examples illustrate how the results of ERACEP can be used to put the emerging topics and trends observed into a geopolitical context.
In general, we observe substantial differences with respect to national contributions and international collaborations amongst the distinct dynamic fields and emerging topics. The United States of America (USA) contributes to about 60 % of all papers in the emerging topic labelled AIDS in Africa, while its overall share in the field of energy and fuels is just 17.6 %, with roughly 15 % for the emerging topic biodiesel and 22 % for fuel cells. The opposite pattern can be detected for the People's Republic of China. Although Africa is of high economic importance to China, it contributes to only 2.5 % of all papers concerned with AIDS in Africa, while its share in papers on biodiesel (17.3 %) exceeds that of the USA. The evidence also shows that the USA takes a clear lead in topics concerned with medical issues, while this leadership is challenged in fields related to environmental issues and to the environmental dimension of public health.
The collaboration analysis is based on the construction of networks for dynamic fields as well as for specific emerging topics. We observe a dense network in environmental sciences where almost all European countries have strong cooperation links with each other. We also observe a remarkably peripheral position of the USA with respect to the European environmental network. A similar observation can be made in the emerging topic 'Environmental issues within public health'. A completely different cooperation pattern is obtained in the emerging topic 'Brain-computer interface', where the USA plays a central role. The field of energy and fuels shows an interesting cooperation structure in that we observe only a loose cooperation network. This might be explained by the fact that countries are well aware of the technological and economic potential that can be realised from scientific developments in these fields. A country's degree of openness therefore appears to vary between emerging fields. In this example, it could reflect a stronger economic interest in energy and fuels as compared to the medical area, where the effort to contribute to global health might dominate.
Mapping ERC-funded research to emerging research topics
For mapping ERC-funded research to the emerging research topics, the balance between false-positive and false-negative hits plays a crucial role for the applicability of any method. As the aim is to investigate to what extent the ERC-supported activities cover and contribute to the detected emerging topics, the avoidance of false-negative hits is more crucial to the procedure. Accordingly, our choice of methodological approach was adapted to this requirement. As a data sample for the method, a set of 932 applications to the 2009 starting grant call of the ERC could be used. For the mapping to emerging topics, the same set of publication data was used as for the identification of emerging topics via the hybrid clustering approaches. All in all, a set of 164 220 unique documents was used for the matching procedure.
A full-text-matching approach was chosen. As a first step a database of publications and a database of applications are set up in parallel. The individual datasets are processed, so that for both, the publications on the one hand and the applications on the other hand, text fields are obtained. In a next step, these fields are indexed in both datasets using the LUCENE text index. From both indexes a set of common terms is extracted. Document-by-term matrices are created containing the raw frequency of each term in the document. After applying a weight to term frequencies, the matrices are combined into a paper-by-application similarity matrix. Taking the average similarity over papers within a certain topic results in the final topic-to-application similarity matrix.
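The matching pipeline described above can be sketched in a few lines. This is a simplified illustration under several assumptions that are not taken from the report: whitespace tokenisation stands in for the LUCENE index, the unspecified term weighting is assumed to be an inverse document frequency, and the paper-by-application similarity is assumed to be a cosine. All names and texts are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Whitespace tokenisation; a stand-in for a real text index."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def topic_application_similarity(papers, paper_topics, applications):
    """Average paper-to-application similarity per (topic, application) pair."""
    paper_tokens = {p: tokenize(text) for p, text in papers.items()}
    app_tokens = {a: tokenize(text) for a, text in applications.items()}

    # Keep only the terms that occur in both collections.
    paper_vocab = {t for tokens in paper_tokens.values() for t in tokens}
    app_vocab = {t for tokens in app_tokens.values() for t in tokens}
    common = paper_vocab & app_vocab

    # Assumed weighting: inverse document frequency over both collections.
    docs = [set(tokens) for tokens in list(paper_tokens.values()) + list(app_tokens.values())]
    idf = {t: math.log(len(docs) / sum(t in d for d in docs)) + 1.0 for t in common}

    def vector(tokens):
        counts = Counter(t for t in tokens if t in common)  # raw term frequencies
        return {t: c * idf[t] for t, c in counts.items()}   # weighted frequencies

    paper_vecs = {p: vector(tokens) for p, tokens in paper_tokens.items()}
    app_vecs = {a: vector(tokens) for a, tokens in app_tokens.items()}

    # Sum paper-by-application similarities per topic, then average over papers.
    totals = defaultdict(float)
    topic_sizes = Counter(paper_topics[p] for p in papers)
    for p, paper_vec in paper_vecs.items():
        for a, app_vec in app_vecs.items():
            totals[(paper_topics[p], a)] += cosine(paper_vec, app_vec)
    return {key: total / topic_sizes[key[0]] for key, total in totals.items()}

# Hypothetical texts (not ERACEP data).
papers = {"p1": "fuel cell membrane catalyst",
          "p2": "fuel cell electrode",
          "p3": "prenatal ultrasound diagnosis"}
paper_topics = {"p1": "fuel cells", "p2": "fuel cells", "p3": "prenatal diagnosis"}
applications = {"a1": "novel catalyst for fuel cell electrode",
                "a2": "ultrasound imaging for prenatal screening"}

sim = topic_application_similarity(papers, paper_topics, applications)
for key in sorted(sim):
    print(key, round(sim[key], 3))
```

For a corpus of the size reported here, sparse matrix operations would be the natural replacement for the nested loops; the dictionary-based version is used only to keep the sketch self-contained.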
For the validation of the results of the mapping a manual qualitative procedure is used. The aim of the validation is firstly to estimate the appropriateness of the developed mapping methodology and make adjustments if necessary, secondly to justify decisions upon cut-off thresholds, and thirdly to support the removal of false-positive applications. The main approach towards validation was analysing the content of the proposals manually and comparing their content to the matched fields.
As a result of the matching, 885 applications could be used, and all of them shared at least one term with the documents in the publication set. Accordingly, no document was omitted due to the restriction of terms to the set of common terms. After the calculation of average similarities between proposals and topics and the assignment based on thresholds, 289 applications were mapped to at least one topic, and 173 applications had multiple assignments.
One of the objectives of ERACEP was to explore to what extent ERC funding procedures are able to address and contribute to emerging topics. The results of our matching approach, which was carried out for a selected number of thematic fields and emerging topics, indicate that ERC funding is indeed able to address emerging topics. However, we observe substantial differences across different topics.
In biomedical research we find only a very small number of applications relevant to the emerging topics of brain-computer interface and kinematics, and none of them was funded. In the field of energy and fuels, with the emerging topics of fuel cells and biodiesel, a substantial number of proposals, several of which were successful, could be linked to these emerging topics. In environmental sciences, we find a small number of proposals mapped to the emerging topics of radiation and nanopollution, but for radiation four out of five applications were granted, while for nanopollution none was. In the field of obstetrics and gynaecology, most of the 39 applications could be linked to the topic cancer. All other topics have very small numbers of applications related to them. This includes the emerging topic of prenatal diagnosis. A similar situation is observed in public health research, in which the emerging topics 'living environment' and 'AIDS in Africa' are not addressed well. On the other hand, AIDS in Africa shows a rather high success rate. In the field of geography we find a strong focus of applications on the emerging topic of state-region and also a high success rate of these proposals. These differences between thematic fields in terms of coverage and success rates of proposals indicate that it might be useful to explore the reasons for such differences in more detail. This would allow a better insight into the operation of ERC procedures. Bibliometric approaches are certainly not suited for such analyses. Rather, qualitative expert-based approaches would be adequate.
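The threshold-based assignment can be illustrated as follows. The sketch assumes a single global cut-off, whereas the report speaks of thresholds that were justified through manual validation; the scores and the cut-off value below are hypothetical and not ERACEP results.

```python
def assign_topics(similarity, threshold):
    """Map each application to the topics whose average similarity reaches the threshold."""
    assigned = {}
    for (topic, application), score in similarity.items():
        if score >= threshold:
            assigned.setdefault(application, []).append(topic)
    return assigned

def summarise(assigned):
    """Number of applications with at least one topic, and with more than one topic."""
    mapped = len(assigned)
    multiple = sum(1 for topics in assigned.values() if len(topics) > 1)
    return mapped, multiple

# Hypothetical topic-to-application similarities.
scores = {
    ("fuel cells", "a1"): 0.42,
    ("biodiesel", "a1"): 0.31,
    ("fuel cells", "a2"): 0.05,
    ("radiation", "a2"): 0.28,
    ("radiation", "a3"): 0.02,
}

assigned = assign_topics(scores, threshold=0.25)
print(assigned)
print(summarise(assigned))
```

Lowering the cut-off reduces false-negative hits at the cost of more false positives, which is why the choice of threshold has to be checked against the manual validation.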
In conclusion, the matching methodology developed within ERACEP has several advantages, as the text-based link can be quantified and calculated as soon as an application is submitted. It uses the full body of the text and does not involve parsing the text to extract references that have to be matched with individual papers or journals. Any improvement to the body of the text will of course also be beneficial to the matching exercise.
ERACEP developed methods and tools which allow the detection of emerging research topics and a matching of grant applications to different scientific fields, including such emerging topics. Both approaches are conceptually independent from each other and can serve different purposes within the workflow of the ERC. The identification of emerging topics is more relevant to the internal organisation of the ERC. The mapping of proposals relates directly to the main objective of the ERC to support and encourage creative scientists to be adventurous and take more risks. In any case, we consider it crucial that neither approach is considered or used as a replacement for expert-based assessment. Rather, they are suited to support the activities of human experts.
A first immediate use of the ability to identify emerging research topics is an initiation of internal discussions about the structure and scope of different evaluation panels. Descriptions of existing panels could refer explicitly to new topics. Respective keywords could be updated and existing panels revised or new panels created according to the landscape of emerging topics. Accordingly, the main advantage of using emerging topics would be that such adjustments and modifications would not be driven by external classifications, but rather by the inherent evolution of science. A second immediate use would be the support in identifying appropriate panel members with the right expertise in emerging topics needed for the evaluations.
The matching approach of ERACEP can be used in an ex ante and in an ex post way during the evaluation procedures. In the ex ante mode, the mapping of applications to certain topics could facilitate the assignment of applications to evaluation panels after submission. Applications could also be matched with a set of key papers, which in turn would make it easier to identify the most appropriate external experts who might be included in the evaluation procedures. Finally, a pre-selection of applications would be possible in the sense that applications can be mapped to thematic clusters labelled as emerging, established or even vanishing. Such a pre-selection would facilitate the identification of applications that would require special attention during the evaluation. Ex post, results of the mapping exercise can be used for a reflexive assessment of ERC procedures. In particular, similarities and differences between the outcomes of different panels across different fields could be analysed. Topics where the reasons for selection or non-selection may need additional consideration could be better identified. Finally, the combined analysis of the mapping, the results of the review process and the outcomes of the successful grants could provide important feedback for adjusting the long-term strategy and policy of the ERC.
Dr Thomas Reiss
Fraunhofer Institute for Systems and Innovation Research
Breslauer Str. 48, 76139 Karlsruhe, Germany