Academic Careers Understood through Measurement and Norms

Final Report Summary - ACUMEN (Academic Careers Understood through Measurement and Norms)

Executive Summary:
Academic Careers Understood through Measurement and Norms (ACUMEN) is a European research collaboration aimed at understanding the ways in which researchers are evaluated by their peers and by institutions, and at assessing how the science system can be improved and enhanced. Assessment of the performance of individual researchers is the cornerstone of managing the scientific and scholarly workforce. It shapes the quality and relevance of knowledge production in science, technology and innovation. Currently, there is a discrepancy between the criteria used in performance assessment and the broader social and economic function of scientific and scholarly research. Many job interviews and grant decisions are based mainly on traditional criteria of scientific quality and do not sufficiently take into account the broader societal roles of scientific and scholarly research. In addition, the increased scale of research has sharply increased the workload of reviewers, which may partly undermine the quality of the review process. This may lead to more demand for quantitative performance indicators. However, many science and technology performance indicators are not applicable at the level of the individual researcher.

To address these problems, ACUMEN has developed criteria and guidelines for Good Evaluation Practices (GEP) and has designed a prototype for a Web based ACUMEN performance Portfolio. Each researcher's ACUMEN Portfolio combines multiple qualitative and quantitative evidence sources about her career and contributions. The ACUMEN Portfolio is a way for researchers to highlight their achievements and to present themselves in the most effective way. It supplements the traditional Curriculum Vitae because it highlights key achievements rather than giving an exhaustive list. It surpasses the usual list of publications since it contains a systematic set of types of information related to the three crucial aspects of an academic's career: expertise, outputs and influence. The ACUMEN Portfolio also contains a narrative that the Portfolio owners can use to explain their academic value, backed by evidence from the rest of the portfolio, when possible.

The portfolio is supplemented by Guidelines for Good Evaluation Practices with the ACUMEN Portfolio. This guidelines document is aimed, among others, at evaluators who intend to use the ACUMEN Portfolio to aid in decision-making, such as for funding, promotion or appointments. It can also be used by individual academics seeking to create a Portfolio for self-evaluation purposes or to supplement their CV, to understand the portfolio concept or to ensure that their portfolio is as effective as possible. The guidelines also explain the different strengths and weaknesses of the various metrics and indicators.

The structure of the ACUMEN Portfolios for individual academics is based upon comparative research performed by the ACUMEN project on: 1) the peer review system in Europe; 2) new quantitative performance indicators; and 3) new ways of using the web by researchers. This research has been published in a host of scholarly publications. A comprehensive list can be found on the ACUMEN website.

This report includes a summary description of the project context and objectives, a description of the main S&T results/foregrounds, the potential impact and the main dissemination activities and exploitation of results, and an overview of ACUMEN’s public website.
Project Context and Objectives:
Academic Careers Understood through Measurement and Norms (ACUMEN) serves a dual purpose. First, ACUMEN seeks to understand the ways in which researchers are evaluated by their peers and institutions. Second, this knowledge has been used to propose new and/or updated evaluation criteria and guidelines that address the broader social and economic functions of scientific and scholarly research, the so-called Good Evaluation Practices (GEP). The ultimate goal of the project was the design of a portfolio of evidence for individual research careers.

The main shortcomings of the evaluation practices currently in use are the following. First, the evaluation criteria are still dominated by mono-disciplinary measures, which reflect an important but limited number of dimensions of the quality and relevance of scientific and scholarly work. Publication in international peer reviewed journals and bibliometric indicators such as impact factors and numbers of citations have become the dominant ways of measuring an individual researcher's quality in many, albeit not all, fields. Second, the growing size and complexity of the scientific and scholarly system has increased the pressure on the existing forms of quality control and evaluation. As a result many career decisions are informed by a rather shallow, routinized, operationalization of the notions of scientific quality and relevance. Third, the evaluation system has not been able to keep up sufficiently with the transformations in the way researchers create knowledge and communicate their research to colleagues and the public at large. Most importantly, researchers’ activity on the web – for example in the form of novel ways of collaboration or communication - has been insufficiently measured. Fourth, most of the bibliometric and quantitative scientometric indicators currently used to measure research performance do not produce viable results at the level of the individual researcher. Fifth, the scientific and scholarly system has a gender bias which makes it more difficult for women researchers than for men to fully develop their potential and careers, although there are early indications of a promising change in younger generations of researchers.

ACUMEN endeavoured to address these shortcomings in six work packages. These packages addressed the main problems in the evaluation of individual researchers and in doing so satisfied the project goal of creating good evaluation practices and a design for the ACUMEN Portfolio. WP1 analysed experiences and views of current evaluation and peer review practices held by researchers and other stakeholders. It also identified and analysed novel and broader ways of evaluating individual researchers. WP2 focused on researchers’ web presence and developed a series of indicators based on web data. WP3 studied how academic research is discussed on the Web, with an emphasis on social media platforms. WP4 analysed the differential gender effects of existing and new evaluation indicators, qualitative criteria and procedures. In WP5, established and newly proposed indicators from the literature were tested. This WP also considered unexpected side effects and analysed how well the indicators measure productivity, impact, collaboration and diversity. Finally, in WP6, the information gathered in these work packages was integrated into the recommendations for an ACUMEN Portfolio of evidence in support of individual career achievements for researchers throughout science and engineering, social science and the humanities. Each researcher's ACUMEN Portfolio combines multiple qualitative and quantitative evidence sources about their career and contributions. Testing of the principles for this portfolio was also included in this work package.

ACUMEN departs from the dominant definition of evaluation in three different dimensions. First, evaluating research performance is usually conceptualized as the more or less straightforward measurement of the production of institutions, groups and individual researchers working in these settings. These researchers are subjected to the evaluation, and although in some countries they are usually asked to prepare forms of self-evaluation, they are not seen as the central actor in the evaluation process but as its object. Second, evaluators usually assume that the evaluation process itself is neutral with respect to the outcomes. Their vision is also often limited to the specific evaluation at hand, and they have no overview of the cumulative effects of evaluations of individuals on the scientific and scholarly system at a higher level of aggregation. In other words, evaluators have a systemic blind spot with respect to the performative effects of the evaluation criteria and process. Third, evaluation is usually conceptualized in universal, more or less timeless, concepts such as "excellence", "originality", and "social relevance". These concepts basically function as container concepts that can have quite different meanings in different contexts. Framing evaluation in these container concepts often prevents an understanding of the social and cultural variation in actual evaluation practices. These practices are still poorly understood, partly due to the confidential nature of job assessments, journal publication peer review, and grant application evaluations. As Lamont (2009) has remarked: "Peer review is secretive".

ACUMEN has developed a new perspective in these three dimensions. First, evaluation has been analysed from the perspective of the individual researcher's career development. Evaluation is a more complex interaction than simply the measurement of the performance of the researcher. It is a communication process in which both evaluators and the researcher under evaluation define what the proper evaluation criteria and materials should be. The key outcome of evaluation systems is not only the conclusion with respect to the future prospects of the researcher and her manuscripts. At least as important, and sometimes even more important, are the intermediate effects of the process of evaluation on the researcher, on the evaluator, and on the future instances of evaluation. ACUMEN aims to provide state-of-the-art tools to the research community that can be used both by evaluators and individual researchers in the form of an ACUMEN Portfolio of evidence, which is based on a set of ACUMEN criteria for Good Evaluation Practices. The Portfolio will enable a researcher to propose an extended set of materials and, if the procedure allows for this, even criteria for evaluation in relation to the relevant scientific and social mission of her research.

Second, ACUMEN places the constructive and performative effects of evaluation at the centre in order to assess the implications of new evaluation criteria and guidelines for individual careers and for the scientific system as a whole. Evaluation systems inevitably produce quality and relevance as much as they measure it. This holds both for indicator-based evaluation and for qualitative peer review evaluation systems. Evaluation systems have these effects because they shape the career paths of researchers and because they form the quality and relevance criteria that researchers entertain. These feedback processes also produce strategic behaviour on the side of the researchers, which potentially undermines the validity of the evaluation criteria. ACUMEN has therefore analysed how current and new forms of peer review and indicator systems, as main elements of the evaluation process, will define different quality and relevance criteria in the evaluation of individual researchers, in the short term as well as in the longer term.

Third, ACUMEN has analysed the diversity of current evaluation practices in a comparative research design. The existing evaluation practices and cultures vary by nation, by institution and by discipline. Although virtually all evaluations aim to ascertain excellence at the international level, how this is operationalized varies greatly. In some cases, citation analysis is very influential; in other cases evaluators and researchers tend to frown upon these quantitative indicators or claim that they are not applicable to their discipline or institution. In some countries, traditional peer review systems are still dominant at the national level, whereas in other countries these criteria have been supplanted by a large set of requirements based on the economic and social effects of research. Alongside these differences between countries, institutions and disciplines, there are also common elements that in some cases have become more important. The requirement of publication in international peer reviewed journals has become much more pervasive, including in fields where this was traditionally not part of the publication culture (e.g. in technical and social sciences). In peer review, there seems to be a trend of increasing professionalization of peer review itself, with stricter quality standards regarding the role of evaluators. This has moreover created a demand for quality control of the quality control procedures that have been put in place in a number of European countries, again in quite different forms and institutions. The European evaluation system is, in other words, a patchwork of evaluation cultures that needs to be understood much better if we want to be able to understand the wider implications of new evaluation systems and criteria. ACUMEN has studied both variation and common themes across European countries, disciplines and different types of research institutes.
ACUMEN did not aim to study this variation in its totality, but has focused on its manifestations at the level of the individual researcher's career.
Project Results:
WP1 – Evaluation Impact

Literature Survey - Main findings

The literature on peer review of grant proposals (which was the focus of this literature review) makes clear that there is not one model that all should follow: peer review is not a singular process, but rather a flexible set of mechanisms with many variations in practice. The prerequisites of good peer review are its continuous improvement and the involvement of high-level stakeholders and experts. A high level of expertise among the peer reviewers is certainly a must. However, quality evaluations come from diverse panels of experts, which might include a mixture of backgrounds and, if relevant, different straightforward approaches. Panels will usually have to be tailored to the type of call. Panel composition should take into account appropriate coverage of the relevant scientific and technological domains, including interdisciplinary and socio-economic aspects. It should also be, as far as possible, balanced in terms of gender, age, affiliation and nationality, and include representatives from civil society.

An important point of view is that scientific quality should not be sacrificed in favour of relevance and impact. In this perspective, applied research ought to meet the same standards of research design, sample selection and evidential inference that apply to any sort of work (allowing for the practical difficulties of conducting applied research). Indeed, if research is being used by policy makers to take decisions on matters that have a direct effect on the quality of citizens’ lives, the standards ought to be as high as possible.

A major problem in peer-review judgments is the substantial error that occurs when a large number of reviewers evaluate only a few articles or grant applications. To overcome this problem, it is suggested to try a ‘reader system’ in which a select few senior academics read all the articles or grant applications in their particular field of scholarship.

An important problem mentioned in the literature is the availability of reviewers for grant peer review. This can be approached in both public service terms and in terms of a market for peer reviewers. According to the latter, the ‘market’ for peer reviewers needs to be analysed, including the possible identification of non-financial incentives. Surveys and other evidence have shown that there are various reasons why academics participate in peer review. Not all motivations are altruistic, and there is no reason why they should be. However, a central element, without which the peer review system would not exist, is the professional commitment to contribute to the academic public good. Each university in receipt of public funds should accept an obligation to encourage its researchers to engage in these activities, recognising that peer review is an essential part of the fabric of academic life – the costs of which are met by the funds allocated by the Funding Councils to support research infrastructure. A more sophisticated understanding of the costs of peer review needs to be developed. There is also a growing need to train reviewers - traditionally reviewers have been ‘part-and-parcel’ of the system applying their expertise in the reviewing process without experiencing any training.

The idea of drawing up a common database of "certified" experts, which was very popular at the beginning of the 21st century, later came to be treated more cautiously. In fact, what might initially appear simple and attractive to implement raises a number of problems (how and by whom the certification is made; how discipline boundaries are defined; how possible reputational consequences for experts who are deemed unsuitable for the database should be dealt with). Different kinds of peers should be used for different purposes, for example specifically targeting specialists in translational or high-risk, innovative research where this is the desired outcome. This has important implications for funding bodies: since reviewers both identify and define good research, an extensive understanding of different views within a field will be required of the staff who select reviewers.


Improvement and Modification of Peer Review

Several novel approaches to research evaluation have been proposed as alternatives to traditional forms of peer review, with respect to grant peer review as well as journal and personnel review. These range from bidding mechanisms to Web-based peer evaluation communities. The landscape of the processes for the evaluation of research outputs and of researchers is changing. In the near future we envision the growth of various tools for research evaluation, including open source tools and those operating with open APIs/protocols. Such tools would primarily operate on the Web and offer a variety of methods for research evaluation from which program committee chairs or journal editors (or even people playing new roles that do not exist yet) will be able to choose. Examples of tools with such functionalities have already emerged (e.g. Mendeley, Peerevaluation.org, Interdisciplines.org), but it is not yet clear how these tools can be connected and which of them will be adopted widely enough to have a normative effect. Less attention should be paid to designing “the” scientific evaluation system of tomorrow, which, like “the” peer review process, will be an emergent phenomenon based on the different needs of different disciplines and communities. Instead, attention should focus on ensuring interoperability and diversity among the many possible tools that scientific evaluation can make use of.

Potential modifications to the grant peer review process may be considered to improve efficiency or effectiveness. With respect to efficiency, for example, improvements could be brought about by moderating demand to ensure that the number of applications received is kept below a certain threshold – thus reducing the burden on reviewers and applicants. This could be achieved by (i) reducing advertising; (ii) changing deadline systems for funders that use fixed milestones for submission; or (iii) limiting the number of applications from particular institutions. It may also be possible to streamline assessment procedures using tighter systems of triage on applications received. Other potential cost-saving measures include (1) reducing the number of external referees involved in peer review of grant applications, and (2) increasing the use of technology – including videoconferencing – so that peer review panellists do not have to gather in one place for scoring meetings.

Another important aspect is improving the capacity of peer review to support applied research by selecting panel members from both academic peer review and decision-making constituencies. Educators and communication experts may also participate if the proposal in question is likely to be in a high-impact area of research. The aim is thus to evaluate research proposals both in terms of their scientific merit and the potential impact they may have. Another option mentioned is improving the capacity of peer review to support innovative research according to a “DARPA model”: a narrowed-down version of peer review in which there is no panel, simply ‘expert’ judgement by a specially selected programme manager.

Survey on Peer Review Practices - Main findings

The reasons why respondents agreed to be reviewers were mainly linked to research ethics: obligation towards the field, intention to ensure the quality of the field, and desire to help fellow researchers. Self-improvement was also important, namely gaining an overview of one's own field. While the majority of postdoctoral research fellows (67.3%) and lecturers and assistant professors (60.6%) had never refused to be a reviewer, the majority of associate professors (57.6%) and full professors (83.6%) had refused to review. The most common reason for refusing to review was lack of time: 84% of all respondents stated that this was often or even very often a problem. The second most important reason was the feeling that they lacked the relevant expertise (60.8%). Here postdoctoral research fellows were the most confident: only 45.2% of them considered it a frequent reason (for 60.9% of full professors it was a problem).

It seems that in peer review practice it is not common to inform reviewers about the final results of the applications they reviewed. About 34% of reviewers had never been informed about results, and 23% of them had been informed very rarely. The majority (75.3%) of respondents did not consider it necessary to receive feedback about the final results. More than half (56%) of respondents had had experience with a system that allows applicants to nominate possible reviewers, and the majority of them (88.5%) had used this possibility. A system that allows applicants to exclude reviewers is slightly less well known (48.3%), but at the same time the majority (56.9%) of those who know the system have used it. Full professors have been particularly active here: 63.2% have used this possibility. The majority of respondents (59.9%) favour the opinion that there is a need for improvements. This seems to be particularly relevant in medical sciences, where 72.4% of respondents voted for changes.

The majority of the proposals to improve the peer review (PR) system were related to reviewers. The overwhelming view was that the people who agree to participate in the PR process should be recognized. A good reviewer has relevant disciplinary competence and academic excellence, provides comprehensive and useful comments, writes the review in appropriate language and submits it on time; previous peer review experience is also needed. The majority of respondents wanted the reviewer's written evaluation to be available to the applicant, excluding the reviewer's name (54.6%), and considered that an applicant should have the possibility to read and respond to the reviewer's comments before the final decision (49.9%).

Main lessons for evaluation practices

Mismatches between perspectives of employers and employed
A partial mismatch exists between the important developments and influential evaluations reported by individual academics and those reported by human resource managers as important evaluations. Where their answers agree in terms of identification of these developments and evaluations, they disagree about reasons why they are important. This may mean (but this is not documented in the interviews) that evaluation criteria and procedures, which are set up by employers, may not take the perspective of the evaluated into account. Arguably, they should do this, because what looks like an indication or evidence of certain quality to the evaluators, may mean something completely different to the evaluated. Thus, this indicator or evidence may be a bad predictor of future behaviour. For example, an applicant may make sure to have an impressive publication list because he knows it is an evaluation criterion, not because he has an intrinsic motivation to publish. Once he achieves a permanent position, he may be hard to motivate to continue publishing at the same level. Vice versa may also hold. The evaluated may misinterpret the criteria as an indication of expected future behaviour. This would be the case if an impressive publication list is required in the advertisement although the job may mainly involve teaching.

Cohorts and country specific backgrounds need to be taken into account: an argument for a life-cycle perspective

Individuals develop their careers in different periods. They orient themselves towards the criteria that rule their work and career progress at one point, but by the time they live up to these criteria, the criteria may have changed. They may end up in a Catch-22 situation. Similarly, individuals who move between countries may face problems because their move exposes them to different evaluation and career regimes than the ones they grew accustomed to. One cannot change one's life's work with every change in criteria. Evaluations should take this into account. One way to do this is simply not to change important criteria too drastically in recurring types of evaluations. Another way is to formulate alternative criteria for different cohorts or for individuals who have moved countries as part of their training, career or life trajectory.

Take invisible work into account
Interviewees identified a range of activities that are important for their work but not evaluated in any evaluations. This list of invisible work is used to check the elements of the design of the ACUMEN Portfolio. ACUMEN points to the role of on-line activities that have grown in the past two decades since the arrival of the World Wide Web. It should be noted that our interviewees did not mention on-line activities such as blogging, tweeting, on-line discussions and YouTube as important for their work or career but not evaluated. The list of unevaluated work does include activities that may involve on-line work, such as teaching and activities related to societal relevance. Although researchers did not mention on-line activities, these activities in the future may become more important.

Another indication of future demand is that a Dutch HR manager, responding to a different question, explicitly mentioned media and, when asked which media, started her answer with social media and continued with newspapers and television. She saw these mostly in terms of visibility and more or less equated that with societal relevance. Her institution keeps track of media appearances, so at present university managers may be more interested than researchers in the use of altmetrics.

Informal evaluations
A considerable number of the evaluations that played a role in important developments in interviewees' careers, and a large number of influential evaluations, concern informal job applications and invitations, for example to participate in a project, undertake a PhD project after the MA, undertake a postdoc after the PhD, apply for a job, or write a chapter in a book.

Such evaluations can be characterized as 'up close', 'in full' and 'in view'. The individual or materials under evaluation (the application or other work that the evaluated has written or done) are closely examined, the evaluator may know the evaluated from earlier collaborations or interactions, and may take more materials into account than the actual application if there is one. A striking aspect of many informal evaluations is that the evaluated, and in some cases also the evaluators, are not anonymous. Put differently, these evaluations have little to do with the model of double-blind peer-review, which opens the evaluation up for use of a portfolio because portfolios are anything but anonymous.

Obviously, in some situations, a portfolio presentation is not likely to have a place in the evaluation. For example, when a supervisor invites his/her PhD student to do a postdoc, the supervisor will already have enough experience with the student that a portfolio presentation may not add much. Still, in other situations a portfolio presentation could play a role. Possible examples include the following: the aforementioned supervisor may need the Portfolio on record for formal reasons; a project team is looking for someone for a particular task and team members need to present potential candidates to each other; a book editor meets someone at a conference and wants to consider him/her as an author for a chapter; two researchers meet at a workshop and want to know a little more about one another. At present, one would invite a LinkedIn connection or check a staff page. Another striking aspect of informal evaluations is that they are, indeed, informal. Although many interviewees talked positively about the informal evaluations they reported, we should keep in mind that the outcomes of these evaluations were positive and the respondents may therefore have been biased in their judgment.



WP2 – Institutional Web Presence


Task 2.1 Web presence:
Assessing the web presence of European academics. We (a) took a large random sample of European academics and researchers across four disciplines for an email survey and (b) conducted a follow-up outlink analysis of European academic Web CVs from four fields. Of the 2,154 respondents, about 61% had a Web CV (including homepages and publication lists), although this was higher in philosophy (78%) and lower in public health (47%), and higher for males (65%) than for females (49%). The outlink analysis showed that about 34% of online CVs had at least one outlink URL to open access sources (e.g. OA archives or PDF files) as evidence of sharing the full text of research, and this was higher in astronomy (48%) and philosophy (37%) than in environmental engineering (29%) and public health (21%). The overall geographical and gender differences in outlinking to OA research were notable, with Western countries (high) ahead of their Eastern counterparts (low) (Kousha and Thelwall, 2013 and in press 2014).
Conclusion: Not all researchers have a web presence, and those who have one use it in different ways. Moreover, not all researchers with a web presence use it to point others to relevant resources (e.g. preprints/postprints, slides or data). An appropriate web presence is desirable for linking the ACUMEN Portfolio to the relevant web contents, because reviewers and funders may require supporting information for assessments. Although web presence is not an indicator of success, it could be an indicator of the effort put into publicising research and other academic outputs, which can be a rich data source for the ACUMEN Portfolio.
Publications covering Task 2.1 (see also Task 2.4 for highly cited EU researchers):
Kousha, K. & Thelwall, M. (2013). Evaluating the Web Research Dissemination of EU Academics: A Multi-Discipline Outlink Analysis of Online CVs. 14th International Society of Scientometrics and Informetrics Conference 2013, 15-19 July, 2013, Vienna, Austria.
Kousha, K. & Thelwall, M. (in press, 2014). Disseminating Research with Web CV Hyperlinks. Journal of the Association for Information Science and Technology.

Task 2.2 Webliography:
“This is a bibliographic/webliographic review. The state-of-the-art regarding the measurement of individual contributions should focus on how scientists have been using the web to make their work available and in which ways it could be improved.”

Summary of review book chapter covering Task 2.2:
Web impact metrics for research assessment: In a book chapter we discussed the use of web metrics for assessing the impact of academic research, whether artefacts, articles, researchers or institutions (Kousha & Thelwall, 2014). We argued that web impact metrics can potentially supplement conventional impact metrics by including new or unique types of sources of impact (e.g. presentations, syllabi or digitised books) and emerging types of scientific outputs (e.g. online videos or science blogs). We described different methods for collecting web impact metrics, including hyperlinks, web citations, URL citations and a hybrid approach (Web/URL citations), commonly gathered by a web crawler or by queries to commercial search engines. We also discussed other attempts to extract and use formal citations from digital libraries, including CiteSeer, Google Scholar and Google Books. In particular, Google Scholar citation metrics (citation counts, h-indexes, etc.) and Google Books citations from a huge number of digitised books can be used for monitoring research performance when traditional citation indexes are not available or have insufficient coverage (e.g. in the humanities). We introduced new types of web impact, including citations from online course syllabi, which potentially reflect the educational impact of research, and download counts of academic publications, which may be an indicator of reading and usage. The chapter also briefly discusses emerging social web impact metrics, or altmetrics, which can potentially be used outside standard academic sources and indicators, such as social bookmarks, tweets, online readership of scientific publications or views of online academic videos (see also literature reviews in publications under Tasks 2.4 and 2.5).

Conclusion:
There are many ways in which research impact can be assessed using the web. The practical applications of web-extracted metrics for research assessment include calculating indicators for objects outside of traditional citation indexes, from scientific publications to scholars and institutions. However, web impact indicators suffer from a general lack of quality control compared with scholarly citations, and hence should be used cautiously in research evaluation. The book chapter mentioned above documents the pros and cons of web metrics for individual assessments of academics in the ACUMEN Portfolio and justifies proposing them in Task 2.3 (see also Table 2.3).

Publication covering Task 2.2:
Kousha, K. & Thelwall, M. (2014). Web Impact Metrics for Research Assessment. In: B. Cronin & C.R. Sugimoto, (Eds), Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact, MIT Press. ISBN: 978-0262026796.

Task 2.3 Web indicators:
“Proposal of web-based indicators to be used for evaluation, primarily in the ACUMEN Portfolio. Combining political issues, evaluators’ needs, current publication issues, impact of the open access policies, feasibility topics and mathematical tools, a model for encompassing a new series of web indicators should be developed.”

The web indicators for Task 2.3 are listed in the ACUMEN Portfolio. The new indicators are listed below. The complete list of indicators used in the Portfolio is provided in the Good Evaluation Practices document.
• Research impact: this includes citations from online scholarly publications (e.g. presentation files or blog posts) retrieved via search engines.
• Teaching impact: this includes mentions of research in online course syllabi.
• General web impact: this includes web mentions of titles or URLs (for open access) of research.
• Usage impact: this includes number of downloads or views from different academic web sources (e.g. academic databases and digital libraries).
• Social usage impact: this ranges from the number of downloads, views or readers of documents or videos to views of academic profiles, tweet counts or posts about research on Facebook. Free social networking tools such as Mendeley.com, Academia.edu, ResearchGate.net and Twitter can be used for assessing the social usage impact of research.

Task 2.4 Highly cited scientists:
“A data set of scientists extracted from ISI Highly Cited database will be built with personal web information, and from the presence of their contributions in selected repositories and academic search engines and other relevant bibliometric and webometric figures.”

European highly cited researchers. An analysis of the online presence of about 1,500 highly cited researchers working at European institutions showed that about 70% of them have a personal website or other web content, especially scientists from Denmark, Israel and the United Kingdom. Nevertheless, the results were biased towards senior male researchers working in large countries (e.g. the United Kingdom and Germany). The disciplines most frequently showing a high web presence were economics, mathematics, computer science and space sciences, suggesting the success of open access subject repositories such as RePEc, arXiv or CiteSeerX (Mas-Bleda & Aguillo, 2013). A further study of 1,525 highly cited scientists working at European institutions found that 61% of them had a personal website, although this was higher in the social sciences (79%) and lower in the health sciences (49%). A webometric analysis was carried out for the 892 scientists who had either a personal website or an online list of publications that the crawler could process. It showed that 355 (40%) of them created at least one outlink to open access sources (OA repositories or files of type pdf, doc, docx, rtf, ps or gz); the share was higher in the hard sciences (59%), social sciences (58%) and engineering (45%) than in the life sciences (26%) and health sciences (19%). Disciplinary and geographical differences in outlinking to OA research were also found for the sampled highly cited scientists (Mas-Bleda, Thelwall, Kousha & Aguillo, 2014). We also investigated whether these highly cited researchers had social web presences (Mas-Bleda, Thelwall, Kousha, & Aguillo, 2013) and whether these presences had a measurable impact (Mas-Bleda, Thelwall, Kousha, & Aguillo, under review). We found very low use of social sites, although researchers having one type of profile were more likely to have another type of profile as well. Most social web profiles had some evidence of uptake, if not impact, but the value of the indicators used is unclear.
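The outlink classification step described above can be illustrated with a minimal Python sketch. It is not the crawler used in the studies: the function name `oa_outlinks`, the repository host list and the sample URLs are assumptions made for illustration, and only the file extensions are taken from the text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# File types counted as open access full text (list taken from the text above).
OA_EXTENSIONS = (".pdf", ".doc", ".docx", ".rtf", ".ps", ".gz")
# Hypothetical repository hosts, for illustration only.
OA_REPOSITORY_HOSTS = ("arxiv.org", "repec.org", "citeseerx.ist.psu.edu")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def oa_outlinks(html, page_url):
    """Return absolute URLs of links that point to OA files or repositories."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        host = parsed.netloc.lower()
        is_file = parsed.path.lower().endswith(OA_EXTENSIONS)
        is_repo = any(host == h or host.endswith("." + h)
                      for h in OA_REPOSITORY_HOSTS)
        if is_file or is_repo:
            found.append(absolute)
    return found
```

A CV page would then count as having OA outlinks if `oa_outlinks(...)` returns a non-empty list; the share of such pages per discipline gives percentages of the kind reported above.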

Publications covering Task 2.4:
Mas-Bleda, A., & Aguillo, I. F. (2013). Can a personal website be useful as an information source to assess individual scientists? The case of European highly cited researchers. Scientometrics, 96(1), 51-67.
Mas-Bleda, A., Thelwall, M., Kousha, K., & Aguillo, I. F. (2014). Successful researchers publicizing research online: An outlink analysis of European highly cited scientists' personal websites, Journal of Documentation, 70(1), 148-172.
Mas-Bleda, A., Thelwall, M., Kousha, K., & Aguillo, I. F. (2013). European highly cited scientists’ presence in the social web. In: 14th International Society of Scientometrics and Informetrics Conference. Vienna, Austria, pp.1966-1969.
Mas-Bleda, A., Thelwall, M., Kousha, K., & Aguillo, I. F. (under review). Do highly cited researchers successfully use the social web?

Task 2.5. Measurement and analysis of the indicators empirically obtained from the cited population.
For this part we evaluated the indicators for specific subjects rather than for the ACUMEN population, as originally proposed. The reason for this was that some of the indicators had already been evaluated but this was not the case for the new indicators. We decided that it made more sense to evaluate complete sets of research for specific disciplines than for the publications of our sample of researchers, which could only be a partial representation of their fields.

Summary of conducted research covering Task 2.5:
Google Books Citations for impact assessment: Google Books can be used to identify citations from digitised books to academic publications (e.g. articles and books). It seems to give better coverage of books than any other source and so would be useful in book-based fields (e.g. arts and humanities and some social sciences). To investigate this for the ACUMEN Portfolio, we introduced and tested a method to automatically extract citations from digitised books and to remove irrelevant matches. The overall accuracy and coverage of the automatic Google Books citation searches was high (over 90%), and Google Books citations found substantially more citing books than did the Thomson Reuters Book Citation Index (BKCI), with BKCI's results coming predominantly from journal articles. Moderate correlations between Google Books citation counts and BKCI citation counts in the social sciences and humanities suggest that they could measure different aspects of impact (Kousha & Thelwall, 2014).

Conclusion:
Within the arts and humanities and some social sciences, Google Books citations have a clear advantage over traditional citation indexes, whose impact metrics commonly come from journal articles rather than books and monographs. Hence, Google Books citations could be a valuable source for evaluating academic researchers in book-based fields.

Academia.edu as an informal source of impact: Academia.edu contains social network capabilities in addition to information about publications. We investigated whether Academia.edu popularity statistics associate with academic impact, and hence could be useful for impact estimation. The investigation focused on members of philosophy departments. We found that faculty tend to attract more profile views than students, but female philosophers did not attract more profile views than males, suggesting that academic capital drives philosophers' use of the site more than friendship and networking. Secondary analyses of law, history and computer science confirmed the faculty advantage (in terms of higher profile views) except for females in law and females in computer science. They also found a female advantage for both faculty and students in law and computer science as well as for history students. Hence, Academia.edu overall seems to reflect a hybrid of scholarly norms (the faculty advantage) and general social networking norms. Finally, traditional bibliometric measures did not correlate with any Academia.edu metrics for philosophers, perhaps because more senior academics use the site less extensively or because of the range of informal scholarly activities that cannot be measured by bibliometric methods (Thelwall & Kousha, 2014a).
ResearchGate metrics for research evaluation: ResearchGate is a social network site for academics to create their own profiles, list their publications and interact with each other. We examined whether ResearchGate usage and publication data broadly reflect existing academic hierarchies and whether individual countries are set to benefit or lose out from the site. The results show that rankings based on ResearchGate statistics correlate moderately well with other rankings of academic institutions (e.g. the Times Higher Education Ranking or the CWTS Leiden Ranking), suggesting that ResearchGate use broadly reflects traditional academic capital. For the ACUMEN Portfolio, ResearchGate view counts and download counts for individual articles may also prove to be useful indicators of article impact (Thelwall & Kousha, 2014b).

Online videos as a new source of academic outputs: Online videos are increasingly used by academics for informal scholarly communication and teaching. We examined the extent to which YouTube videos are cited in academic publications and whether there are significant broad disciplinary differences in this practice. A total of 1,808 Scopus publications cited at least one YouTube video, and citations to online videos within scholarly publications grew steadily from 2006 to 2011, with YouTube citations being most common within the arts and humanities (0.3%) and the social sciences (0.2%). A content analysis of 551 YouTube videos cited by research articles indicated that in science (78%) and in medicine and health sciences (77%) over three quarters of the cited videos had either direct scientific content (e.g. laboratory experiments) or science-related content (e.g. academic lectures or education), whereas in the arts and humanities about 80% of the YouTube videos had art, culture or history themes and in the social sciences about 63% of the videos were related to news, politics, advertisements and documentaries (Kousha, Thelwall & Abdoli, 2012; Kousha & Thelwall, 2012). For the ACUMEN Portfolio output indicators, we included academic online videos as a potentially innovative means by which academics may produce and disseminate research or other academic activities (e.g. course lectures).

Publications covering Task 2.5:
Kousha, K., Thelwall, M. & Abdoli, M. (2012). The role of online videos in research communication: A content analysis of YouTube videos cited in academic publications, Journal of the American Society for Information Science and Technology, 63(9), 1710–1727.
Kousha, K. & Thelwall, M. (2012). Motivations for Citing YouTube Videos in the Academic Publications: A Contextual Analysis. 17th International Conference on Science and Technology Indicators (STI), 5-8 September 2012, Montreal, Quebec, Canada.
Kousha, K. & Thelwall, M. (in press, 2014). An automatic method for extracting citations from Google Books. Journal of the Association for Information Science and Technology.
Thelwall, M. & Kousha, K. (in press, 2014a). Academia.edu: Social network or academic network? Journal of the Association for Information Science and Technology.
Thelwall, M. & Kousha, K. (in press, 2014b). ResearchGate: Disseminating, communicating and measuring scholarship? Journal of the Association for Information Science and Technology.



WP3 – Researchers’ Web Presence

Task 3.1 Identify and delineate the state-of-the-art of Web 2.0 use to discuss the work of individual academics
The literature review has been published as a book chapter by MIT Press (Bar-Ilan, Shema & Thelwall, 2014). It focuses on scientists’ attitudes towards Web 2.0 in the research setting, and covers science blogs and open reference managers in detail. On these sites scholarship is referenced according to the rules of scientific citation, and therefore mentions and bookmarks in science blogs and reference managers can be counted and compared to traditional citations.

Established reference managers (e.g. EndNote) aim to help authors with the referencing process as they write, as well as to format citations according to the appropriate citation style. Some reference managers, like CiteULike and Mendeley, have additional features, such as reporting the number of users of the system who bookmarked a specific item. Users who bookmark an item on Mendeley are called ‘readers’. Unfortunately, some reference managers (e.g. Connotea) and blogs have proved to be less than sustainable and have closed down. Of the existing reference managers, Mendeley seems to be the largest in terms of coverage. A number of studies have found significant, medium-strength correlations between its readership counts and citation counts (e.g. Bar-Ilan et al., 2012; Bar-Ilan, 2012).
Science blogs have become popular with a section of the scholarly community. Respected scholarly media outlets such as National Geographic, the Nature Group, Scientific American, and the PLoS journals all have science blogging networks. Blogs can be used, among other purposes, as a post-publication peer-review system and for investigations of science misconduct (e.g. the blog Retraction Watch). However, blogs are not as sustainable as traditional scientific communication and can be closed or moved on short notice.
Conclusion:

The chapter provided evidence of the value of science blogs and reference managers for scholarly communication and for use as alternative metrics. Although the Web constantly changes, we believe the alternative metric tools reviewed are valid and will continue to be so in the years to come.

Task 3.2 Analyse disciplinary differences in the use of Web 2.0 and web media sites to conduct and disseminate research and to enhance the visibility of the researcher

Writing a scientific blog is a way of enhancing a researcher’s visibility and introducing her or him to a wider audience. Shema, Bar-Ilan and Thelwall (2012a) studied scholarly blogs aggregated in ResearchBlogging.org (RB), an aggregator for blog posts referring to peer-reviewed works. They characterized blogs and bloggers in order to shed light on this form of scholarly discourse. The blogs were classified in order to map out the most popular blogging fields. Life Science blogs were the most popular (39% of the sample), followed by Psychology, Psychiatry, Neurosciences & Behavioural Science blogs (21%), Medical blogs (19%) and Science blogs (9%). Blogs about Social Sciences & Humanities (5%) and about Computer Science & Engineering (1%) were the least represented. This could of course be due to the characteristics of RB. In the questionnaire that resulted in the common data set, the respondents were asked to provide information on websites other than their homepage (e.g. a blog) where they disseminate their work and/or discuss science. Of the four fields addressed in the project, 27% of the philosophers reported that they post in a blog, compared with 10% of the astrophysicists and environmental engineers and only 7% of the researchers in public health.

The citing of one’s own work is common practice in formal scientific communication and has been known to enhance the researcher’s visibility. Shema, Bar-Ilan and Thelwall (2012b) investigated the levels of self-citation (citations in blogs to the blogger’s own peer-reviewed research) in blogs using four RB categories: “Ecology/Conservation”, “Computer Science/Engineering”, “Mathematics” and “Philosophy”. Only bloggers writing under their real names and posts signed by them were included. The rate of self-citing posts was low overall but varied according to discipline, with “Mathematics” having the highest percentage of self-citing posts (10%), “Computer Science” and “Philosophy” having a slightly lower percentage (9%), and “Ecology” having the lowest (5%).

Conclusion:
There are disciplinary differences in the use of social media.

Task 3.3 Create guidelines for researchers for effectively using Web and Web 2.0 technologies and stimulating discussion and appreciation of their research amongst different communities

Detailed guidelines for researchers appear in deliverable D3.6. This deliverable also includes guidelines for retrieving citation data from Google Scholar, Web of Science and Scopus. We recommend that researchers maintain visibility on the Web by publishing online CVs with a full list of publications, having a Google Scholar Citations profile and an ORCID ID, and maintaining profiles on research-oriented social media sites like Academia.edu, ResearchGate and Mendeley. Blogs and Twitter are useful platforms for disseminating and discussing their research. We also recommend self-archiving publications and making presentations available on sites like SlideShare, Figshare or YouTube. These platforms are not only useful for disseminating information, but also serve as platforms for discussion and interaction with other researchers and the public. Of course, we also recommend maintaining an ACUMEN Portfolio.

Task 3.4 Identify methodology and preliminary indicators for the online impact of individual academics based upon their Web activities and discussion of their research

Haustein et al. (2014) followed a two-sided approach to explore the representativeness and validity of social media platforms that are potential data sources for altmetric indicators. First, they collected Web information about bibliometricians who presented at the 2010 Science & Technology Indicators (STI) conference in Leiden. They showed that the bibliometrics literature is well represented on Mendeley. The coverage of the sampled documents was as high as 82% overall, with an even higher coverage of recent documents. Mendeley not only had better coverage of documents than CiteULike, but also higher numbers of readers per document (means of 9.5 and 2.4 respectively). The correlations between Scopus citations and user counts were 0.45 for Mendeley and 0.23 for CiteULike. It should be noted that Bar-Ilan (2014) found a much lower, but still significant, correlation between readership and citation counts for a sample of 100 astrophysicists (0.23).
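To make the kind of comparison behind these coefficients concrete, the following Python sketch computes a rank correlation between per-document citation counts and readership counts. The text does not state which coefficient the cited studies used; Spearman's rank correlation is assumed here because count data of this kind are typically skewed, and the function names and sample numbers are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up counts for five documents (not data from the studies).
citations = [1, 2, 3, 4, 5]
readers = [2, 1, 4, 3, 5]
rho = spearman(citations, readers)
```

A significance test would be needed before interpreting such a coefficient; it is omitted here to keep the sketch short.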

The second part of the Haustein et al. study was a survey distributed among the core of the bibliometric community present at the 2012 STI conference in Montréal, which asked about the participants’ social media habits and their influence on their work environment. Over half of those surveyed asserted that social media tools were affecting their professional lives, or that they were expecting future influence. Two-thirds of survey participants had accounts on the professional social network LinkedIn, while social networks with a scholarly focus (Academia.edu, Mendeley and ResearchGate) were each used by only a fifth of respondents. Nearly half of those responding had Twitter accounts, a high number in comparison to other studies about Twitter usage among scholars. The majority (71.8%) believed that the number of article downloads or views could be of use in author or article evaluation. Other sources such as citations in blogs (38.0%), Wikipedia links or mentions (33.8%), bookmarks on reference managers (33.8%), and discussions on Web 2.0 platforms (31.0%) were believed to have potential as altmetric indicators as well.

ResearchBlogging.org (RB) has been shown to be a promising source of preliminary indicators. Shema, Bar-Ilan and Thelwall (2014) again collected data from the aggregator ResearchBlogging.org, covering 2009 and 2010, to test the hypothesis that blog coverage of articles soon after their publication can be a preliminary indicator of a future citation advantage. The sample was limited to journals that had at least 20 articles covered in RB-aggregated blog posts during 2009 or 2010 (some journals were included in the samples for both years, while others qualified only for the 2009 or the 2010 sample). They collected citation data from Web of Science (WoS) about the blog-covered articles and about the journals in which they were published: from 2009, 2010 and 2011 for the 2009 articles and from 2010, 2011 and 2012 for the 2010 articles. They found that articles reviewed in blogs soon after their publication had significantly higher median citation counts in the three-year citation window than articles not mentioned in blogs in the year of their publication for 7 out of 12 journals (58%) in 2009 and 13 out of 19 journals (68%) in 2010.

Correlations between readership counts and citations, and between early citations in blogs and traditional citations, are interesting, but we still have to understand why and how scientists use reference managers, and what the motivations behind science blogging are. An initial step in this direction was the questionnaire administered by Haustein et al. (2014).
A further step in this direction is the work of Shema, Bar-Ilan and Thelwall (in press). They studied the apparent motivations behind RB blog posts using content analysis methods. The sample consisted of blog posts from the years 2010-2012. Shema et al. sampled 10% of RB’s health category at random, analysing 391 posts overall. They created a classification scheme with ten major categories (discussion, criticism, advice, trigger, extensions, self, controversy, data, ethics and other), each with several subcategories. The vast majority of the blog posts (about 90%) included a general discussion of the issue covered in the article, and almost 30% of the posts included some criticism of the issues being discussed (not necessarily of the article cited). Over a quarter of the posts offered advice of some sort, showing that the bloggers were willing to share their knowledge and expertise. The relatively high percentage of criticism (compared with studies of traditional citations) suggests that the informal style of blogs may allow for the easier expression of negativity.

Conclusion:
Given the findings of Haustein et al. (2014) regarding Mendeley’s high coverage of documents in a variety of disciplines and the correlation of its readership counts with citations from peer-reviewed sources, we see it as a promising tool for measuring scholarly impact. Shema et al. (2014) found that, as a group, articles covered by scholarly blogs tend to become better cited in later years. Based on these results, we suggest that scholarly blogs might be a promising source of alternative metrics for scholarly impact. In addition, blogs can be a means of engagement with non-academic audiences. In the ACUMEN Portfolio, researchers can list Mendeley readership counts and their own blogs, as well as coverage of their research in blogs maintained by others. The main advantage of these indicators is their timeliness: citations take years to accumulate, while citations in blogs and Mendeley readerships start to accumulate within days to weeks after publication, providing indications about the future impact of documents. Other altmetrics, like Twitter counts and F1000 recommendations, are also studied in the research community, but were not specifically covered by ACUMEN.

Publications:
Bar-Ilan, J. (2014). Astrophysics publications on arXiv, Scopus and Mendeley: a case study. Scientometrics. Advance online publication. doi:10.1007/s11192-013-1215-1
Bar-Ilan, J. (2012). JASIST 2001-2010. Bulletin of the American Society for Information Science and Technology, 38(6), 24-28.
Bar-Ilan, J., Haustein, S., Peters, I., Priem, J., Shema, H., Terliesner, J. (2012). Beyond citations: Scholars’ visibility on the social Web. In Proceedings of the 17th International Conference on Science and Technology Indicators. Montreal, Canada, pp. 98-109.
Bar-Ilan, J., Shema, H. & Thelwall, M. (2014). Bibliographic references in Web 2.0. In: B. Cronin & C.R. Sugimoto, (Eds), Beyond Bibliometrics: Harnessing Multidimensional Indicators of Scholarly Impact, MIT Press. ISBN: 978-0262026796.
Costas, R., Zahedi, Z., & Wouters, P. (2014). Do ‘altmetrics’ correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective. Submitted (January 2014) to Journal of the Association for Information Science and Technology.
Haustein, S., Peters, I., Bar-Ilan, J., Priem, J., Shema, H., & Terliesner, J. (2014). Coverage and adoption of altmetrics sources in the bibliometric community. Scientometrics. Advance online publication. doi:10.1007/s11192-013-1221-3
Shema, H., Bar-Ilan, J., & Thelwall, M. (2012a). Research blogs and the discussion of scholarly information. PLoS ONE, 7(5), e35869.
Shema, H., Bar-Ilan, J., & Thelwall, M. (2012b). Self-citation of bloggers in the science blogosphere. In A. Tokar, M. Beurskens, S. Keuneke, M. Mahrt, I. Peters, C. Puschmann, et al. (Eds.), Proceedings of the 1st International Conference on Science and the Internet (CoSci12) (pp. 183-192). Düsseldorf, Germany: Düsseldorf University Press.
Shema, H., Bar-Ilan, J., & Thelwall, M. (2014). Do blog citations correlate with a higher number of future citations? Research blogs as a potential source for alternative metrics. Journal of the Association for Information Science and Technology. Advance online publication. doi:10.1002/asi.23037
Shema, H., Bar-Ilan, J., & Thelwall, M. (in press). How is research blogged? A content analysis approach. Journal of the Association for Information Science and Technology.
Zahedi, Z., Costas, R., & Wouters, P. (2014). How well developed are Altmetrics? Cross-disciplinary analysis of the presence of ‘alternative metrics’ in scientific publications. Scientometrics, 18 March 2014 (Online), http://link.springer.com/article/10.1007%2Fs11192-014-1264-0


WP4 – Gender Effects of Evaluation

This work package (WP4) has analysed the differential effects of existing and new evaluation criteria and indicators on female and male researchers. WP4 has combined different methods: bibliometrics and other quantitative and qualitative methods. To promote gender mainstreaming in scientific production, the results are used in the ACUMEN Portfolio to increase awareness, to help design guidelines and criteria for improving gender mainstreaming, and to increase the scientific authority and production of women.

4.1 Comprehensive literature study on gender bias in research careers
As the first task of WP4 we conducted a comprehensive literature study, reviewing qualitative as well as quantitative studies about gender imbalance, gaps and bias in science. According to the literature, gender equity is increasing early in the pipeline, at the levels of Master and PhD degrees. However, women are still significantly underrepresented in tenure-track and research university faculty positions. The study of West & Curtis (2006) shows that women represent one-quarter of full professors and earn on average 80% of the salary of men in comparable positions. More generally, gender disparity can be ascribed to a “male model of science”, including the masculinity of organisational, social and cultural norms within academic organisations (Van Arensbergen, 2014). We described five explanatory models that are frequently used in gender studies of academia: (1) the glass ceiling: obstacles that are difficult to identify and hold women back from accessing the highest positions in the hierarchy; (2) the leaky pipeline: the pipeline has not advanced women to top-level positions due to leaks and blockages in the pipe; (3) the Matthew & Matilda effect: ‘the rich get richer’ (Matthew effect: Merton 1968) and ‘the poor get poorer’ (Matilda effect: Mahbuba & Rousseau 2011); (4) gender myths: persisting myths in favour of men shape attitudes towards the assessment of women’s scientific performance; and (5) the matching hypothesis: a bias arising from the ‘tendency of individuals to create ties with similar others’. These models are also often mentioned in gender equality debates in higher education in European countries. For example, the lack of women in senior positions in science is called the ‘leaky pipeline’ (Weber 2008). The number of women leaving the academic profession still constitutes an unnecessary waste of talent, which may have quite negative implications for the knowledge economy.

As the second part of Task 4.1 we studied the impact of gender both on publication productivity and on patterns of scientific collaboration in the social sciences in Turkey. The research was based on bibliographic data on national-level publications. The findings suggest that:
1. there are gender differences in publication productivity, participation, presence and contribution;
2. there are significant regularities exhibited by co-author pairs based on each partner author’s publication productivity;
3. regularities are different for inter-gender and intra-gender co-authorships.
This study contributes to the literature by exemplifying an integrated approach to better examine the role of gender in scientific collaboration.


Task 4.2 Design questions for the survey (WP1) and the qualitative interviews (WP1)
In the questionnaire we added a question concerning gender bias in peer review. In the qualitative interviews we deliberately did not ask explicitly about gender differences or gender effects. As gender in general is quite a sensitive and highly debated topic, we wanted to see whether respondents (both male and female) would bring up gender issues by themselves. We also paid attention to reported job changes in which the interviewees talked about the importance of family circumstances.

Task 4.3 Integration of data from the other work packages and the shared ACUMEN data set
In this WP we analysed the gender dimension in relation to the system of career evaluation and performance measurement by using three datasets.

Data consisted of:
(1) Sample of male and female scientists and scholars (N=1994). We collected a set of papers from academic researchers included in the common dataset, to conduct a bibliometric analysis. The ACUMEN partners from WP2 and WP3 had already provided part of the set of papers. WP4 completed the set by using the “Large scale author name disambiguation using rule-based scoring and clustering” algorithm developed at CWTS to detect publications per researcher. The algorithm used the email information for each researcher to retrieve the publications. The final dataset contains 1994 researchers in total: 560 females and 1434 males.

(2) Questionnaire study about peer review practices. We used the Peer Review Practices dataset (from WP1; N=2114) to analyse gender differences with regard to researchers’ attitudes towards the ways in which research quality, success, excellence and impact of scientific production are measured and evaluated. Furthermore, we aimed to get suggestions on how the current peer review system should be improved or modified.

(3) Interviews with scientists about their career development. We analysed 18 interviews (from WP1) with academic individuals, employed in four different countries (Germany, Poland, The Netherlands and United Kingdom), about their personal experiences.

Task 4.4 Statistical analysis of the effects of evaluation on research careers of men compared with women & 4.5 Analysis of the potential effects of a select number of performance indicators on the careers of women compared with men.

Gender & Bibliometric Indicators
Since academic publishing is still very important for the career opportunities of both males and females in the sciences, we focused in the first part of the study on the gender dimension in bibliometrics, by comparing the oeuvres of female and male scientists. Our bibliometric research confirms the traditional gender pattern: men produce on average a higher number of publications than women, regardless of their academic position and research field. We also paid attention to authorship order, given that first and sometimes also last author publications are at least as important as raw publication counts for hiring, promotion and tenure (Wren et al. 2007). Our results suggest that women are not evenly represented across authorship positions. In our sample women are overrepresented in the first authorship position, especially in the disciplines astronomy and public health. At each level of the career ladder, the oeuvres of female researchers contain a higher percentage of first authorships than those of men. With regard to the last authorship position, women in all four selected disciplines are significantly underrepresented in this prestigious position. Female associate professors in particular are significantly underrepresented as last authors. As last authorship positions are mainly held by full and associate professors, we cannot elaborate on possible gender differences at lower positions. Interestingly, we found no gender differences in research impact in any of the studied disciplines and academic positions, as measured by three impact indicators (MCS, MNCS, and PPtop10%). Our results show that the degree of collaboration, both in general (inter-institutional) and internationally, varies by discipline. Interestingly, at the level of full professors, the percentage of collaboration is higher for female researchers than for males in the same position. At lower ranks, the percentage of international collaboration is always lower for female researchers than for male researchers.
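The three impact indicators named above are not defined in this summary. As a rough illustration, the sketch below uses the definitions commonly associated with these abbreviations: MCS as the mean number of citations per paper, MNCS as the mean ratio of actual to expected citations, and PPtop10% as the share of papers among the 10% most cited of their field and year. These definitions, the function name and the sample papers are assumptions for illustration only. The expected citation counts and top-10% flags would have to come from a citation database.

```python
def impact_indicators(papers):
    """Compute (MCS, MNCS, PPtop10) for one researcher's oeuvre.

    Each paper is a dict with:
      'citations' - citations received by the paper
      'expected'  - average citations of comparable papers (same field,
                    year and document type); supplied externally
      'top10'     - True if the paper is in the top 10% of its field/year
    """
    n = len(papers)
    if n == 0:
        raise ValueError("at least one paper is required")
    mcs = sum(p["citations"] for p in papers) / n
    mncs = sum(p["citations"] / p["expected"] for p in papers) / n
    pp_top10 = sum(1 for p in papers if p["top10"]) / n
    return mcs, mncs, pp_top10


# Hypothetical oeuvre of four papers (made-up numbers).
oeuvre = [
    {"citations": 10, "expected": 5.0, "top10": True},
    {"citations": 2, "expected": 4.0, "top10": False},
    {"citations": 0, "expected": 2.0, "top10": False},
    {"citations": 8, "expected": 8.0, "top10": False},
]
print(impact_indicators(oeuvre))  # (5.0, 0.875, 0.25)
```

An MNCS below 1 means the oeuvre is cited less than the field average even when the raw mean (MCS) looks respectable, which is why field-normalised indicators are preferred for comparisons across disciplines.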

Gender & Peer review
The questionnaire study, among 2114 scientists affiliated with institutions in 66 countries, showed that both women and men view gender bias in peer review as non-urgent compared to other types of bias. In the interview study only one female scientist reported having experienced gender bias in applications. These results suggest that gender bias is not perceived, at least not at this moment, as a main concern in peer review processes.

Gender & Social Indicators
From the survey on peer review practices it can be concluded that the new generation of researchers rates social (relevance) indicators more highly than the older generation. Women do this even more than men. Gender differences in rating social indicators are most prominent in the postdoc career phase; female postdocs give higher values to these indicators than males in the same career phase. As research evaluation systems have only recently started to include societal impact as one of their criteria (van der Weijden, Verbree & Van den Besselaar 2012), the incentives for scientists to focus on societal impact are still relatively weak.

Gender and Job changes
In the qualitative interviews attention was paid to reported job changes in which the interviewees talk about the importance of family circumstances. This relates to the partners’ careers: the importance of geographically following the partner in order to combine two careers. Our interviews showed that both male and female academic researchers (5 out of 18) mention their partners’ careers as a factor in decisions regarding job mobility.

Task 4.6 Design of guidelines and criteria on the gender dimension for the ACUMEN Portfolio
Academic Age

It is unfair to directly compare ACUMEN portfolios irrespective of gender, because results can be misleading. An academic who has taken some years off in order to raise children and/or who has worked part-time should not be disadvantaged for this. Therefore, the calculation of the academic age of researchers is included in the ACUMEN Portfolio. To compensate, one year is subtracted for each child born after the PhD defence for which the academic is the single main responsible person. This allowance can be shared between carers, if agreed.

Academic Age = number of full-time working years since PhD defence - number of children raised - special allowances.
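The formula can be sketched in a few lines of code. The sketch below is illustrative only: the function names, the conversion of part-time periods into full-time equivalent years, and the use of fractional child allowances to model an allowance shared between carers are assumptions, not part of the ACUMEN specification.

```python
def full_time_equivalent_years(periods):
    """Sum working periods given as (years, fraction_of_full_time) pairs.

    Assumption: part-time work counts proportionally, e.g. four years
    at 50% count as two full-time working years.
    """
    return sum(years * fraction for years, fraction in periods)


def academic_age(full_time_years, children_raised, special_allowances=0.0):
    """Academic Age = full-time working years since PhD defence
    - number of children raised - special allowances (all in years).

    children_raised may be fractional when the one-year allowance per
    child is shared between carers (assumption: 0.5 for an equal split).
    """
    if min(full_time_years, children_raised, special_allowances) < 0:
        raise ValueError("all inputs must be non-negative")
    return full_time_years - children_raised - special_allowances


# Hypothetical researcher: 12 calendar years since the PhD defence,
# of which 4 were worked at 50%; two children, one allowance shared equally.
fte = full_time_equivalent_years([(8, 1.0), (4, 0.5)])
print(fte)                                     # 10.0
print(academic_age(fte, children_raised=1.5))  # 8.5
```

In this example the researcher would be compared with peers of academic age 8.5 rather than with peers 12 calendar years past their PhD.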

Our interview study showed that there are large differences among countries regarding maternity leave, childcare facilities and possibilities to work part-time in academia. We realize that the academic age calculation cannot compensate for all these country-specific facilities. Both evaluators and researchers should take this into account when comparing the ACUMEN portfolios of academics from different countries.

Contribution to society
As the ACUMEN project includes other important tasks of academics, such as contributions to society, we recommend that researchers list the magazine or newspaper articles, encyclopaedia articles and popular books & articles that they have written. In addition, to show the influence of one’s scientific work on society, examples of societal impact can be included in the ACUMEN portfolio: (1) specialist advice given outside academia; (2) professional practices that have used one’s subject expertise; (3) laws, regulations and/or guidelines that were initiated, developed or amended on the basis of one’s research.

Collaboration
As collaboration is one of the main drivers of research output and scientific impact (Larivière et al. 2013), we recommend the development and promotion of programs for female early career researchers. To increase international collaboration opportunities, female scientists should seek the support of an international mentor. In the mentor-mentee conversations, female mentees should also be trained to improve their personal and managerial skills such as negotiation, self-promotion and networking, because these qualities are necessary in discussions about authorship order (West et al. 2013). In this way mentorship could help to speed up the process of closing the gender gap in science. Female full professors could act as role model mentors for female early career scientists, as there are some indications in the literature that underrepresented groups are better served by mentors or role models with similar life experiences (Kopia, Melkers & Tanyildiz 2009).

References
Kopia, A., Melkers, J. and Tanyldiz, Z.E. (2009). Women in academic science: mentors and career development. Book chapter in: Women in science & technology. Edited by Prpic, K., Liveira, L and S. Hemlin. Institute for Social Research, Zagreb. Sociology of Science and Technology Network of the European Sociological Association.
Larivière, V., Ni, C., Gingras, Y., Cronin, B. & Sugimoto, C.R. (2013). Global gender disparities in science. Nature, 504, p. 211-213.
Mahbuba, D. & Rousseau, R. (2011). Matthew, Matilda and the others. In proceedings of the 7th International Conference of Webometrics, Infometrics and Scientometrics & 12th COLLNET Meeting, 20-23 September 2011, Istanbul Bilgi University, Istanbul.
Merton, R.K. (1968). The Matthew effect in science. Science, 159, p. 56-63.
Van Arensbergen, P. (2014). The selection of talent in academia. PhD thesis. The Hague: Rathenau Instituut. Forthcoming.
Van der Weijden, Verbree & Van den Besselaar (2012). From bench to bedside: the societal orientation of research leaders: The case of biomedical and health research in the Netherlands. Science and Public Policy. 39(3), p. 285-303.
Weber, R. (2008). The academic career: a daily adventure? Work-Life Balance and gender segregation in German higher education and research. http://www.gew.de/Binaries/Binary54968/091117%20WLB%20academic%20career%20Regina%20Weber.pdf .

West, M.S. & Curtis, J.W. (2006). AAUP Faculty Gender Equity Indicators 2006, Technical Report. American Association of University Professors. http://www.aaup.org/NR/rdonlyres/63396944-44BE-4ABA-9815-5792D93856F1/0/AAUPGenderEquityIndicators2006.pdf.
West, J.D. Jacquet, J., King, M.M. Correll, S.J. and Bergstrom, C.T. (2013). The role of gender in scholarly authorship. PloS ONE, DOI: 10.1371/journal.pone.0066212.
Wren, J.D. Kozak, K.Z. Johnson, S.J. Deakyne, L.M. Schilling, L.M. and Dellavalle, R.P. (2007). The write position. A survey of perceived contributions to papers based on byline position and number of authors. EMBO report. November (8)11: 988-991. Doi: 10.1038/sj.embor.7401095.


Publications WP4:

Bar-Ilan, J. and Van der Weijden, I. Altmetric gender bias? – Preliminary results. Submitted as work in progress paper to the STI 2014 conference. Event date: September 2014

Kretschmer, Hildrun, Ramesh Kundra, Donald Beaver, and Theo Kretschmer. 2012. "Gender bias in journals of gender studies." Scientometrics, 1-16.

Kretschmer, Hildrun, Alexander Pudovkin, and Johannes Stegmann. 2012. "Research evaluation. Part II: gender effects of evaluation: are men more productive and more cited than women?" Scientometrics, 1-14.

Pudovkin, Alexander, Hildrun Kretschmer, Johannes Stegmann, and Eugene Garfield. 2012. "Research evaluation. Part I: productivity and citedness of a German medical research institution." Scientometrics, 1-14.

Van der Weijden, I. & Calero Medina, C. Gender, Academic position and scientific publishing: a bibliometric analysis of the oeuvres of researchers. Submitted as short paper for the STI 2014 conference. Event date: September 2014

Van der Weijden, I., Zahedi, Z., Must, U. & Meijer, I. Gender differences in societal orientation and output of individual scientists. Submitted as short paper for the STI 2014 conference. Event date: September 2014


WP5 – New Bibliometric Indicators

This work package has investigated to what extent bibliometric indicators can be used in the evaluation of individual researchers. WP5 has analysed a wide range of bibliometric indicators, such as indicators of production, citations, production & citations, production adjusted for time, production adjusted for field, and several measures that describe different aspects of a researcher’s publishing portfolio as a whole. WP5 has also assessed the need for the creation of new bibliometric indicators for the assessment of individuals and discussed ethical aspects. In addition, the work package has carried out a study of the feasibility of predicting later stars given early citation data. A main result of WP5 is the recommendation of a set of bibliometric indicators that researchers can use for self-assessment and which can be included in the ACUMEN portfolio along with indicators from other work packages. The indicators have been tested empirically on samples drawn from the joint ACUMEN dataset.

Task 5.1 Literature review
The focus of the review was to create an overview of author-level bibliometric indicators. This was in preparation for Task 5.2 “the development of novel indicators”. In the review we judged the utility of 108 indicators for researchers on a five-point scale evaluating 1) the complexity of the calculation and 2) the complexity of the data collection process. Our primary view is that the indicators should be implementable by individual researchers, who should be able to describe the results of applying the bibliometric evaluation in a short narrative on their curriculum vitae. Indicators are therefore viewed as a form of self-evaluation, useful to document scientific activities and publication performance. The indicators are categorised according to the following dimensions: output, outcome, quality, research infrastructure, impact, innovation and social benefits, and sustainability. These were presented in overview tables to exemplify how this range of scientific activities can be collectively assessed, together with the advantages and limitations of each indicator. This structure was chosen to emphasize that at the current time 1) certain scientific activities and publication performance are more easily evaluated using bibliometrics than others, 2) assessment of scientific activity and publication performance cannot be represented by a single indicator, 3) it is unwise to use citations by definition as a proxy of research quality, 4) the choice of indicators can have a direct positive or negative effect on the outcome of the evaluation of the individual and 5) the assessment can easily be biased by whom the results are for and by whom the assessment is conducted. The usability of indicators and the transparency of their mathematical composition are questioned. The dimensions of ‘quality’ that indicators can measure are discussed.

Conclusion:
There is no pressing need to develop new indicators for the measurement of the performance of individual researchers. A sufficiently large and diverse set of indicators is in use or has been proposed.

Publications covering Task 5.1
Wildgaard, L., Schneider, J.W. & Larsen, B. (2014) Bibliometric Self-Evaluation: A review of the characteristics of 108 author-level bibliometric indicators. Submitted to Scientometrics and under revision.

Task 5.2 Development of novel indicators
The main finding of the literature review (Task 5.1) is that there is currently no pressing need to develop new indicators. It is more important to understand the indicators already in existence as well as their usefulness for scientists in different fields and of different academic seniorities. Hence, we concluded that Task 5.2 “the development of novel bibliometric indicators” is unnecessary. Instead, we focused our efforts on recommending the best selection of current indicators. Further analysis was required to understand which indicators are needed and how they should be combined to best express a researcher’s performance. Hence, Task 5.4 “recommendation of selected indicators” was extended to include a study of the performance of the 108 different indicators identified in the review across different disciplines and career stages. It is clear that using a single indicator (e.g. the h-index) and interpreting the results out of the context of the researcher’s field or seniority will result in distorted and useless information. Our study shows that, given a strategy of indicators for self-assessment as well as locally relevant performance benchmarks, researchers will reach a better understanding of the achievements of their published works and perhaps be able to identify where these can be improved.

Task 5.3 Selection of two samples of researchers
The empirical testing of the identified bibliometric indicators involved two main datasets. In the first data set, careful sampling was needed to facilitate a study of whether it is possible to identify ‘later stars’ using bibliometric indicators. We chose the scientific field of Astrophysics, a research specialty included in one of the four fields in the shared ACUMEN dataset. The goal of the sampling was to identify a number of ‘later stars’ with highly cited papers and a randomly selected control group of ‘normal’ researchers. A total of 29 later stars and 74 random authors were identified for the study after careful sampling.

The second and larger data set was drawn from the core data set of 2154 researchers identified in WP2. It was possible to identify curriculum vitae and publication lists for 793 out of the 2154 researchers across the four disciplines: Astronomy, Environmental Science, Philosophy and Public Health; approximately 200 researchers in each discipline. We wanted to compare indicator performance including the complexity of citation retrieval and computation of indicators from a structured citation database (Web of Science) with citation data retrieved from a web-crawler based index (Google Scholar). Publication and citation data were retrieved from Web of Science in July 2013. UT numbers were sent to CWTS as they kindly offered to calculate field benchmarks, crown indicators and other standard indicators we could use in comparisons. The Web of Science data set comprises 30,967 citeable papers, from 741 researchers.

Publication and citation data for each researcher were retrieved from Google Scholar, resulting in 72,638 papers with citation statistics. The demographics of the researchers in the data set have been thoroughly analysed and presented as ACUMEN reports. Further, observations during the data collection have contributed to understanding the extent to which researchers used indicators themselves on their curriculum vitae and the effects bibliometric analyses can have on the individual researcher.

Conclusion from the data-collection:
As users purposely looking for data on individual scholars, we experienced how difficult it is to gather a complete picture of a scholar when information is spread across personal homepages, institute homepages, PDFs and various profile tools, each with different “sell by dates”. The GEP must, for example, describe basic retrieval problems, especially name ambiguity, and how these affect the usefulness of citation indicators and ready-to-use metrics. Likewise, we cannot expect the researcher to sort through two or more citation indicators and remove duplicate citations to get a complete citation picture of their work. Further, the data collection showed how important personalization is. ACUMEN must encourage researchers to explore different databases to understand their coverage in them and to be critical of what the ready-to-use metrics reported in these sources represent. This must be made obvious to different types of users of the ACUMEN Portfolio as well.

Conclusions for the ACUMEN Portfolio and GEP:

1. We cannot expect the researcher to sort through two or more citation indicators and remove duplicate citations to get a complete citation record.
2. Name ambiguity problems need to be described in the portfolio including how these affect the usefulness of citation indicators and ready-to-use metrics.
3. Researchers should be encouraged to have an ORCID iD or Google Scholar Citations profile to ensure that scholars can easily claim their publications.
4. The ACUMEN Portfolio needs to have easy tools to import publication data.
5. The guidelines need to explain the calculation and interpretation of metrics, for all types of users of the Portfolio.
6. The portfolio must include a description of the problems with the representability of reference standards at the individual and specialty level.
7. Personalisation of the ACUMEN CV is likely to encourage use.
8. Ensure that scholars can link to their peers’ ACUMEN CVs, as on LinkedIn.
9. A guide and examples of how to present indicators on the CV should be included.


Task 5.4a. Consequences of the use of bibliometric indicators: from the researcher’s perspective and from the evaluator’s perspective

To reduce the chance of violating standard codes of scholarly conduct and behaviour in professional scientific research self-evaluation, both the calculation and the interpretation of the indicators must be transparent. But how does the implementation of author-level indicators affect the researcher’s self-image and the evaluator’s judgement? This task looked at the common psychological effects of bibliometric ‘ready to use’ indicators on the researcher and on the evaluator. When failures come to light, negativity can make the individual feel inadequate. If the quality of evaluation judgments based on standardised indicators is low, it may lead to unsubstantiated assumptions about the productivity and citation impact of a researcher. Given that the results of ‘ready to use’ bibliometric analyses are of personal significance to the individual, it is vital that the bibliometric community assesses whether the usefulness of these types of author-level bibliometrics is limited by psychological factors.
Because bibliometric analyses are of personal significance, it is anticipated that the individual will seek and utilize whatever information is available that will increase their subjective validity. The more substantiating, consistent evidence the individual provides to inform the CV, the more stable it is. When an evaluator is met with a sporadic CV that lacks continuity, the researcher may receive a poor rating. Likewise, the more partial and unreliable the information used to calculate an indicator, the less valid or more uncertain the self-evaluation is assumed to be. Knowing what data are and are not included in indicators can reduce misinterpretation that could cause fabricated self-images and damaged reputations. Self-image is thus the core concept of a CV: the CV is a proxy document for the researcher and as such a space for researchers to promote their self-image. Further study is needed to understand the extent to which bibliometric indicators are used “behind-the-scenes” in tenure or promotion decisions and how they are weighted against other assessment criteria, in order to fully understand their effect on researchers’ behaviour. The psychological effects of ‘ready to use’ indicators can be addressed, though not solved, by promoting knowledge and understanding of the challenges and limitations associated with measuring author impact. As part of this task we developed a Behavioural Codex for researchers using bibliometric self-evaluation.

Publications covering Task 5.4a
Wildgaard, L (2014) The effects of ‘ready to use’ bibliometric indicators. Short paper submitted to STI conference 2014. Event date: September 2014

Task 5.4b. Consequences of the use of bibliometric indicators: from the analysis of data collected in Google Scholar
In this sub-task we investigated the effect of the database used to collect the publication and citation data on the outcome of a bibliometric evaluation. We also analysed the effect of academic career stage and discipline. This investigation is extended in the supplement to Task 5.4c. We used ready-to-use indicators from Publish or Perish: total number of papers (P), years since first publication (PY), total number of citations (C), cites per paper (CPP) and the average number of citations per paper normalized for years since first publication (CPAY). We also used indicators often described as indicators of “quality”: h-index (h), g-index (g), e-index (e) and age-weighted index (AW). With this information the scholar can easily calculate the m-quotient (m) and the mg-quotient. See associated deliverables for descriptions and definitions of each indicator.
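To make some of these indicators concrete, the sketch below computes the h-index, g-index, e-index, m-quotient and mg-quotient from a list of per-paper citation counts. It follows the definitions commonly given in the bibliometric literature, not the ACUMEN deliverables, so details may differ from the project's own definitions. In particular, the g-index is capped at the number of papers here, which is an assumption. CPAY and the AW index are left out because they need information (exact normalisation, per-paper age) that this summary does not specify. The citation counts are made up.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the top g papers together have >= g*g citations.

    Assumption: g cannot exceed the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def e_index(citations):
    """Square root of the citations in the h-core in excess of h*h."""
    h = h_index(citations)
    core = sum(sorted(citations, reverse=True)[:h])
    return math.sqrt(core - h * h)


def per_year(index_value, years_since_first_publication):
    """m-quotient (for h) or mg-quotient (for g): index divided by PY."""
    return index_value / years_since_first_publication


cites = [10, 8, 5, 4, 3, 0]   # hypothetical citation counts, P = 6
py = 8                        # hypothetical years since first publication
print(h_index(cites))                 # 4
print(g_index(cites))                 # 5
print(sum(cites) / len(cites))        # 5.0 (CPP)
print(per_year(h_index(cites), py))   # 0.5 (m-quotient)
print(per_year(g_index(cites), py))   # 0.625 (mg-quotient)
```

For the same data the e-index is the square root of 11 (about 3.32), since the four h-core papers hold 27 citations and h squared is 16.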

Women make up 22% of the overall sample, reflecting the European ratio of men to women in science, 3:1. The size and content of the seniority categories were not homogeneous. The spread of publication and citation data within categories and across fields was highly skewed, and it was difficult to estimate the effects of indicators and detect homogeneity, which is important if we wish to establish performance benchmarks. We used quartiles to illustrate the spread of the data and the median (second quartile) as the best estimate of average performance within each group. The relative interquartile range (RIQR) was calculated. The variation in the number of publications a scholar produces, within each seniority, was still very large in Astronomy, Environmental Studies and Philosophy, but smaller in Public Health. To understand whether we need to recommend gender-specific indicators, we studied the career trajectory of scholars in our sample. Women do not appear to need a higher number of publication years to advance. We then compared the performance of female scholars to male scholars within seniority using the other indicators in this study. The performance of each indicator was highly individual and no gender-specific patterns were identified. PhD students do not have enough citation and publication data or years of experience to use classic bibliometric indicators.
All scholars were ranked per seniority in descending order for each indicator. The tables were divided into lower and upper quartiles. Each scholar’s placement in the rankings of each indicator was mapped manually and categorized as high, middle or low. This resulted in the identification of two groups of indicators. The first group showed predictive relations, where a high, middle or low score on one indicator predicted a high, middle or low score on another. The top 25%, middle 50% or bottom 25% of scholars remained the same but were ranked in a different order. The second indicator group consisted of “unpredictive” indicators. For example, a low number of publications did not result in a high citation score. No individual or seniority patterns were found across this sub-group of indicators, and ranking resulted in different scholars appearing in the top, middle or bottom quartiles. When we compared citations per paper to rank position, we found that the ratios within seniorities fit those for the whole group, which in our dataset is a proxy for the disciplinary level. The expected performance of scholars according to their seniority varies by discipline.
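The quartile-based description can be illustrated with a short sketch. It assumes that the relative interquartile range is the interquartile range divided by the median, and that scholars above the upper quartile are labelled high and those below the lower quartile low. Both conventions, the function names and the publication counts are assumptions for illustration. The report does not state how quartiles were interpolated or how boundary cases were handled.

```python
import statistics


def quartile_summary(values):
    """Return Q1, median, Q3 and the relative interquartile range.

    Assumption: RIQR = (Q3 - Q1) / median. The median must be non-zero.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "riqr": (q3 - q1) / median}


def band(value, summary):
    """Label a score as high (above Q3), low (below Q1) or middle."""
    if value > summary["q3"]:
        return "high"
    if value < summary["q1"]:
        return "low"
    return "middle"


# Hypothetical publication counts for nine scholars of the same seniority.
papers = {"S1": 12, "S2": 4, "S3": 75, "S4": 9, "S5": 26,
          "S6": 7, "S7": 40, "S8": 15, "S9": 20}

summary = quartile_summary(list(papers.values()))
print(summary["q1"], summary["median"], summary["q3"])  # 9.0 15.0 26.0
bands = {name: band(count, summary) for name, count in papers.items()}
print(bands["S3"], bands["S8"], bands["S2"])            # high middle low
```

Here the RIQR is 17/15, or about 1.13, which signals a spread larger than the typical value itself. Running the same banding for two indicators and comparing the labels per scholar shows whether one indicator "predicts" the other in the sense described above.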

Publications covering Task 5.4b
Wildgaard, L. (2014). Just Pimping the CV? The Feasibility of Ready-to-use Bibliometric Indicators to Enrich Curriculum Vitae. In: iConference 2014 Proceedings, p. 954 - 958. doi:10.9776/14326

Task 5.4c. Consequences of the use of bibliometric indicators: from the analysis of data collected in Web of Science
Building on Task 5.4b we investigated how 52 of the 108 indicators identified in Task 5.1 perform on data from Web of Science across our disciplines and career stages. The remainder of the 108 indicators were deemed too complex for individual researchers to calculate, either because access to special proprietary datasets is needed or because of the complexity of the calculation required to compute these indicators.

We used clustering as a method to recommend single indicators that represent independent aspects of research performance. The clustering identified central and isolated indicators for each discipline. To investigate the role of the identified central indicators, we ranked authors within disciplines and mapped how their position in the ranks changes when using the central indicators as the control. We identified the top 10%, top 25%, middle 50% and bottom 25% of researchers in each set and found that certain indicators appear to control rank position. These central indicators differed from discipline to discipline, but across all disciplines we observed the same trend: if a researcher is placed in the top 10% of the sample ranking by the central indicator, the researcher is also placed in the top 10% by the other indicators to which the central indicator has strong links. The same holds for authors in the top 25%, middle 50% and bottom 25%. We also noticed that isolated indicators have no strong links to other indicators and produce seemingly random rank positions. However, they do indicate activities that are not covered by the other indicators.

These observations need to be explored and deepened in further statistical analyses that investigate the overlap between the central indicators and the indicators they link to, as well as the aspects of the effect of an author’s production that they capture. Using a hierarchical clustering model that illustrated how closely related the indicators are to each other, we discovered that indicators group together as descriptors of production, citations, production & citations, production adjusted for time, production adjusted for field and miscellaneous measures that describe the more subjective aspects of a researcher’s publishing portfolio. The clustering of indicators differs from discipline to discipline, as does the strength of their relations. If we were to recommend a performance indicator for each field and for each type of indicator of activity, we would need to investigate the role of the indicators within their cluster: what they measure, whether they overlap, how complicated they are and which of them are redundant.
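The grouping of indicators can be sketched in code. This summary does not state which similarity measure or linkage method the hierarchical clustering used, so the example swaps in one simple possibility: indicators are compared by Spearman rank correlation across researchers and joined whenever the correlation reaches a threshold, which is equivalent to cutting a single-linkage hierarchy at that level. The threshold of 0.8, the indicator names and all scores are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def cluster_indicators(scores, min_rho=0.8):
    """Group indicators linked by a chain of correlations >= min_rho."""
    parent = {name: name for name in scores}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(scores, 2):
        if spearman(scores[a], scores[b]) >= min_rho:
            parent[find(a)] = find(b)
    groups = {}
    for name in scores:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical scores of six researchers on four indicators.
scores = {
    "papers": [5, 12, 20, 33, 41, 60],
    "citations": [30, 80, 150, 400, 390, 900],
    "h_index": [3, 5, 7, 10, 11, 15],
    "collab_share": [0.9, 0.2, 0.6, 0.1, 0.8, 0.4],
}
print(cluster_indicators(scores))
# [['citations', 'h_index', 'papers'], ['collab_share']]
```

In this toy data the three production and citation indicators rank researchers almost identically and form one cluster, while the collaboration share stands alone, mirroring the distinction between linked and isolated indicators described above.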

Regarding the bibliometric prediction of later stars in astrophysics, two indicators of total influence, based on citation numbers normalised by expected citation numbers, are the only indicators that show differences between later stars and random authors that are significant at the 1% level. Indicators of paper output are not very useful for predicting later stars. The famous Hirsch index does not distinguish at all between later stars and the random control group.
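The statistical test behind the 1% significance level is not named in this summary. As one possible way to make such a two-group comparison, the sketch below runs an exact two-sided permutation test on the difference of group means. The choice of test, the function name and the indicator values are assumptions for illustration and do not reproduce the study's analysis.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the difference of means.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes and returns the share of splits whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        diff = abs(mean_diff(sum(pooled[i] for i in indices)))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / splits


# Hypothetical normalised-influence scores early in the career.
later_stars = [2.9, 3.4, 4.1, 5.0, 6.2]
random_authors = [0.6, 0.8, 1.1, 1.3, 1.9]

p = exact_permutation_p(later_stars, random_authors)
print(p < 0.01)  # True
```

With five values per group there are 252 possible splits and only the observed split and its mirror image are as extreme, giving p = 2/252. For groups as large as the 29 later stars and 74 random authors in the study, full enumeration is not feasible, and one would sample random splits or use a rank-based test instead.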

Publications covering Task 5.4c
Wildgaard, L & Larsen B (2014) Cluster analysis of bibliometric indicators of individual scientific performance. Submitted as short paper to STI conference, March 2014.
Havemann, F. & Larsen, B. (2014) Bibliometric Indicators of Young Authors in Astrophysics: Can Later Stars be Predicted? Submitted to Scientometrics.



WP6 - Portfolio

Summary
The research strategy for ACUMEN was operationalized with research conducted at two levels: the core research about different aspects of research evaluation and the integration of this research in the development of the ACUMEN Portfolio concept. It was the role of the portfolio work package (WP6) to facilitate coordination among distributed research tasks with an explicit focus on integration of the different perspectives on the evaluation of scientific careers. Although these two research objectives were enacted simultaneously, the central focus of the work package evolved as the research progressed. Deliverables in the portfolio work package included D6.9 Expert workshop proceedings (Brussels) and D6.14 Portfolio guidelines, as well as Milestone 4 – the Portfolio course.

Execution of the portfolio work package was organized in three stages, aligned with the progression of research for the project as a whole. The first stage of WP6 was focused on coordination among work packages conducting the core research. The second stage focused on conceptualization of the portfolio on the basis of the core research. The third stage incorporated changes to the collaboration structure and shifted focus to iterative design and development of the portfolio design and guidelines document. In addition, WP6 was expanded to incorporate recommendations from the external review: to investigate emerging evaluation practices associated with contemporary CRIS systems and to engage with the euroCRIS community regarding common interests and concerns.

Coordination of Core Research
During the initial stage of the project, the focus of WP6 was to monitor integration concerns associated with the core research conducted in work packages 1-5 and to ensure compatibility among the different evaluation perspectives. This was accomplished through linkages to Milestone MS1 for data integration and Milestone MS2 for consideration of prospective portfolio components.

During the consortium meeting in Tallinn (January 2012), the partners developed a portfolio framework with proposed categories for portfolio content. At that time, the structure of the portfolio included four categories of researcher achievements, accompanied by a narrative. Category one consisted of the researcher's Curriculum Vitae, comprising components such as skills, expertise and management experience, among others. Category two, ‘publications and peer review’, was a list of outputs in different areas, such as scholarly publications and public performances. Categories three and four were measures of ‘influence’ on the basis of bibliometric, webometric and altmetric analyses (ref MS2).


Conceptualizing the Portfolio
The aim of the ACUMEN Portfolio is to provide researchers with a selection of evaluation tools and indicators, from which researchers and evaluation committees can select which tools and indicators to use on the basis of the relevant community of practice (e.g. field, discipline, and/or professional society) and the specific area of research. Constructing the portfolio content is intended to facilitate a shift in the evaluation dynamic from a strictly top-down event to one in which the researcher has a stronger voice in the evaluation process. In addition to participating in the selection of evaluation indicators and content, the researcher can utilize the narrative component to make evidence-based arguments about the relevance and impact of her research.

A two-part workshop was organized to build on the work at the Tallinn meeting and further stimulate the development of the portfolio. The first part (January 2013, Madrid) focused on further development of the portfolio concept, and the second part (March 2013, Copenhagen) focused on operationalizing the portfolio concept as a web service, taking into account the relevant contextual factors. Of particular concern in both sessions was the identification of stakeholders and the evaluation scenarios associated with use of a portfolio for individual researchers.

The workshops in Madrid and Copenhagen helped reveal weak points in the evolving portfolio concept. In the wrap-up of the two workshops, the consortium partners formulated a list of overarching questions framing complex issues that were key to further progress on the portfolio design. Resolving these issues was complicated both by the wide range of issues that the portfolio is meant to address and by the diversity of disciplinary and evaluation contexts to be considered.

Overarching Questions:

• To what extent do we want to help people with their international career (by finding a solution for national differences in evaluation methods)?
• How will we deal with the risk that evaluators will focus on the easy numbers and leave the other indicators out when evaluating? (question from evaluators)
• Is the portfolio still useful for people who are leaving science?
• What are the privacy consequences of completing the portfolio?
• Does ACUMEN aim to serve non-academic researchers? (question from evaluators)
• Will the Portfolio be used in addition to CVs and publication lists or replace them?
• How do we measure international experience in the portfolio?

As we moved into the final stage of the ACUMEN project, new collaboration teams were created not only as a planned shift in focus from coordination among work packages to integration of work package outcomes, but also to address the overarching questions formulated in the workshops (see the WP7-Management report for details about the collaboration team strategy).

Iterative Design and Development
As the objectives of the new collaboration teams were interrelated, a series of three deployment/testing events was used to coordinate scheduled information exchange among the teams and to facilitate iterative design and development of the portfolio. The event series began with the core portfolio concept, followed by increased complexity with each successive release. The workshops followed a progression from basic concept (Utrecht), to user testing (Madrid), to stakeholder evaluation (Brussels).

Utrecht - The first test focused on the basic concept of the ACUMEN portfolio at the “Crafting Your Career” event, held in Utrecht, 30 October 2013. The Rathenau Institute and CWTS-Leiden University organized this event. The target audience for the event was early- to mid-career researchers (final-year PhD, post-doc) from a wide representation of fields. The testing protocol focused on three points of interaction with the workshop participants: 1) submission of a current CV by participants during registration for the event, 2) feedback from ACUMEN researchers on the basis of the portfolio concept, 3) dissemination of an informational brochure. The concept was generally well received and participant feedback was useful in further development of the portfolio (for a full report, see Milestone-MS4 Portfolio course).

Madrid - The second test focused on the specific content of an ACUMEN portfolio and was hosted by CSIC, the ACUMEN partner in Madrid, 13 December 2013. The target audience for the event was again early-career researchers (PhDs and post-docs) from a wide representation of fields. The test was geared towards gaining conceptual feedback on the contents of the portfolio, rather than aiming for on-the-spot filling out of details. Participants were asked to engage in a fictitious job application for their next dream job, in order to assess how useful the portfolio is in that case. The posters and brochures created for the Utrecht event were reviewed and then brought to Madrid for dissemination among participants. Detailed user feedback again helped refine the portfolio and guidelines document (for a full report, see Deliverable D7.10 Proceedings graduate school demo).

Brussels - Building on feedback from testing in Utrecht and Madrid, the final deployment workshop was aimed at testing the portfolio from an evaluator’s perspective in Brussels, 24 January 2014. While the ACUMEN portfolio is specifically concerned with, and hopes to empower, the researcher, evaluators’ perspectives were seen as crucial to acceptance and adoption of the portfolio concept. The target audience for this event was evaluators and evaluation officers, most of whom were senior researchers themselves. Participants represented a diversity of fields and backgrounds. The program began with plenary presentations on the ACUMEN project, after which participants were engaged in parallel focus groups, each with a specific focus. Two of the focus groups used use-case scenarios to elicit discussion and feedback. The third focus group addressed the Portfolio’s technological feasibility, usability in different scenarios and applicability in the context of academic careers. The overall aims of the ACUMEN project and the portfolio were well received. Participants were engaged and provided productive feedback (for a full report, see Deliverable D6.9 - Expert workshop proceedings).

Guidelines for Good Evaluation Practices

The main result of ACUMEN is the “Guidelines to Good Evaluation Practices with the ACUMEN Portfolio”, which was presented at the workshop in Brussels and is available online, together with example portfolios, at http://research-acumen.eu/portfolio. The ACUMEN Portfolio is a way for Portfolio owners to highlight their achievements and to present themselves in the most positive way. It supplements the traditional CV because it highlights key achievements rather than giving an exhaustive list. It contains a systematic set of types of information related to three aspects of an academic's career:
1) Expertise – methods, areas of theory, etc.
2) Outputs – publications, patents, etc.
3) Impacts – citations, honours, etc.

The ACUMEN Portfolio also contains a narrative that the academic can use to explain their academic value, backed by evidence from the rest of the portfolio, when possible.

The guidelines document is primarily for use by evaluators who are intending to use the ACUMEN Portfolio to aid in decision-making, such as for funding, promotion or appointments. Nevertheless, it can also be used by individual academics seeking to create a Portfolio for self-evaluation purposes or to supplement their CV, to understand the portfolio concept or to ensure that their portfolio is as effective as possible.

The ACUMEN Portfolio distinguishes itself from a traditional academic CV in three aspects:
1) The ACUMEN Portfolio has an explicit focus on demonstrating specific types of achievements and skills rather than listing all achievements and activities. This makes it easier for evaluators to compare people based upon their Portfolios and to identify specific kinds of skills or expertise needed.
2) The ACUMEN Portfolio incorporates an age factor to allow for a fairer comparison of academics at different stages of their career and to compensate for gender and disability inequalities that may otherwise be hidden.
3) The ACUMEN Portfolio includes an evidence-based narrative that allows the researchers to tell their own story in their own way, but tying it to evidence.
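
As a rough illustration of how an age factor could enter a comparison, the sketch below divides an indicator value by an "academic age" that discounts career breaks. This is only an assumed formula for illustration; the Portfolio's actual age calculation is not specified in this summary, and all names and figures are hypothetical.

```python
def academic_age(current_year, phd_year, career_break_years=0.0):
    """Years of active research since the PhD (assumed definition).

    career_break_years is a hypothetical allowance for periods such as
    parental leave, illness or part-time work.
    """
    age = (current_year - phd_year) - career_break_years
    if age <= 0:
        raise ValueError("academic age must be positive")
    return age


def per_active_year(indicator_value, age):
    """Scale a raw indicator (e.g. a publication count) by academic age."""
    return indicator_value / age


# Invented example: two researchers assessed in 2014.
senior_age = academic_age(2014, 2004, career_break_years=2.0)  # 8 active years
junior_age = academic_age(2014, 2008)                          # 6 active years

print(per_active_year(24, senior_age))  # 24 publications over 8 years
print(per_active_year(15, junior_age))  # 15 publications over 6 years
```

Scaling by active years rather than raw totals keeps a researcher with a shorter or interrupted career from being penalised merely for having had less time to accumulate outputs.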

The indicators in the ACUMEN portfolio include a combination of quantitative measures (e.g. based upon citation counts) and qualitative ones (e.g. list of awards, list of invited keynote talks). Each ACUMEN Portfolio indicator is designed to give evidence of a desirable academic attribute and a portfolio of indicators is designed to give a rounded impression of the contributions of an individual academic. Nevertheless, each individual indicator, and particularly the quantitative ones, has limitations and is only able to partially reflect that which it is designed to cover. This is most apparent when there is a range of similar indicators and, for practical reasons, only one has been selected. For instance, citation counts could be calculated with or without self-citations and across different databases. The most important consequence is that all of the indicators should be used to inform rather than replace human judgement. For example, if candidate A has higher or better values on all indicators than candidate B, then whilst candidate A is probably better, the narrative should still be read and judgement should still be used to decide whether this is definitely the case. The narrative might state, for instance, that the academic has capabilities that are valuable but not well covered by the indicators in the portfolio because they are unusual.
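
The point that a single indicator can be calculated in more than one way can be made concrete with a small sketch. It counts citations to one paper with and without self-citations, assuming a self-citation is any citing paper that shares at least one author with the cited paper; the function name and the author lists are invented for illustration.

```python
def count_citations(cited_authors, citing_author_lists, exclude_self=False):
    """Count citing papers, optionally skipping self-citations.

    Assumption: a citing paper is a self-citation if it shares at least
    one author name with the cited paper.
    """
    cited = set(cited_authors)
    total = 0
    for citing_authors in citing_author_lists:
        if exclude_self and cited & set(citing_authors):
            continue  # shared author, so treat as a self-citation
        total += 1
    return total


# Invented data: one paper by two authors, cited by four later papers.
paper_authors = ["Smith", "Lee"]
citing_papers = [
    ["Smith", "Garcia"],  # self-citation (Smith)
    ["Chen"],
    ["Lee"],              # self-citation (Lee)
    ["Novak", "Chen"],
]

print(count_citations(paper_authors, citing_papers))
print(count_citations(paper_authors, citing_papers, exclude_self=True))
```

The same paper yields two different counts depending on the rule chosen, which is one reason the indicator value should be read alongside the narrative rather than on its own.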

Publications:

In addition to the deliverables (Guidelines and the Portfolio itself), a scholarly publication authored by all ACUMEN researchers will be submitted soon after the final completion of the ACUMEN project:
Wouters et al., “The ACUMEN Portfolio: a new approach to individual level performance assessment”, to be submitted June 2014.

Potential Impact:
Potential impact and main dissemination activities

The ACUMEN project has been very active in disseminating its results through a host of technical and social science journal articles and conference presentations, as well as articles in professional publications, newspapers and blogs (see Part 2 for the complete publication and presentation list). The philosophy underlying the ACUMEN dissemination strategy was that communication should be a two-way process. The responses to the concept of the ACUMEN portfolio have been systematically incorporated into the design of the portfolio. The three tests of the final version of the ACUMEN portfolio in Utrecht, Madrid, and Brussels brought these discussions together and engaged a larger group of stakeholders. The outcomes of these workshops indicate that the ACUMEN Portfolio has great potential in the fast-evolving landscape of research evaluation and assessment, and in light of the stronger focus on human resource management at European universities and research institutes. The dissemination activities of the ACUMEN project were aimed at establishing a niche for the ACUMEN Portfolio as a set of principles and standards. These can subsequently be used by academic parties as well as for-profit companies to develop more tools for researchers to present their activities and performance to their institutes as well as to the public at large.

EuroCRIS and Emerging Evaluation Practices

A key development that will shape the future impact of the ACUMEN Portfolio and the Guidelines to Good Evaluation Practices with the ACUMEN Portfolio is the emergence of comprehensive research information systems. This was an important aspect of the external reviews of the ACUMEN Portfolio, which recommended actively engaging with, among others, the euroCRIS community. On the basis of these recommendations, we developed an action plan with three levels of engagement with the euroCRIS community: a) identify common interests and concerns, b) identify emerging practices related to evaluation of individuals, and c) develop links among relevant stakeholders within the community.

Background
Assessment of research outcomes is an important resource for strategic management of future research at institutions such as universities. Current Research Information Systems (CRIS) are typically used for conducting internal assessment of faculties, departments, and research institutes, and can also be used for benchmarking among external institutions or groups within an institution. An increasingly common feature among CRIS systems is the CV module, which allows individual researcher profiles to be presented internally on an institution's intranet or externally on the Web. The same information used to produce researcher profiles can also be used for assessment of individual researchers. As such, it is important to consider the ACUMEN portfolio design from the standpoint of evaluation practices associated with CRIS systems.

Engaging euroCRIS
Several questions guided the investigation of evaluation practices associated with CRIS systems: In what ways are researchers represented in CRIS systems; how is data input and validated; what kinds of data are included/excluded; in what ways are individual researchers evaluated on the basis of CRIS data; what kind of transparency mechanisms are in place to make clear how these systems are used?

The first phase of this research involved investigation of and engagement with the euroCRIS community. An ACUMEN partner attended the euroCRIS membership meeting in Bonn, 13-14 May 2013 and the euroCRIS annual Strategy Seminar in Brussels, 9-10 September 2013. Both meetings involved presentations of CRIS use-cases from institutions around Europe and task group meetings associated with best practices, open access repositories, and development of bibliometric indicators. We presented the ACUMEN concept at the euroCRIS membership meeting in Porto, 14-15 November 2013. Our presentation included an invitation to the euroCRIS community to work together on common interests.

The second phase involved semi-structured interviews among members of the euroCRIS community. The interviews were conducted in the Nordic region, where there is extensive use of CRIS systems for research assessment. Although CRIS systems used in universities are not typically oriented toward the evaluation of individuals, these systems do seem to play an important role in shaping research evaluation practices more generally. Interviews were aimed at evaluation use-cases, data input scenarios, and data interoperability issues.

Outcomes
Two kinds of outcomes from the euroCRIS project contributed to the ACUMEN project. First, the euroCRIS case study was conducted concurrently with the portfolio design. Insights about CRIS systems, and the ways in which they are being used, were fed back into the portfolio design process. Information about the CERIF data model and CASRAI standard dictionaries, for example, was provided to the collaboration teams to enhance understanding of the technical and standards environment associated with CRIS systems. Second, on the basis of our presentation to the euroCRIS community, ACUMEN was invited to participate in the euroCRIS task group for development of indicators. Following from this, several euroCRIS members participated in the ACUMEN stakeholder workshop in Brussels where they provided valuable suggestions.

Discussions about the relationship between ACUMEN and the euroCRIS community, initiated at the Brussels workshop, have evolved into ideas about implementation of the ACUMEN portfolio in the framework of the CERIF data model. This effort is beyond the scope of ACUMEN, but there appears to be sufficient interest in developing a follow-on project to implement the portfolio. Members of ACUMEN are working together with euroCRIS (technical and standards implementation) and ARMA (international coordination of research evaluation policies) to explore the feasibility of a follow-on project proposal. The implementation framework will be presented at the next euroCRIS members meeting in Amsterdam, 11 September 2014.

Evolving standards for performance assessment and indicators

The completion of the ACUMEN Portfolio coincides with a surge of interest in more advanced forms of evaluation and indicators at the level of the individual researcher and author. In the bibliometrics community, a number of initiatives have put this high on the agenda at their conferences in 2013 and 2014 (ISSI 2013, 15-18 July 2013, Vienna; ENID/STI 2013, 4-6 September 2013, Berlin; ENID/STI 2014, 3-6 September 2014, Leiden). The OECD organized a stakeholders workshop on standards for performance indicators on March 25, 2014, to which the main commercial database providers also contributed. A follow-up workshop will be organized by the Observatoire des Sciences et Techniques (Paris) in collaboration with CWTS, SPRU, NIFU, INGENIO, and OcyT on May 12, 2014 in Paris. In addition, a panel proposal has been submitted to the AAAS meeting in February 2015 (San Jose, US). We expect that this development will be strongly pursued by the main players in the field of research evaluation as well as in the metrics community.

The ACUMEN Portfolio ties into this development, because it has created an overall framework for the presentation of activities, expertise and influence of the individual researcher. This framework enables the effective selection and interpretation of evaluation data as well as indicators. This is intended to help prevent drift of the development of performance standards on the basis of either the databases available or the indicators that are most easily computable. The narrative, which serves as a core component of the ACUMEN Portfolio, may become an “obligatory point of passage”, to quote the science philosopher Bruno Latour, for the translation of performance metrics into qualitative judgment and quality control. The fact that the ACUMEN Portfolio is designed within the interpretation of evaluation as a communication process will only make this more attractive to a larger audience in and around the sciences.

List of Websites:
Project Website - http://research-acumen.eu

Contact:
Paul Wouters, Professor of Scientometrics
Director Centre for Science and Technology Studies
Leiden University, the Netherlands
+31 71 5273909 (secr.)