Changing How We Evaluate Research Is Difficult – but Not Impossible

March 2, 2021

Declarations can inspire revolutionary change, but the high ideals inspiring the revolution must be harnessed to clear guidance and tangible goals to drive effective reform. When the San Francisco Declaration on Research Assessment (DORA) was published in 2013, it catalogued the problems caused by the use of journal-based indicators to evaluate the performance of individual researchers, and provided 18 recommendations to improve such evaluations. Since then, DORA has inspired many in the academic community to challenge long-standing research assessment practices, and over 150 universities and research institutions have signed the declaration and committed to reform.

But experience has taught us that this is not enough to change how research is assessed. Given the scale and complexity of the task, additional measures are called for. We have to support institutions in developing the processes and resources needed to implement responsible research assessment practices. That is why DORA has transformed itself from a website collecting signatures to a broader campaigning initiative that can provide practical guidance. This will help institutions to seize the opportunities created by the momentum now building across the research community to reshape how we evaluate research.

Systemic change requires fundamental shifts in policies, processes and power structures, as well as in deeply held norms and values. Those hoping to drive such change need to understand all the stakeholders in the system: in particular, how do they interact with and depend on each other, and how do they respond to internal and external pressures? To this end, DORA and the Howard Hughes Medical Institute (HHMI) convened a meeting in October 2019 that brought together researchers, university administrators, librarians, funders, scientific societies, non-profits and other stakeholders to discuss these questions. Those taking part in the meeting discussed emerging policies and practices in research assessment, and how they could be aligned with the academic missions of different institutions.

The discussion helped to identify what institutional change could look like, to surface new ideas, and to formulate practical guidance for research institutions looking to embrace reform. This guidance – summarised below – provides a framework for action that consists of four broad goals: i) understand obstacles that prevent change; ii) experiment with different ideas and approaches at all levels; iii) create a shared vision for research assessment when reviewing and revising policies and practices; iv) communicate that vision on campus and externally to other research institutions.

Understand obstacles that prevent change

Most academic reward systems rely on proxy measures of quality to assess researchers. This is problematic when there is an over-reliance on these proxy measures, particularly so if aggregate measures are used that mask the variations between individuals and individual outputs. Journal-based metrics and the H-index, alongside qualitative notions of publisher prestige and institutional reputation, present obstacles to change that have become deeply entrenched in academic evaluation.

This has happened because such measures contain an appealing kernel of meaning (though the appeal only holds so long as one operates within the confines of the law of averages) and because they provide a convenient shortcut for busy evaluators. Additionally, the over-reliance on proxy measures that tend to be focused on research can discourage researchers from working on other activities that are also important to the mission of most research institutions, such as teaching, mentoring, and work that has societal impact.

The use of proxy measures also preserves biases against scholars who still feel the force of historical and geographical exclusion from the research community. Progress toward gender and race equality has been made in recent years, but the pace of change remains unacceptably slow. A recent study of basic science departments in US medical schools suggests that under current practices, a level of faculty diversity representative of the national population will not be achieved until 2080 (Gibbs et al., 2016).

Rethinking research assessment therefore means addressing the privilege that exists in academia, and taking proper account of how luck and opportunity can influence decision-making more than personal characteristics such as talent, skill and tenacity. As a community, we need to take a hard look – without averting our gaze from the prejudices that attend questions of race, gender, sexuality, or disability – at what we really mean when we talk about ‘success’ and ‘excellence’ if we are to find answers congruent with our highest aspirations.

This is by no means easy. Many external and internal pressures stand in the way of meaningful change. For example, institutions have to wrestle with university rankings as part of research assessment reform, because stepping away from the surrogate, selective, and incomplete ‘measures’ of performance totted up by rankers poses a reputational threat. Grant funding, which is commonly seen as an essential signal of researcher success, is clearly crucial for many universities and research institutions: however, an overemphasis on grants in decisions about hiring, promotion and tenure incentivises researchers to discount other important parts of their job. The huge mental health burden of hyper-competition is also a problem that can no longer be ignored (Wellcome, 2020a).

Experiment with different ideas and approaches at all levels

Culture change is often driven by the collective force of individual actions. These actions take many forms, but spring from a common desire to champion responsible research assessment practices. At the DORA/HHMI meeting Needhi Bhalla (University of California, Santa Cruz) advocated strategies that have been proven to increase equity in faculty hiring – including the use of diversity statements to assess whether a candidate is aligned with the department’s equity mission – as part of a more holistic approach to researcher evaluation (Bhalla, 2019). She also described how broadening the scope of desirable research interests in the job descriptions for faculty positions in chemistry at the University of Michigan resulted in a two-fold increase of applicants from underrepresented groups (Stewart and Valian, 2018). As a further step, Bhalla’s department now includes untenured assistant professors in tenure decisions: this provides such faculty with insights into the tenure process.

The actions of individual researchers, however exemplary, are dependent on career stage and position: commonly, those with more authority have more influence. As chair of the cell biology department at the University of Texas Southwestern Medical Center, Sandra Schmid used her position to revise the department’s hiring procedure to focus on key research contributions, rather than publication or grant metrics, and to explore how the applicant’s future plans might best be supported by the department. According to Schmid, the department’s job searches were given real breadth and depth by the use of Skype interviews (which enhanced the shortlisting process by allowing more candidates to be interviewed) and by designating faculty advocates from across the department for each candidate (Schmid, 2017). Another proposal for shifting the attention of evaluators from proxies to the content of an applicant’s papers and other contributions is to instruct applicants for grants and jobs to remove journal names from CVs and publication lists (Lobet, 2020).

The seeds planted by individual action must be encouraged to grow, so that discussions about research assessment can reach across the entire institution. This is rarely straightforward, given the size of modern universities and the organisational autonomy within them, which is why some have set up working groups to review their research assessment policies and practices. At the Universitat Oberta de Catalunya (UOC) and Imperial College London, for example, the working groups produced action plans or recommendations that have been adopted by the university and are now being implemented (UOC, 2019; Imperial College, 2020). University Medical Centre (UMC) Utrecht has gone a step further: in addition to revising its processes and criteria for promotion and for internal evaluation of research programmes (Benedictus et al., 2016), it is undertaking an in-depth evaluation of how the changes are impacting its researchers (see below).

To increase their chances of success these working groups need to ensure that women and other historically excluded groups have a voice. It is also important that the viewpoints of administrators, librarians, tenured and non-tenured faculty members, postdocs, and graduate students are all heard. This level of inclusion is important because when communities impacted by new practices are involved in their design, they are more likely to adopt them. But the more views there are around the table, the more difficult it can be to reach a consensus. Everyone brings their own frame-of-reference, their own ideas, and their own experiences. To help ensure that working groups do not become mired in minutiae, their objectives should be defined early in the process and should be simple, clear and realistic.

Create a shared vision

Aligning policies and practices with an institution’s mission

The re-examination of an institution’s policies and procedures can reveal the real priorities that may be glossed over in aspirational mission statements. Although the journal impact factor (JIF) is widely discredited as a tool for research assessment, more than 40% of research-intensive universities in the United States and Canada explicitly mention the JIF in review, promotion, and tenure documents (McKiernan et al., 2019). The number of institutions where the JIF is not mentioned in such documents, but is understood informally to be a performance criterion, is not known. A key task for working groups is therefore to review how well the institution’s values, as expressed in its mission statement, are embedded in its hiring, promotion, and tenure practices. Diversity, equity, and inclusion are increasingly advertised as core values, but work in these areas is still often lumped into the service category, which is the least recognised type of academic contribution when it comes to promotion and tenure (Schimanski and Alperin, 2018).

A complicating factor here is that while mission statements publicly signal organisational values, the commitments entailed by those statements are delivered by individuals, who are prone to unacknowledged biases, such as the perception gap between what people say they value and what they think others hold most dear. For example, when Meredith Niles and colleagues surveyed faculty at 55 institutions, they found that academics value readership most when selecting where to publish their work (Niles et al., 2019). But when asked how their peers decide to publish, a disconnect was revealed: most faculty members believe their colleagues make choices based on the prestige of the journal or publisher. Similar perception gaps are likely to be found when other performance proxies (such as grant funding and student satisfaction) are considered.

Bridging perception gaps requires courage and honesty within any institution – to break with the metrics game and create evaluation processes that are visibly infused with the organisation’s core values. To give one example, HHMI tries to advance basic biomedical research for the benefit of humanity by setting evaluation criteria that are focused on quality and impact. To increase transparency, these criteria are now published (HHMI, 2019). As one element of the review, HHMI asks investigators to “choose five of their most significant articles and provide a brief statement for each that describes the significance and impact of that contribution.” It is worth noting that both published and preprint articles can be included. This emphasis on a handful of papers helps focus the review on the quality and impact of the investigator’s work.

Arguably, universities face a stiffer challenge here. Institutions striving to improve their research assessment practices will likely be casting anxious looks at what their competitors are up to. However, one of the hopeful lessons from the October meeting is that less courage should be required – and progress should be faster – if institutions come together to collaborate and establish a shared vision for the reform of research evaluation.

Finding conceptual clarity

Conceptual clarity in hiring, promotion, and tenure policies is another area for institutions to examine when aligning practices with values (Hatch, 2019). Generic terms like ‘world-class’ or ‘excellent’ appear to provide standards for quality; however, they are so broad that they allow evaluators to apply their own definitions, creating room for bias. This is especially the case when, as is still likely, there is a lack of diversity in decision-making panels. The use of such descriptors can also perpetuate the Matthew Effect, a phenomenon in which resources accrue to those who are already well resourced. Moore et al. (2017) have critiqued the rhetoric of ‘excellence’ and propose instead focusing evaluation on more clearly defined concepts such as soundness and capacity-building. (See also Belcher and Palenberg, 2018, for a discussion of the many meanings of the words ‘outputs’, ‘outcomes’ and ‘impacts’ as applied to research in the field of international development.)