Can performance review bias be fixed?
By exposing patterns of bias and redesigning performance reviews, one law firm is making employee evaluations fairer and more effective
The past year has seen an explosion of articles and advice about diversity, equity, and inclusion (“DEI”) and how to promote them in the workplace. Unfortunately, too many of these articles offer generic advice, lack supporting data, and provide no concrete steps leaders can take to make DEI initiatives more effective. A new Harvard Business Review article by Joan C. Williams, Denise Lewin Loyd, Mikayla Boginsky, and Frances Armas-Edwards, however, suffers from none of these problems. The authors present a solid, data-backed case study that is well worth analyzing.
The story begins two years ago, when a U.S. law firm contacted the authors for help understanding how bias was affecting its performance review process. The firm’s head of DEI had noticed some “red flags” and wanted to address them with data rather than anecdotes.
The authors examined the firm’s review process closely, and the results revealed some startling discrepancies. The most troubling: only 9.5% of the performance evaluations given to people of color ever mentioned leadership, 70% less than the rate for white women. This gap mattered because leadership mentions “typically predicted higher competency ratings the next year.”
From the data, the team identified four broad patterns of bias. Although the data came from a single firm, these patterns are likely to appear in many large corporate environments.
Pattern 1: Prove It Again
Once a group was stereotyped as less capable (including “women, people of color, individuals with disabilities, older employees, LGBT+, and professionals from blue-collar backgrounds”), its members had to prove their worth over and over again. Moreover, unlike white men, who were judged mainly on their potential, members of “prove it again” groups were judged primarily on past performance, and their mistakes were noticed more readily and remembered longer than those of other employees.
Pattern 2: The Tightrope
Women and people of color were permitted a “narrower range of workplace behavior.” To succeed, white men needed to show decisiveness and ambition, but women and people of color who behaved the same way risked being seen as aggressive or difficult to work with. Likewise, while getting along well with co-workers was often optional for white men, for white women and people of color it was required to receive a positive evaluation. As a case in point, the authors note that “83% of Black men were praised for having a ‘good attitude’ vs. 46% of white men, and 27% of white women were praised for being ‘friendly and warm’ vs. 10% of white men.”
Pattern 3: The Maternal Wall
Another common pattern was the assumption that once a woman became a mother, she was no longer fully committed to the firm, that this was probably the best outcome anyway, and that she was therefore less competent. Surprisingly, 20% of female associates with children received evaluation comments concluding, on the women’s behalf, that becoming a partner was no longer important to them. The authors suspect that “many of these women had not said so and that managers were just making assumptions about their diminishing commitment to their work after having children.”
Pattern 4: Racial Stereotypes
As one might expect, racial stereotyping was alive and well at the firm. The authors consistently found racial tropes across the evaluations they examined, such as the stereotype that Asians were “good technically but lacked leadership” and the expectation that “people of color need to be more willing to sacrifice work-life balance than white men.”
Once the problems were better understood, the authors worked with the law firm to improve its performance evaluation process. Their two-part solution was simple and effective.
The first decision was to change the evaluation form itself. The original form consisted of open-ended questions that left much to the reviewer’s discretion and asked for no evidence to support conclusions. The new form defined specific categories for evaluation and asked that every rating be backed by at least three pieces of evidence. The evidence requirement was designed to counter what the authors call the “halo-horns” effect, “where white men are artificially advantaged by global ratings because they get halos (where one strength is generalized into an overall high rating) whereas other groups get horns (where one mistake is generalized into an overall low rating).”
The authors’ first change is consistent with research showing that the more open-ended (“open box”) a performance review is, the more likely bias is to creep into what is supposed to be an objective evaluation. As another HBR article noted in 2019:
As many studies have shown, without structure, people are more likely to rely on gender, race, and other stereotypes when making decisions – instead of thoughtfully constructing assessments using agreed-upon processes and criteria that are consistently applied across all employees.
The second decision was to develop a simple, one-hour workshop that taught firm employees how to use the new form. The workshop presented specific examples from the old evaluations and asked attendees to judge for themselves whether those comments displayed any of the four bias patterns described above.
After the new evaluation process was introduced, the improvements were both dramatic and clearly visible in the data:
In year two, not only did people of color get more leadership mentions (100% in year 2), they also got wildly more constructive feedback. Only 17% of the comments given to people of color contained constructive feedback in year one, as compared to 49% in year two. Constructive feedback increased for white women, too (from 10.5% to 29.5%) — and for white men (from 15% to 27%). This highlights a supremely important point: Using an evidence-based performance evaluation system helps all employees. In year two, the evaluation form’s specificity also allowed for far more effective assessments of the key skills and contributions which are of great value to the company.
The authors further note that:
The intervention leveled the playing field in other important ways, too. White men had longer, more complex evaluations in year one; in year two, both word count and language complexity were similar across all groups. Negative personality comments sharply declined in year two for people of color: 14% had a negative personality comment in year one, but 0% in year two.
Of course, the firm’s performance evaluation problems took a long time to develop, and two years is not enough time to fix everything. Some disparities persisted; for example, “white women were still far more likely than other groups to have comments in their evaluations saying they need additional opportunities (51% vs 33%) and that they deserve promotions (37% vs. 22%).”
The authors readily acknowledge that DEI work is strategic and requires repeated experimentation and analysis. It is no different, they note, from changing any other aspect of a big company, from sales to supply chain. The human cost of bias is compounded by the fact that a performance review is only the first part of an equitable rewards system: there is no guarantee that the second part, compensation and promotion decisions, is fair even if the first part is.

A study by MIT’s Emilio J. Castilla, which analyzed almost 40,000 performance reviews given to almost 9,000 employees at one service company, found that even when women and people of color perform at the same level as white employees, they are still less likely to receive the same salary increases or promotions. Castilla found that “different salary increases are granted for observationally equivalent employees (i.e., those in the same job and work unit, with the same supervisor and same human capital) who receive the same performance evaluation scores.” The reason for this outcome? HR leaders who approved salary and promotion recommendations “significantly discounted” the positive performance appraisals of women and minority employees, meaning that these groups had to work harder and earn higher performance scores to receive salary increases similar to those of white male employees.
DEI is a complex challenge, and its complexity often leads to inaction. The results of this case study suggest that when approached thoughtfully and with data, DEI efforts can produce significant positive change for organizations looking to move beyond supportive statements. At a moment when so many companies are trying to promote racial equity, the good news from this experiment is that a patient, methodical, and data-driven approach can create real, measurable, and hopefully lasting benefits for all employees.
Williams, Joan C., Denise Lewin Loyd, Mikayla Boginsky, and Frances Armas-Edwards. “How One Company Worked to Root Out Bias from Performance Reviews.” Harvard Business Review, April 21, 2021. https://hbr.org/2021/04/how-one-company-worked-to-root-out-bias-from-performance-reviews