AI, inequality and the persistence of the Linn Effect
New research on Airbnb's price optimization algorithm highlights AI's promise as well as its unintended consequences
In a recent post I touched upon the idea that as Artificial Intelligence (AI) and Machine Learning (ML) applications become more common, we are likely to discover unintended consequences from their adoption. Despite their power, these technologies remain in their infancy, and their creators are finding that even extensive forethought in design does not insulate these systems from failure. That failure may be severe enough that the AI actually makes a situation worse through its intervention.
A new paper from Shunyuan Zhang (Harvard), Nitin Mehta (Toronto), Param Vir Singh (Carnegie Mellon), and Kannan Srinivasan (Carnegie Mellon) describes just such an outcome with a price optimization algorithm introduced by Airbnb in 2015. The idea behind the roll-out was straightforward: because AI can analyze massive amounts of supply and demand data across the rental platform far better than humans can, it should help hosts set the optimal price for a given property in the system. When the algorithm went live, hosts were given the option of letting it set rental prices automatically after evaluating factors that included property features, seasonality, and comparable listings' pricing. Because Airbnb has far more data and computational power than any individual host, its pricing algorithm should be more effective than the average host at finding the revenue-maximizing price.
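Airbnb has not published the internals of its pricing system, but the basic mechanics can be sketched in a few lines. The toy Python below (the demand curve and every parameter are invented purely for illustration) captures the core idea: predict the probability that a night gets booked at a given price, then pick the price that maximizes expected revenue.

```python
import numpy as np

def booking_probability(price, reference_price=100.0, price_sensitivity=20.0):
    """Toy demand curve: the chance a night gets booked falls smoothly as the
    price rises above comparable listings. A real system would learn this
    curve from platform-wide data on property features, seasonality, and
    comparable listings; these parameters are made up."""
    return 1.0 / (1.0 + np.exp((price - reference_price) / price_sensitivity))

def suggest_price(candidates):
    """Grid-search for the price that maximizes expected nightly revenue,
    i.e., price times the probability of being booked."""
    expected_revenue = candidates * booking_probability(candidates)
    best = int(np.argmax(expected_revenue))
    return candidates[best], expected_revenue[best]

price, revenue = suggest_price(np.linspace(40.0, 200.0, 161))
print(f"suggested nightly price: ${price:.0f} (expected revenue ${revenue:.2f}/night)")
```

However much richer the real model is, the trade-off it navigates is the same one this sketch does: a higher price earns more per booking but lowers the odds of being booked at all.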
The authors note that the Airbnb project was not without its challenges. The ML system is not easy for the average host to understand. Moreover, the algorithm's logic would have to account for the fact that Airbnb's interests do not always align with those of its hosts; ideally, it should favor the hosts' interests over Airbnb's whenever the two conflict. A greater challenge was that, even for similar properties, there is a disparity in the income earned by white and Black hosts across the platform. A 2017 study found that:
Across all 72 predominantly Black New York City neighborhoods, Airbnb hosts are 5 times more likely to be white. In those neighborhoods, the Airbnb host population is 74% white, while the white resident population is only 14%.
White Airbnb hosts in Black neighborhoods earned an estimated $160 million, compared to only $48 million for Black hosts—a 530% disparity.
The loss of housing and neighborhood disruption due to Airbnb is 6 times more likely to affect Black residents, based on their majority presence in Black neighborhoods, as residents in these neighborhoods are 14% white and 80% Black.
The authors accept the reported inequalities but note that race may not be the only factor driving the phenomenon: “It is plausible that differences other than race (e.g., education and access to other resources) make it more difficult for Black hosts to determine optimal prices.” A successful ML tool, then, should help Black hosts reach parity with their white counterparts if it fully accounts for the factors driving the unequal revenue outcomes.
The Study
To understand the impact that Airbnb’s algorithm had on host revenue, the authors looked at 9,396 randomly selected Airbnb properties across 324 zip codes in seven large U.S. cities. The data included key property characteristics, host demographics, each host’s monthly revenue, and the date of algorithm adoption (if any). Of the 9,396 properties studied, 2,118 had hosts who adopted the algorithm at some point during the observation period; the distribution of adoption timing is shown in Figure 1 below.
Figure 1: Histogram of the timing of algorithm adoption (t = 0: November 2015). y-axis: density of adoption events; x-axis: month (Source: Authors)
Findings
The data presented several key findings. On average, hosts who adopted the algorithm lowered their prices by 5.7% yet saw overall revenues rise by 8.6%. In other words, the tool made properties less expensive, which in turn generated enough additional rentals to more than offset the lower prices. The authors highlight that:
Before Airbnb introduced the algorithm, Black and white hosts charged similar prices for equivalent properties (in terms of observed host, property, and neighborhood characteristics), but white hosts earned $12.16 more in daily revenue than Black hosts. The revenue gap was caused by a difference in the occupancy rate (rental demand): 20% less for Black hosts’ properties than for equivalent white hosts’ properties. The algorithm benefited Black adopters more than white adopters, decreasing the revenue gap by 71.3%.
Algorithm price reductions were similar for both white and Black hosts, but it was Black hosts who saw their occupancy rates increase the most. Black hosts who adopted the tool therefore obtained more value from it than white adopters did. This outcome, the authors note, “supports our theory that Black and white hosts face different demand curves; the demand for Black hosts’ properties is more responsive to price changes than the demand for equivalent properties owned by white hosts.”
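Since revenue is just price times occupancy, the averages reported above let us back out roughly how price-sensitive demand must be. Here is a quick back-of-the-envelope calculation (mine, not the authors'):

```python
price_change = -0.057    # average price change after adoption (-5.7%)
revenue_change = 0.086   # average revenue change after adoption (+8.6%)

# revenue = price x occupancy, so the implied occupancy change is:
occupancy_change = (1 + revenue_change) / (1 + price_change) - 1

# the price elasticity of demand implied by these averages:
elasticity = occupancy_change / price_change

print(f"implied occupancy change: {occupancy_change:+.1%}")  # about +15.2%
print(f"implied price elasticity: {elasticity:.2f}")         # about -2.66
```

An elasticity of magnitude greater than 1 is exactly the regime in which a price cut raises total revenue, and the authors' finding implies the magnitude is even larger for Black hosts' properties.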
The findings, however, illustrate two challenges. First, Black hosts were 41% less likely to adopt the algorithm. When combined with the overall revenue-lifting impact of the algorithm, the lower adoption rate means that the revenue of non-adopting Black hosts decreased relative to the host population average. In other words, since more white hosts took advantage of the algorithm’s power, revenue inequality on Airbnb increased over time. The second problem, the authors note, "is that if Black and white hosts face different demand curves (as our data suggests), then a race-blind algorithm may set prices that are sub-optimal for both Black and white hosts, meaning that the revenue of both groups could be improved by the incorporation of race into the algorithm."
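That last point is easy to see with a stylized example (my construction, not the paper's model). Suppose the two groups face linear demand curves with different slopes, and a race-blind algorithm effectively fits one averaged curve to the pooled data. The single price it picks is then too low for one group and too high for the other, so both earn less than they would under group-specific optima:

```python
def optimal_price(intercept, slope):
    """For linear demand q = intercept - slope * p, revenue p * q
    is a parabola maximized at p = intercept / (2 * slope)."""
    return intercept / (2 * slope)

def revenue(price, intercept, slope):
    """Revenue at a given price, with demand floored at zero."""
    return price * max(intercept - slope * price, 0.0)

# Invented demand curves: group B's listings are more price-sensitive,
# as the authors' data suggest for Black hosts' properties.
groups = {"A": (10.0, 0.05), "B": (10.0, 0.08)}

# A race-blind algorithm effectively prices off one averaged curve.
pooled_price = optimal_price(10.0, 0.065)

for name, (a, b) in groups.items():
    own_price = optimal_price(a, b)
    print(f"group {name}: own optimum ${own_price:.0f} -> revenue {revenue(own_price, a, b):.0f}; "
          f"pooled price ${pooled_price:.0f} -> revenue {revenue(pooled_price, a, b):.0f}")
```

With these made-up numbers, group A's optimum is $100 and group B's is $62, but the pooled algorithm charges both about $77; each group ends up with less revenue than its own optimum would deliver, which is precisely the authors' argument for letting algorithms incorporate race or correlated socioeconomic characteristics.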
The authors believe that the findings of their study point to some important recommendations for AI regulators and Airbnb. For example, current U.S. law prohibits the use of "protected attributes" (such as race) in the development of predictive algorithms. While well-intentioned, this regulatory approach may end up hurting those whom it intends to protect. Regulators, the authors argue, "should consider allowing algorithm designers to incorporate either race or socioeconomic characteristics that correlate with race, provided that the algorithm demonstrates an ability to reduce racial disparities." As a recent paper from AI ethicist Alice Xiang noted:
Ironically, some of the most obvious applications of existing law to the algorithmic context would enable the proliferation of biased algorithms while rendering illegal efforts to mitigate bias. The conflation of the presence of protected class variables with the presence of bias in an algorithm or its training data is a key example of this: in fact, removing protected class variables or close proxies does not eliminate bias but precludes most techniques that seek to counteract it.
For Airbnb managers, the authors argue that "while Airbnb cannot overturn a racial bias that is ingrained in society at large, it could try an intervention that prevents guests from knowing the host’s race until they book the property." Moreover, Airbnb could make a greater effort to encourage the algorithm’s adoption by Black hosts. Otherwise, "a racial disparity in algorithm usage may end up increasing the economic disparity rather than alleviating it."
Conclusions
Reading this paper reminded me of a complex system implementation I managed for a global high-tech client some years ago. As we prepared to go live, I sensed that my client expected the new system to make most of their production problems go away. My expectation was quite different: the new system would significantly add to the list of issues the company’s managers would have to contend with, for two reasons:
Pre-system, a lot of problems were being solved by human interactions that did not leave any digital evidence of the problem or solution. These problems would now appear in the system records.
The system itself would ask new questions and demand new processes that would, in turn, create brand new challenges for my client.
Realizing that, despite the project's success, this particular company might well end up with more challenges to solve than it started with, I told them the following (true) story.
My roommate in graduate school was a Flemish engineer who lived in the lovely town of Ghent, Belgium. He had been a devoted audiophile for as long as I had known him, and his dream was one day to own a Linn stereo system. For those of you unfamiliar with Linn, it is an esoteric audio manufacturer, based in Scotland, with a legendary reputation. One day my friend called to tell me that he had finally bought “the Linn” and wanted me to come to Ghent to hear it.
Soon afterward, when I happened to be in Europe, I took a detour to experience this amazingly expensive system. I sat in his listening room, prepared to be amazed by the sonic brilliance of the technology. But a funny thing happened when the music started. I was speechless, though not for the reasons I had anticipated. Instead of sounding great, the music sounded terrible, much worse than on my pedestrian system at home. The more I listened, the more annoying the sound became.
Seeing my reaction, my friend asked me what I thought of the Linn. "I hate to say it," I replied, "but it sounds awful." He exclaimed, "Yes! Isn't that amazing?" Confused, I asked him whether the whole point of the tens of thousands of euros he had spent was to make his music collection obsolete. He replied, and I will never forget what he said: "No. What you just heard were all the flaws in the original recording, which were inaudible to you on your normal stereo. Only the Linn can faithfully reproduce all the errors. That's its beauty."
I told this story to the project's steering committee and explained that what I call the "Linn Effect" occurs anytime a new technology solves a problem (or set of problems) and simultaneously creates an equal or greater number of new problems. As evidenced by Facebook’s latest problem with Instagram, social media is a great example of the Linn Effect, and we can find examples of it in smartphones, drones, autonomous vehicles, and many other recent inventions—including, perhaps, the internet itself.
Returning to our paper, it seems to me that Airbnb’s smart-pricing algorithm is yet another example of the Linn Effect at work, this time in AI. It is hardly the only one, however. In 2017, Amazon abandoned an experimental AI recruiting tool after discovering that it discriminated against women. The AI failed because its models were trained to vet applicants based on a decade’s worth of resumes submitted to Amazon. Since most of those resumes came from men (not unusual in a tech company), the AI learned that male candidates were preferable. In line with this conclusion, it downgraded resumes that included the word “women’s,” for example, as well as the resumes of graduates of certain all-women’s colleges.
The Amazon, Airbnb, and Facebook cases all highlight one critical aspect of getting AI right: correctly training the ML system during development. This task is so difficult and expensive today that only the best-resourced firms can attempt it, and even their wealth does not guarantee success. Given the immense challenges inherent in creating fully successful AI platforms, I suspect we will see more cases like the one this paper thoughtfully describes in the foreseeable future.
The Linn Effect is not all bad news, however. I once described it to a radiologist friend, and her response was that the Linn Effect is real but ultimately beneficial. As she put it:
While your premise is true, I don't think the effect is necessarily a bad thing. For example, when the first MRI machines were made and used, the images were not very good for multiple reasons (poor signal, weak RF gradients, inhomogeneities in the magnetic field, various artifacts that sometimes mimicked pathology, etc.). Tackling each and every one of those problems resulted in more problems, yes, but also more innovations to fix those same problems. This cycle continued until we reached the unbelievably detailed and even beautiful images we now can acquire.
In her view, the Linn Effect leads to a “Linn Valley”—the cycle of problem discovery and solution creation that every great technological creation must cross before it reaches its full maturity. As this new paper illustrates, AI is still in the early stages of this journey, and its creators would do well to remember the full implications of this reality.
The Research
Shunyuan Zhang, Nitin Mehta, Param Vir Singh, and Kannan Srinivasan. “Frontiers: Can an Artificial Intelligence Algorithm Mitigate Racial Economic Inequality? An Analysis in the Context of Airbnb.” Marketing Science (2021). https://doi.org/10.1287/mksc.2021.1295