Is That AI Racist?: How AI Bias Is Affecting Health Care—And What We Can Do About It
Summary
People are biased, and people build AI, so AI is biased, too. When AI is used in hospitals to treat patients, that bias follows it into health care.
For example, a 2019 paper in Science found that a commercial risk-prediction tool was less likely to refer equally sick Black patients than white patients for extra care resources. In fact, only 17.7% of the patients the algorithm flagged for extra care were Black; if the algorithm were unbiased, that figure would have been much higher, at 46.5%.
This episode will look at how the racial disparities baked into the health care system also make their way into the AI that the health care system uses, creating a vicious cycle. Nic Terry (an expert in the intersection of health, law, and technology) and Ravi Parikh (a practicing oncologist and bioethicist) will discuss legal and ethical concerns. Michael Abramoff (an ophthalmologist, AI pioneer, and entrepreneur) will share how he’s trying to build a fairer AI.
Transcript
Nic Terry: There is no doubt that we have a bias issue, and I think pretty much everyone around the world, not just regulators and manufacturers but also policymakers, is looking at this. The question is, what do you do? How do you fix it? Because, to use that overused phrase, you are peering into a black box, and it’s very hard to know what is going on. And if you don’t see what’s going on, it’s very hard to fix it.
I. Glenn Cohen: I’m Glenn Cohen. I’m the Faculty Director of the Petrie-Flom Center, the James A. Attwood and Leslie Williams Professor of Law, and the Deputy Dean of Harvard Law School and your host. You’re listening to Petrie Dishes, the podcast of the Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School.
We’re taking you inside the exploding world of digital health, one technology at a time. Today, our topic is bias in artificial intelligence, or AI.
Nic Terry: We know that AI software has been shown to be capable of gender and race biases, and that these biases are likely to perpetuate stereotypes. This is not simply a healthcare AI problem. We’ve seen it in criminal justice AI algorithms.
I. Glenn Cohen: That’s Nicholas Terry, a Professor of Law at Indiana University McKinney School of Law and Executive Director of the Hall Center for Law and Health. Nic is going to share some examples of how AI can discriminate.
Nic Terry: Algorithmic discrimination in healthcare occurs with surprising frequency. I’m quoting my friends Sharona Hoffman and Andy Podgurski, who wrote a paper in 2020 on this. Quote: “A well-known example is an algorithm used to identify candidates for high-risk care management programs that routinely failed to refer racial minorities to those beneficial services. Some algorithms deliberately adjust for race in ways that hurt minority patients. For example, algorithms have regularly underestimated African-Americans’ risks of kidney stones, death from heart failure, and other medical problems.”
I. Glenn Cohen: Where does this bias come from?
Nic Terry: As one begins to unpack that question, you also have to look at the healthcare system itself. It’s not like we invented racism because we had an algorithm. If anything, we’ve learned over the last year or so that the deep structural issues go beyond the technology of the day. The reality is that there are structural determinants in our health care that seem to have racist overtones. Why do Southern states spend relatively little on health care? Why do they refuse to expand Medicaid? Why do people working in the healthcare system, maybe not on a system-wide basis but on an individual basis, up to mid-level administrators, appear to deny valid claims? Why do clinicians allow their implicit biases to affect the treatment provided to a patient of color? There are also, increasingly, likely to be biases that aren’t explicitly or even implicitly racist, but that arise because these systems may tend to prioritize or maximize profit rather than deliver, quote, the best bedside health care. Layering healthcare AI on top of such a system multiplies the problems because of bias amplification.
I. Glenn Cohen: As Nic mentioned, AI bias can stem from preexisting biases in the healthcare system. It can also come from the data sets developers use to train their AI. If the training data is not representative of the broader patient population, the AI will replicate whatever sampling bias was present when it’s deployed in the clinical setting. Listen to Professor Ravi Parikh as he explains how this could occur.
Ravi Parikh: My name is Ravi Parikh. I’m an Assistant Professor of Medicine and Medical Ethics and Health Policy at the University of Pennsylvania, and I’m a practicing oncologist at the Corporal Michael J. Crescenz VA Medical Center in Philadelphia, Pennsylvania.
I. Glenn Cohen: Ravi has been doing research with an algorithm that predicts when cancer patients are nearing the end of their lives. He and his colleagues think that this algorithm could be used as a decision support tool for clinicians. If doctors know their patients are near the end of life, they can connect those patients with resources to make their time more comfortable, such as palliative care.
Ravi Parikh: So there are two types of bias: bias in the prediction that’s made, and bias in the outcome that gets generated as a result of that prediction. A lot of the work that’s been done so far in the academic space has focused on the potential for algorithms to generate bias. Just think about the algorithm that we’ve been using for predicting mortality in cancer patients. If we are taking that data set from an electronic health record in a Pennsylvania-based hospital, for example, there’s a variety of demographics that may be overrepresented or underrepresented in that data set. But furthermore, there’s a variety of upstream factors that lead someone to seek care in a health system and be included in the underlying EHR cohort. And so that algorithm is likely to be biased in some way, shape, or form because of those two very salient factors: the selection bias in what it takes to get into the cohort, and the characteristics of the cohort altogether. Bias doesn’t mean that the algorithm is intentionally discriminating against certain races or certain ethnic groups or certain socioeconomic classes; a lot of this is unintentionally baked into the algorithm. Really understanding the population the AI was trained on, and whether its performance is likely to replicate in the population where it’s deployed, is really important.
I. Glenn Cohen: A 2021 paper in the journal Communications Medicine shows just how high the stakes are, using the example of AI-based melanoma detection. Here’s a quote: “If a machine learning based system is trained to recognize skin disease, such as melanoma, based on images from people with white skin, it might misinterpret images from patients with a darker skin tone and might fail to diagnose melanoma. This can lead to potentially serious consequences, since melanoma is responsible for the majority of skin cancer associated deaths and early diagnosis is critical for it to be curable.”

But training data is only one possible source of bias in AI. Consider that high-risk care management algorithm you heard about earlier. The source of bias in that case is what the authors of the Science paper called “label bias.” In what might have seemed like a very reasonable design choice ahead of time, the algorithm’s developers took total medical expenditures, or cost, as its target, which is ordinarily a reasonable proxy for health need. In this data set, though, white patients have higher costs than Black patients with the same health needs. That causes the algorithm to prioritize white patients over Black ones. This kind of bias exists even when the training data is representative of the patient population, and it’s a much harder and subtler kind of bias to spot.

There are other kinds of bias as well. AIs can continue to collect data and learn even after they’ve entered clinical use. This can potentially reinforce physicians’ implicit biases and worsen disparities in clinical care.
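To make label bias concrete, here is a minimal sketch in Python, using entirely hypothetical numbers rather than the data from the Science study. It shows how a model trained to predict cost can under-select equally sick Black patients even when the sample itself is representative.

```python
# Minimal sketch (hypothetical data) of "label bias": even with a representative
# sample, choosing cost as the prediction target can under-prioritize Black
# patients if their costs are lower than white patients' at equal need.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
race = rng.choice(["Black", "white"], size=n)      # representative sample
need = rng.gamma(shape=2.0, scale=1.0, size=n)     # true health need (unobserved)

# Assumed disparity: equal need translates into lower spending for Black
# patients (e.g., access barriers), so cost is a biased proxy for need.
access = np.where(race == "Black", 0.7, 1.0)
cost = need * access + rng.normal(0, 0.1, size=n)

# Even a "perfect" cost predictor inherits the label's bias:
# select the top 10% by predicted cost for extra care.
selected = cost >= np.quantile(cost, 0.90)
share_black_by_cost = (race[selected] == "Black").mean()

# Compare with selection on true need, which the algorithm never sees.
need_selected = need >= np.quantile(need, 0.90)
share_black_by_need = (race[need_selected] == "Black").mean()

print(f"Share of Black patients selected by cost: {share_black_by_cost:.1%}")
print(f"Share if selected by true need:           {share_black_by_need:.1%}")
```

With these assumed numbers, ranking by predicted cost selects far fewer Black patients than ranking by true need would, which is the same mechanism the Science authors identified.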
Ravi Parikh: Algorithms are based largely on existing practices. So if existing practices are biased, or if a certain class or group of patients is more likely to come into the hospital and utilize the healthcare system in a certain way than another class of individuals, then the algorithm’s predictions are going to reinforce that bias, just because that’s the data they have access to. We as clinicians are much, much less likely to have end-of-life conversations with patients who are non-English speaking. Why is that? Because the conversation is really difficult to have and so charged. And so we’re just less likely to bring those up, rightly or wrongly, for patients who are non-English speaking. Oftentimes wrongly: we should be doing this more for those patients.
I. Glenn Cohen: We’ve seen the substantial potential for AI technology to exacerbate bias, but is it at all capable of mitigating bias? Ravi explains how the algorithms he used helped address disparities in conversations about end-of-life care.
Ravi Parikh: An algorithm utilizing a set of oftentimes unbiased predictors, things like laboratory values or imaging results, can provide a nudge toward having a conversation that you wouldn’t have thought about in the first place. And so what we actually found in our trial was that our intervention had an effect across all age and class and race groups, but it had a disproportionate impact for lower-income patients, for non-English speaking patients, and for patients not of non-Hispanic White ethnicity, because those groups start off at relatively lower levels of these conversations.
I. Glenn Cohen: So how do we steer AI towards correcting rather than perpetuating bias?
Nic Terry: So the word, the phrase that seems to come up the most from smart people who have thought about this tends to be transparency. Work hard to remove the opaqueness from the AI; make it technologically transparent. If we don’t know how healthcare AI makes decisions, how can we assess whether clinicians should rely on the technologies, whether it’s for the quality of the decision-making, the bias, or conflicts of interest?
I. Glenn Cohen: Transparency seems key to determining where exactly bias exists, but to many lay observers, AI processes remain opaque. How else can AI researchers and policy makers build confidence in machine learning?
Nic Terry: We need to give citizens the confidence to take up AI applications, and give companies and public organizations the legal certainty to innovate using AI. And so you have to have that human-centric approach to build trust in AI. I think those are the phrases and the thoughts being put into place.
I. Glenn Cohen: Beyond transparency, what else should policy makers focus on in addressing bias in AI? Let’s hear from Michael Abramoff.
Michael Abramoff: I’m Michael Abramoff. I’m a Professor of Ophthalmology at the University of Iowa, as well as professor of Electrical and Computer Engineering. I also happen to be the founder and executive chairman of Digital Diagnostics.
I. Glenn Cohen: Michael’s AI company builds a system that diagnoses diabetic retinopathy, an eye condition that causes blindness in patients with diabetes.
Michael Abramoff: It’s more than just being careful about the training sets, like many people think. What we did was design the AI in a way that it looks for biomarkers. And what is interesting about biomarkers, in this case for diabetic retinopathy, is that they’re racially invariant. If in your retina there are hemorrhages, or there are new blood vessels where they shouldn’t be, that means you have diabetic retinopathy. It doesn’t matter what your ethnicity is or what pigmentation is in your retina. These AIs are built to detect those biomarkers, and that makes them more racially invariant than if you have a training set that you’re ultimately blind to, with all the biases and the selection bias in the training data. That training data may, for example, have more of a certain race that has less disease, which will lead to the AI missing the disease when it sees a new patient with the same racial background. So this is just one aspect of the design and the training. But we also focused, during the validation, the clinical trial, on making sure that the AI was equally accurate for different races and ethnicities. We took great care to sample from all 30 million people with diabetes in the U.S. to make sure the trial population was representative in terms of different races and ethnicities, so that we could show the absence of an effect of race, ethnicity, sex, or age on accuracy.
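As a sketch of the kind of subgroup validation Michael describes, with hypothetical column names (this is illustrative, not Digital Diagnostics’ actual pipeline), one can compute sensitivity and specificity separately for each race and ethnicity group and look for gaps.

```python
# Minimal sketch (hypothetical columns) of subgroup accuracy checking:
# compare sensitivity and specificity of a diagnostic AI across
# race/ethnicity groups in a validation set.
import pandas as pd

def subgroup_accuracy(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Sensitivity and specificity per subgroup; assumes binary columns
    'y_true' (disease present) and 'y_pred' (AI output)."""
    rows = []
    for group, g in df.groupby(group_col):
        tp = ((g.y_true == 1) & (g.y_pred == 1)).sum()
        fn = ((g.y_true == 1) & (g.y_pred == 0)).sum()
        tn = ((g.y_true == 0) & (g.y_pred == 0)).sum()
        fp = ((g.y_true == 0) & (g.y_pred == 1)).sum()
        rows.append({
            group_col: group,
            "n": len(g),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    return pd.DataFrame(rows)

# Usage (hypothetical validation table):
# results = subgroup_accuracy(validation_df, group_col="race_ethnicity")
# Large gaps between subgroups would flag a potential bias problem.
```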
I. Glenn Cohen: In some cases, there are other ways of reducing the chance of biased AI. Let’s hear again from Ravi.
Ravi Parikh: So one strategy that we took was piloting the algorithm silently in a prospective setting, and just seeing whether any biases came out, both in terms of the predictions and in terms of the conversations that would be had in a pilot setting. Oftentimes that silent period, often ranging from three to six months, not a ton of time, can really serve to convince others, and convince yourself, whether or not the algorithm is biased. And then another thing, once it’s scaled up and once it’s deployed, is having mechanisms to track outputs with regard to protected subgroups, because many of the biases you don’t know about until you actually deploy the algorithm in a population that isn’t too similar to the one you originally trained it on.
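Here is one way such a silent pilot and post-deployment monitor might look in code; the schema and helper below are hypothetical, not Ravi’s actual tooling. The idea is to log the algorithm’s outputs without acting on them and track flag rates by protected subgroup over time.

```python
# Minimal sketch (hypothetical schema) of "silent period" monitoring: log the
# algorithm's predictions without acting on them, then compare flag rates
# across protected subgroups month by month before go-live.
import pandas as pd

def monthly_flag_rates(log: pd.DataFrame) -> pd.DataFrame:
    """Assumes columns 'date', 'subgroup', and a binary 'flagged' column
    (1 if the algorithm would have triggered an end-of-life conversation)."""
    log = log.copy()
    log["month"] = pd.to_datetime(log["date"]).dt.to_period("M")
    return (
        log.groupby(["month", "subgroup"])["flagged"]
           .agg(rate="mean", n="size")
           .reset_index()
    )

# Usage (hypothetical silent-pilot log):
# rates = monthly_flag_rates(pilot_log)
# Persistent gaps in flag rates between subgroups with similar clinical risk
# would prompt review before the tool is used for real decisions.
```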
I. Glenn Cohen: The AI developers themselves can employ a variety of tactics to monitor biases during multiple stages in the design and implementation process. But who else should be involved in that process?
Nic Terry: I think one unanswered question, or at least, I don’t know whether it’s been answered yet, is whether we can train the algorithm to police the algorithm. Whether there are ways of constructing unbiased bias-police that can go into the systems.
I. Glenn Cohen: What Nic is saying is that AI itself may be equipped to monitor its own biases. How else can AI improve itself?
Ravi Parikh: Any good algorithm that’s based on temporal data should probably update or recalibrate at certain points. And so the variable inputs that you understood when you approved or adopted the algorithm might be different from what the algorithm is actually using. As long as those changes are made transparent and the justifications are well made, I think there are real gains to be had from hypothetical increases in performance, if those increases in performance are validated and justified.
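As a sketch of what periodic recalibration can mean in practice (the drift threshold and logistic recalibration step below are illustrative assumptions, not the specific method Ravi’s team uses): compare predicted and observed event rates over a recent window and refit a simple calibration layer when they drift apart.

```python
# Minimal sketch (hypothetical inputs) of periodic recalibration: compare the
# model's predicted risk with observed outcomes over a recent window and, if
# calibration has drifted, refit a simple logistic recalibration layer.
import numpy as np
from sklearn.linear_model import LogisticRegression

def check_and_recalibrate(pred_risk: np.ndarray, observed: np.ndarray,
                          drift_tolerance: float = 0.05):
    """pred_risk: model probabilities from the recent window.
    observed: 1/0 outcomes for the same patients (e.g., 180-day mortality)."""
    drift = abs(pred_risk.mean() - observed.mean())
    if drift <= drift_tolerance:
        return None  # calibration still acceptable; no update needed

    # Refit calibration on the log-odds of the original predictions
    # (a form of logistic recalibration, akin to Platt scaling).
    eps = 1e-6
    log_odds = np.log(np.clip(pred_risk, eps, 1 - eps) /
                      np.clip(1 - pred_risk, eps, 1 - eps)).reshape(-1, 1)
    recalibrator = LogisticRegression()
    recalibrator.fit(log_odds, observed)
    return recalibrator  # apply to future log-odds before surfacing risk scores
```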
I. Glenn Cohen: As AI continues to make its way into new facets of our lives like healthcare, issues like bias are raising the ethical stakes. Looking very broadly, are the risks of AI worth it? Here’s what Nic had to say.
Nic Terry: I’m a tech nerd. I worship at the altar of tech. I believe in tech, I believe in systems, I believe in technologies. Am I naive enough to think they never go wrong? Of course they go wrong. But I know as a matter of fact that my car, when it’s in autonomous mode, brakes faster than I can. And I would assume that the kind of technology that we’re seeing in AI and machine learning is going to improve healthcare. That doesn’t mean it’s not going to be a rocky road.
I. Glenn Cohen: If you liked what you heard today, check out our blog, ‘Bill of Health,’ and our upcoming events. You can find more information on both at our website, petrieflom.law.harvard.edu. And if you want to get in touch with us, you can email us at petrie-flom@law.harvard.edu. We’re also on Twitter and Facebook @petrieflom.
Today’s show was written and produced by Chloe Reichel and James Jolin. Nicole Egidio is our audio engineer. Melissa Eigen provided research support. We also want to thank Nic Terry, Ravi Parikh, and Michael Abramoff for talking with us for this episode.
This podcast is created with support from the Gordon and Betty Moore Foundation and the Cammann Fund at Harvard University.
I’m Glenn Cohen and this is Petrie Dishes. Thanks for listening.