Exploring Bias in Data Science & Its Effects on Health Care

Published on: June 23, 2022

In every sector, we’ve seen how improvements in technology can help us make sense of all the information at our fingertips. Nowhere is this more evident than in health care.

With the use of electronic medical records, providers can not only improve patients’ continuity of care but also collect data at an unprecedented scale. We rely on technology and sophisticated algorithms to help us sort through this data, identify patterns, flag issues, and deliver insights. Those insights go on to inform all sorts of things — from care delivery to insurance coverage to research papers, supply orders, and everything in between.

We trust that the data we collect is objective, and that’s why it drives decision-making in all areas of the healthcare industry. But what if it isn’t? 

Can data be tainted by gender bias or prejudice? Is it contributing to the widening health inequity gap? More pointedly — can data be racist?

In this blog, we will explore how bias in data science happens, what it looks like when technology and artificial intelligence (AI) discriminate, and who is being disproportionately impacted. Most importantly, we’ll discuss what we can do to change it.

What Is Data Bias?

One common definition of data bias is data that is not representative of the population or phenomenon under study.

More broadly, data bias can also appear in datasets that lack the variables needed to track the intended phenomenon. It can also arise when the data contains content produced by humans who hold biases themselves.

Common algorithms and datasets are regularly laced with biases and racism, which exacerbate health inequities that disproportionately affect Black, Indigenous, and people of color (BIPOC) communities.

Gender Discrimination In Technology

What Is Gender Bias?

Gender bias is defined as a preference for or prejudice against one gender over another. Gender bias can be conscious or unconscious, which makes it challenging to identify and ultimately dispel. When it comes to gender discrimination in technology, women typically bear the brunt of these prejudices and their negative effects.

Let’s take a closer look at artificial intelligence. Humans are the “authors” of artificial intelligence, and therefore, the creator’s ideas and beliefs shape their algorithms. A mere 22% of professionals in the AI or data science fields are women, leaving women’s perspectives underrepresented and often absent.

The gender digital divide also predisposes women to a lack of data representation. 300 million fewer women than men have access to the internet by way of smartphones. Because these technologies are in the hands of fewer women, there are fewer representative datasets to help form new technologies.

Racism In Technology

What Is Technological Racism?

Technology is often perceived as an entirely separate entity, devoid of human judgment and objective in nature. But, again, humans created technology. And, as a byproduct of the human imagination, tech often operates within the narrow perspective of its creator. 

Technological racism happens when an underlying prejudice is embedded in modern technology. Like gender bias, technological racism is seldom visible on the surface, so it is hard to detect and often goes unchecked.

A subset of technological racism, artificial intelligence racism happens when machine learning doesn’t account for BIPOC individuals. In health care, this can be downright dangerous.

For example, one of the most common medical therapies is oxygen administration. Clinicians use a pulse oximeter to estimate oxygen saturation in the blood. The device works by shining red and infrared light through the skin.

However, the patient’s skin color affects the accuracy of these measurements; one study demonstrated that pulse oximeters systematically overestimate oxygen saturation levels in nonwhite patients. Black patients were roughly three times as likely as white patients to have occult hypoxemia, meaning low blood oxygen that the pulse oximeter reading failed to reveal.

How Our Data Encodes Systematic Racism

Data is not this revolutionary medium we often perceive it to be — it’s information. Data is what we input, essentially. So, how does our data encode systematic racism? 

There are two main contributors: human bias and incomplete training data.

Human bias and deeply held prejudices slip their way into technology. As a result, they are amplified in algorithms in fields like healthcare, law enforcement, and education. 

The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system is a risk-assessment program used in U.S. courts to estimate how likely a defendant is to re-offend. Its scores can inform decisions such as bail and sentencing.

ProPublica, an investigative journalism outlet, found that this algorithm drew on inputs that carry human prejudice, including arrest records, income levels, and postcodes. The system flagged Black people as more likely to re-offend: among defendants who did not go on to re-offend, about 45% of Black defendants had been labeled higher risk, compared to about 24% of white defendants. In other words, the COMPAS system was biased against Black people because the training data was saturated in systematic bias.
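One way to audit a risk score for this kind of disparity is to compare, group by group, how often people who did not re-offend were still flagged as high risk (the false positive rate). The sketch below is illustrative only: the function name, the group labels "A" and "B", and the records are made up for this example and are not COMPAS data or ProPublica's method.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Per group: share of people who did NOT re-offend but were flagged high risk.

    Each record is a tuple (group, predicted_high_risk, reoffended).
    """
    flagged = defaultdict(int)    # non-re-offenders who were flagged high risk
    negatives = defaultdict(int)  # all non-re-offenders
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / negatives[group] for group in negatives}


# Made-up toy records, for illustration only.
toy_records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("B", False, True),
]

rates = false_positive_rates(toy_records)
print(rates)  # group "A" is wrongly flagged twice as often as group "B" in this toy data
```

A large gap between groups in this rate suggests that the tool's mistakes fall more heavily on one group, even if its overall accuracy looks acceptable.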

Incomplete or unrepresentative training data is the responsibility of the people who feed information into machine learning systems. When groups are missing or underrepresented, the machines never get a complete picture.

Amazon created a facial recognition platform known as Rekognition. This system was more proficient at identifying lighter-skinned males and struggled to identify darker-skinned men and women. This is likely because its training data contained far more photos of light-skinned people than of dark-skinned people, so the system was improperly trained.

Even the process of collecting data can be flawed. Bias in data collection is the distortion that results when the information gathered is not truly representative of the scenario being investigated.

The Ada Lovelace Institute defines the data divide as the gap between individuals who have access to data-driven technologies (and therefore feel they have control over them), and those who do not.

Datafication is the process by which individuals’ activity, behavior, and experiences are recorded as quantified data and therefore opened to analysis. This process is largely dominated by power-holding institutions. Those who lack power and access to data-driven technologies, often minorities, are more likely to be excluded from data collection because they lack access to physical and digital infrastructure.

Examples Of Data Bias

Data trains the algorithms we employ, which is precisely why data bias in the healthcare realm gives way to catastrophic results. Data bias precedes algorithmic bias, resulting in prioritizing certain populations over others. 

Take the mortality rate of breast cancer. Black women have a 41% higher breast cancer mortality rate than white women. However, Black women make up only 5% of clinical trial participants.

Furthermore, convolutional neural networks (CNNs) are used by dermatologists to classify skin lesions with the same accuracy as a trained dermatologist. CNNs have even demonstrated a higher accuracy rate in identifying melanoma compared to dermatologists. 

But CNNs are typically trained on images of skin lesions on white patients; images of Black patients make up only roughly 5% to 10% of the datasets used to train them. When tested on Black patients, CNNs displayed only about half the accuracy.
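A simple way to surface this kind of gap is to report a classifier's accuracy separately for each patient group rather than as one overall number. The sketch below assumes you already have true and predicted labels for a test set; the function name, the group names, and the labels are hypothetical and made up for illustration, not results from any real model.

```python
def accuracy_by_group(examples):
    """Return classification accuracy per group.

    Each example is a tuple (group, true_label, predicted_label).
    """
    totals = {}
    correct = {}
    for group, true_label, predicted_label in examples:
        totals[group] = totals.get(group, 0) + 1
        if true_label == predicted_label:
            correct[group] = correct.get(group, 0) + 1
    return {group: correct.get(group, 0) / totals[group] for group in totals}


# Made-up toy test set, for illustration only.
toy_examples = [
    ("group_1", "melanoma", "melanoma"),
    ("group_1", "benign", "benign"),
    ("group_1", "benign", "benign"),
    ("group_1", "melanoma", "benign"),
    ("group_2", "melanoma", "benign"),
    ("group_2", "benign", "benign"),
    ("group_2", "melanoma", "benign"),
    ("group_2", "benign", "benign"),
]

per_group = accuracy_by_group(toy_examples)
overall = sum(t == p for _, t, p in toy_examples) / len(toy_examples)
print(per_group, overall)
```

In this toy set the overall accuracy sits between the two group accuracies, which is how a single headline number can hide weaker performance for an underrepresented group.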

In turn, Black patients have the highest mortality rate for melanoma, with an estimated 5-year survival rate of 70% compared to white people at 94%. 

How Do We Combat Data Bias In Health Care Data?

To combat data bias, we must first turn to the data we train technology with and seek to make it as representative of the population as possible. Developers should remain intensely vigilant and regularly audit for bias in the data they work with. Researchers can also refocus their attention on a wider social analysis, to identify areas that are particularly prone to bias. 

There are three concrete steps we can take to work towards the eradication of data bias from healthcare data: 

  • reduce bias in the planning stage,
  • measure and quantify the remaining data bias after data collection, 
  • and examine and adjust for bias in the analysis. 

So, what do we do with all our current data that is already skewed with bias? How do we make our current electronic health records (EHR) more equitable moving forward?

Integrating external data sources can help quantify bias in a study population. For example, comparing an EHR dataset against publicly available data sources can highlight how the demographic and clinical attributes of patients in the EHR differ from those of the wider population.
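As a rough sketch of that comparison, the code below divides each group's share of an EHR cohort by its share of an external reference population. A ratio below 1 suggests the group is underrepresented in the EHR data. The function name, group names, counts, and reference shares are all hypothetical and made up for illustration; a real audit would take reference shares from a vetted public source.

```python
def representation_ratios(cohort_counts, reference_shares):
    """Ratio of each group's share in the cohort to its share in the reference population.

    cohort_counts: mapping of group -> number of patients in the EHR cohort
    reference_shares: mapping of group -> fraction of the reference population
    """
    total = sum(cohort_counts.values())
    ratios = {}
    for group, reference_share in reference_shares.items():
        cohort_share = cohort_counts.get(group, 0) / total
        ratios[group] = cohort_share / reference_share
    return ratios


# Made-up toy numbers, for illustration only.
toy_ehr_counts = {"group_1": 800, "group_2": 150, "group_3": 50}
toy_reference_shares = {"group_1": 0.60, "group_2": 0.20, "group_3": 0.20}

ratios = representation_ratios(toy_ehr_counts, toy_reference_shares)
for group, ratio in ratios.items():
    status = "underrepresented" if ratio < 1 else "at or above reference share"
    print(f"{group}: ratio {ratio:.2f} ({status})")
```

Ratios like these only measure representation. They do not by themselves correct the analysis, but they indicate where adjustment or additional data collection may be needed.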

Linking EHRs to centralized databases that include details such as date of birth, pregnancies, and cancer history can fill gaps in skewed data and improve its accuracy.

Nurses play a vital role in turning the tide and fighting to end racism in healthcare. Because they are the largest segment of the healthcare workforce, their voice and advocacy can bring about real change.

If you’re a nurse who’s interested in fighting racial discrimination in health care, the University of San Diego Hahn School of Nursing can help you change the trajectory of your nursing career. Whether you want to immerse yourself in research, teach the next generation of nurses, revolutionize health systems, improve data collection and analysis, or practice to the top of your license, there’s a path for you at USD. 

Request more information about our programs, or learn more about USD School of Nursing by reading our resource on healthcare racism.
