5 Steps to Take as an Antiracist Data Scientist

Data scientists are data stewards. They collect data, store data, transform data, visualize data, and ultimately influence how data are used. In our data-driven world, stewards have a responsibility to use data to tell stories and to effect positive change. To be good stewards, data scientists need to be more than simply “not racist”; they need to be antiracist. In an article published on Towards Data Science, Emily Hadley lists five steps data scientists can take toward good stewardship in the face of the racist and hateful ideology present in contemporary society.

To be antiracist data scientists, we must first take the steps to be antiracist individuals. Being antiracist looks different for white people than for people of color. As this toolkit from the National Museum of African American History and Culture puts it: “For white people, being antiracist evolves with their racial identity development. They must acknowledge and understand their privilege, work to change their internalized racism, and interrupt racism when they see it. For people of color, it means recognizing how race and racism have been internalized, and whether it has been applied to other people of color.” This excerpt from The Racial Healing Handbook by Dr. Anneliese Singh is a great place to start; it walks through six responsibilities individuals can take on in the ongoing process of becoming antiracist: Read, Reflect, Remember, Risk, Rejection, and Relationship Building.

A note to white readers who have begun to acknowledge their privilege and are looking to Read and Reflect: before burdening Black, Indigenous, or People of Color (BIPOC) friends with requests for reading resources or conversation, start with the many resource lists already available online, and turn to white friends who are also on this journey for conversation.

As data scientists, we use data to answer questions, solve problems, and (hopefully) have a positive impact. But history has repeatedly shown that good intentions are not enough. Data and algorithms have been used to perpetuate racism and racist societal structures. It is imperative that we educate ourselves about these realities and the uneven effects they have had on Black lives*. This list is meant as a starting point and is by no means exhaustive; we must continue to learn from, contribute to, and amplify research and reporting on this work in our efforts to confront these challenges.

News Articles: Racial Bias in a Medical Algorithm Favors White Patients Over Sicker Black Patients; Many Facial-Recognition Systems Are Biased, Says US Study; Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks; As Cameras Track Detroit’s Residents, a Debate Ensues Over Racial Bias; Facebook’s ad-serving algorithm discriminates by gender and race; How community members in Ramsey County stopped a big-data plan from flagging students as at-risk

Lectures: Big Data, Technology, and the Law; Algorithmic Justice: Race, Bias, and Big Data; Legitimizing True Safety (which includes discussion of facial recognition and how police surveillance is currently being used against Detroit residents accused of violating social distancing orders)

Books (consider purchasing from a Black bookstore): Algorithms of Oppression: How Search Engines Reinforce Racism (Safiya Noble); Artificial Unintelligence: How Computers Misunderstand the World (Meredith Broussard); Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (Virginia Eubanks); Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech (Sara Wachter-Boettcher); Weapons of Math Destruction (Cathy O’Neil)

Experts to Follow: Nasma Ahmed (Digital Justice Lab); Alvaro Bedoya (Visiting Professor of Law at Georgetown University and Founding Director of the Center on Privacy and Technology); Meredith Broussard (Associate Professor at NYU); Joy Buolamwini (MIT Media Lab, Founder of Algorithmic Justice League); Max Clermont (Senior Political Advisor to Holyoke Mayor Alex Morse); Teresa Hodge (Co-founder and CEO of R3 Technologies); Tamika Lewis (Fellow at Data Justice Lab); Yeshimabeit Milner (Co-founder and Executive Director, Data for Black Lives); Tawana Petty (Non-Resident Fellow at the Digital Civil Society Lab and Director of Detroit Community Technology Project); Rashida Richardson (Director of Policy Research at AI Now); Samuel Sinyangwe (Co-founder of Campaign Zero); Latanya Sweeney (Professor of Government and Technology in Residence at Harvard University, Director of the Data Privacy Lab)

Organizations to Follow: Data & Society; AI Now; Digital Civil Society Lab; Center on Privacy and Technology; Data for Black Lives; Campaign Zero; Digital Equity Laboratory; Data Justice Lab; Algorithmic Justice League

As antiracist data scientists, we must commit to taking action every day in our own work to eliminate racist decisions and algorithms. No single checklist will accomplish this, but Hadley found herself regularly applying a series of questions to the data science projects she contributes to. Some of these questions come from a 2018 lecture she attended, “The Data You Have and the Questions You Ask It,” by Logan Koepke, a Senior Policy Analyst at Upturn.

If the answers to these questions reveal underlying racism, we must speak out and challenge the status quo.

Start with the data you have. Review the data and always reach out to subject-matter experts to better understand:

  • How was the data obtained?
  • For whom was the data obtained?
  • By whom was the data obtained?
  • Was permission granted to obtain the data?
  • Would individuals be comfortable if they knew this data was being obtained?
  • Would individuals be comfortable if they knew how this data was being stored or shared?
  • To what end was the data obtained?
  • How might this data be biased?
  • Explore the zine Digital Defense Playbook to consider how you might better inform broader communities, including Black communities, and include them in the conversation on obtaining and using data
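One concrete way to begin answering “How might this data be biased?” is to compare how often each group appears in your dataset with its share of the population the data is supposed to represent. The sketch below assumes you already have a group label for each record and reference shares from an outside source (for example, census figures); the function name, group labels, and numbers are hypothetical placeholders, not real data. A gap does not prove bias on its own, but it is a prompt for the subject-matter-expert conversations described above.

```python
from collections import Counter


def representation_gaps(group_labels, reference_shares):
    """Return {group: dataset_share - reference_share} for each reference group.

    Negative values mean the group is under-represented in the data relative to
    the reference population; positive values mean it is over-represented.
    Groups that appear in the data but not in reference_shares are ignored here.
    """
    if not group_labels:
        raise ValueError("group_labels must not be empty")
    counts = Counter(group_labels)
    total = len(group_labels)
    return {
        group: counts.get(group, 0) / total - ref_share
        for group, ref_share in reference_shares.items()
    }


if __name__ == "__main__":
    # Hypothetical records and reference shares, for illustration only.
    labels = ["A"] * 70 + ["B"] * 20 + ["C"] * 10
    reference = {"A": 0.60, "B": 0.25, "C": 0.15}
    for group, gap in representation_gaps(labels, reference).items():
        print(f"{group}: {gap:+.2f}")
```

With these made-up numbers, group A is over-represented by about ten percentage points and groups B and C are each under-represented by about five.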

Consider the questions you’re hoping to answer or the problems you’re hoping to solve with your data. Ask:

  • Are the communities that will be impacted by this analysis involved in the process of shaping the questions you’re hoping to answer? If not, why not?
  • Do current goals repurpose historical datasets, using them in ways that differ from what was originally intended?
  • To what extent are predicted outcomes dissimilar from the observations in the data? Is the question you’re asking trying to force a reality that isn’t grounded in truth?
  • Does the very act of prediction also change the future observation space? How might behaviors change because of the predictions?
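The last question above describes a feedback loop, which is easiest to see in a toy simulation. The sketch below is a deliberately simplified, hypothetical model, not a description of any real system: two areas have the same true incident rate, each round a single patrol is sent to whichever area has more recorded incidents, and incidents are only recorded where the patrol goes. The function name, area names, and parameters are illustrative assumptions.

```python
def simulate_feedback_loop(initial_records, true_rate, rounds):
    """Simulate a patrol-allocation feedback loop between areas.

    initial_records: dict mapping area name -> recorded incident count at the start.
    true_rate: incidents discovered per round in a patrolled area. It is assumed
               identical for every area, so any final disparity is an artifact
               of where the patrol was sent, not of real differences.
    rounds: number of allocation rounds to run.
    Returns the final recorded counts.
    """
    if not initial_records:
        raise ValueError("need at least one area")
    records = dict(initial_records)
    for _ in range(rounds):
        # Send the patrol to the area with the most recorded incidents
        # (max() returns the first maximal key, i.e. ties go to the area listed first).
        target = max(records, key=records.get)
        # Incidents are only observed where the patrol is, so only that area's
        # record grows, even though the true rate is the same everywhere.
        records[target] += true_rate
    return records


if __name__ == "__main__":
    final = simulate_feedback_loop({"north": 1, "south": 0}, true_rate=1, rounds=10)
    print(final)  # {'north': 11, 'south': 0}
```

A one-incident head start in the initial records is enough to lock the allocation in place: the predictions determine where data is collected, and the new data then appear to confirm a difference between the areas that does not exist.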

When you’re building a model, think like an adversary:

  • How could this system be gamed?
  • How could it be used to harm people, especially those in BIPOC communities?
  • What could be the unintended consequences of this model?
  • As the model “learns” from new data, how might this new data introduce new biases?

When you’re communicating the results of the model:

  • Are the results communicated in a way that lets the community who contributed the data view and understand them?
  • Have you clearly communicated the ways in which the model was tested to uncover racial bias?

Learn the Technical Details:

There is a growing body of research on technical approaches to addressing race in algorithms with fairness in mind. Simply leaving race out of an algorithm and claiming “fairness through unawareness” is unacceptable: an algorithm that does not use race as a predictor can still be biased, for example because other variables act as proxies for race. Instead, data scientists should explicitly consider how sensitive their algorithms are to race. This article provides an introduction to algorithmic fairness, including the concepts of Demographic Parity, Equalized Odds, and Predictive Rate Parity, as well as tools that can reduce disparity during pre-processing, training, and post-processing. This article illustrates how to explore Demographic Parity using SHAP, an explainable AI tool. The MIT D-Lab report Exploring Fairness in Machine Learning for International Development explores in considerable detail how to integrate fairness into a machine learning project. For additional learning, use this free online textbook and these videos: Google Machine Learning Crash Course Fairness in ML; 2017 Tutorial on Fairness in Machine Learning; 21 Fairness Definitions and Their Politics.
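The three fairness criteria named above can each be checked in a few lines once you have true labels, binary predictions, and group membership for a held-out test set. The sketch below uses plain Python rather than any particular fairness library, and its function names and toy arrays are hypothetical. Demographic Parity compares selection rates, P(prediction = 1 | group); Equalized Odds compares true positive and false positive rates; Predictive Rate Parity compares precision (positive predictive value), P(outcome = 1 | prediction = 1, group). Note that group membership, such as race, is used here only to audit the model, not as a predictor.

```python
from collections import defaultdict


def _safe_rate(numerator, denominator):
    # Return None rather than dividing by zero when a group has no relevant cases.
    return numerator / denominator if denominator else None


def group_rates(y_true, y_pred, groups):
    """Per-group rates for binary labels and predictions (each 0 or 1).

    selection_rate -> compared across groups for demographic parity
    tpr, fpr       -> compared across groups for equalized odds
    ppv            -> compared across groups for predictive rate parity
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred, and groups must have the same length")

    # Confusion-matrix counts for each group.
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for actual, predicted, group in zip(y_true, y_pred, groups):
        if predicted:
            counts[group]["tp" if actual else "fp"] += 1
        else:
            counts[group]["fn" if actual else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        rates[group] = {
            "selection_rate": _safe_rate(c["tp"] + c["fp"], total),
            "tpr": _safe_rate(c["tp"], c["tp"] + c["fn"]),
            "fpr": _safe_rate(c["fp"], c["fp"] + c["tn"]),
            "ppv": _safe_rate(c["tp"], c["tp"] + c["fp"]),
        }
    return rates


def max_gaps(rates):
    """Largest between-group difference for each metric; 0 means parity on that metric."""
    gaps = {}
    for metric in ("selection_rate", "tpr", "fpr", "ppv"):
        values = [r[metric] for r in rates.values() if r[metric] is not None]
        gaps[metric] = max(values) - min(values) if values else None
    return gaps


if __name__ == "__main__":
    # Hypothetical toy data: 1 = positive outcome, placeholder groups "a" and "b".
    y_true = [1, 0, 1, 0, 1, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    rates = group_rates(y_true, y_pred, groups)
    for group, r in rates.items():
        print(group, r)
    print("gaps:", max_gaps(rates))
```

In this toy example the model selects group "a" three times as often as group "b" and has a higher false positive rate for "a", so it fails Demographic Parity and Equalized Odds even though the group label was never a model input. In general these criteria cannot all be satisfied at once when base rates differ between groups, so deciding which gap to prioritize is a value judgment as well as a technical one.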

The 2020 Harnham US Data and Analytics Report found that only 3% of data and analytics professionals identified as Black, with even fewer in leadership positions. This is unacceptable, particularly as we (non-Black data scientists) continue to use data collected from Black communities and write algorithms that affect them.

To push the organizations we work for and the data science community at-large to change, we must commit to:

  • Confronting our own unconscious biases and how they manifest themselves in the workplace so as to make our field a more inclusive space
  • Inventorying our internal company practices and making changes to advance equity, diversity, and inclusion at all levels of our organizations
  • Reviewing and updating our hiring processes so they don’t reflect unconscious biases of the individuals/teams responsible for hiring
  • Demanding representation on executive leadership teams, boards, and expert panels
  • Developing leadership pathways to support emerging leaders from historically underrepresented backgrounds

It is no secret that data science is a lucrative field with a mean annual salary of approximately $100,000. Since we were not born knowing data science, many of us have likely entered this field thanks to robust educational experiences. As antiracist data scientists, we must recognize that we live in a racist society where education opportunities are distributed unequally. Since data science impacts everyone, we must commit to using the financial resources we’ve received for our work to support educational experiences that increase diversity in the data science workforce (and make this lucrative field more accessible) as well as data awareness for everyone.

Support Black-led and community-driven organizations contributing to data awareness

Set up recurring monthly donations to Black-led and community-driven organizations contributing to data awareness, data collection, and data visualization of timely issues such as police violence. Organizations to consider include:

Support data science and tech programs that serve Black students

Set up recurring monthly donations to support data science and tech programs that serve Black students. While it may be tempting to volunteer for teaching opportunities, it can be extremely powerful for BIPOC students to learn from BIPOC data scientists. Consider financially supporting programs such as:

Start a scholarship at your local community college

In 2016, Google completed research highlighting the role that community colleges can play and the challenges they face in creating a pathway to increased diversity in computer science. Community colleges generally have substantially smaller financial requirements than universities for starting a scholarship, and these scholarships can go a long way. Reach out to the financial aid office at your local community college to get started today.

Start or contribute to a scholarship or data science program at a historically Black college or university (HBCU)

Many HBCUs have existing or new data science programs including:

Reach out to these programs directly to learn more.

Donovan Larsen

Donovan is a columnist and associate editor at the Dark News. He has written on everything from politics to diversity issues in the workplace.
