A Conversation on Algorithms and Bias

Machine learning presents itself as fair and impartial. It is not.

In 2018, UCLA Information Studies professor Safiya Noble published her best-selling book “Algorithms of Oppression,” exposing how search engines, specifically Google, perpetuate discrimination and racism. As a result, the search engine companies paid lip service to cleaning up their acts and indeed, two years later, some things have gotten better. (For instance, when you Google “black girls,” you are no longer served up a page of pornographic and sexually explicit websites.) But as Noble points out, there’s still a long way to go. And at a time when we rely on search engines more than ever for news and information, the stakes are critically high. Below, a discussion on the state of algorithmic bias in general, Google's transgressions in particular, and how we can help guard against a future where technology works in the service of existing prejudices and escalates social injustice.

How have things improved since your book? How haven't they?

There certainly is a bit more attention on the role of search engines and their import. Certainly, there's more scrutiny around Google and the role that they play in propagating disinformation, hate speech, anti-democratic kinds of conspiracy theories and so forth. We’ve moved into a little bit more of the tech-lash, where there's an audience that can understand the kinds of things that I was writing about in my book. When I first started writing about the existence of racist and sexist algorithms, people argued that it was impossible, that computer code was “just math” and that computer programming was value-free. Of course, fast forward a decade, we all know better. I like to think that I was part of bringing about that change, along with several other critical scholars and journalists.

However, many things remain the same. Google is still the number one monopoly leader in search, and it captures a lot of our attention, particularly when we're trying to access all kinds of information. I think people are much more sophisticated about fact-checking the kinds of things they encounter on social media now, but they're going to Google to verify fact from fiction. They're becoming more reliant upon a search engine as an arbiter of truth, which is incredibly dangerous, in my opinion, based on the research. And then, of course, we are also in a global pandemic and a public health crisis, so other kinds of critical information resources like libraries, and schools, teachers, professors, and universities are increasingly inaccessible in this moment. We need more robust and trusted knowledge and information resources than just search engines to make sense of the world right now.

Let’s talk about Google’s autofill function. Today, we searched “George Floyd” and the very first suggestion was “criminal past.” His “GoFundMe” was fifth down the list.

You know, when Trayvon Martin was murdered by George Zimmerman, this same phenomenon happened where the autocomplete information that Google provided was really derogatory. And when Eric Garner was murdered by NYPD, it did the same. And then the same happened to Mike Brown when he was murdered in Ferguson, Missouri. It's not just that people are interested in manipulating SEO terms and making sure the worst kinds of misrepresentative ideas about Black people are circulating on the Internet; the entire Google AdWords infrastructure of search is about making money through high traffic sites. Search engines make a lot of money from sites that are highly trafficked and optimized. National news organizations that have millions of clicks on their sites have a huge responsibility to think about the metadata they use, meaning the keywords they use to optimize their content in the search engine crawls so that their stories make it to the first page of Google results. If they use words like “criminal past” to pull eyeballs to their stories, then that will get picked up, because those large news companies pay a lot of money to Google to make their content visible. But in the case of smaller blogs and racist disinformation sites, they use these kinds of SEO tags across the surface of a variety of smaller networks of websites, and that’s another way to misrepresent and propagate harmful ideas. I talk in detail in my book about how this happens, and how these technical dimensions of search can be quite difficult to intervene upon.

One of the things I try to explain in my research is the way in which search engines profit off this kind of exploitation of victims of police brutality and police violence. For example, at the moment #BlackLivesMatter is calling for the “defunding” of police. People are turning to the Internet to figure out what that idea means. Google is going to be very important in making visible reliable information about this call, or in undermining it with information that subverts the message, particularly as people turn up to vote on ideas that may range from funding greater social services to establishing non-violent community policing initiatives to abolition of the militarized police state. And then here you have, co-mingled in these results, optimized content to criminalize George Floyd. So, those things work in tandem and it's really important that we pay attention.

In the end, those with the most titillating, racist, sexist, and click-bait kinds of content are often elevated because they generate a lot of web traffic, and that makes money for all search engines. This is why disinformation and racism are so profitable, especially in the United States.


Photo Credit: Stella Kalinina

You refer a lot to Google. Is that because it’s the biggest perpetuator or just the biggest search engine?

When I'm talking about Google, I mean Google search, because Google is the monopoly leader in search. They have more than an 80% market share in search and even higher in mobile search. And, I’m also talking about the sector, because competitors are often trying to mimic what the monopoly leader does. Google sets the tone and the agenda in the search market for now.

The role that search engines play in algorithmic bias is pretty well established. Is there, in your opinion, a new or emerging hot spot?

One of the most important dimensions of big tech we should be paying attention to right now is predictive analytics, which is, of course, the backbone of how the modern Internet works, including search. Social media, apps, and the platforms that we use to work, play, and increasingly participate in society are all really about profiling us, sorting us, and selling us to thousands of data brokers who work behind the scenes to cultivate data profiles about us that, I believe, will overdetermine what kinds of opportunities we will have in the future. We see the skeleton of this already in existence where, for example, banks are making decisions about whether you're credit-worthy or not, based on your social network. Then there’s the example of the Palestinian student who was on his way to Harvard last fall, and was stopped at the border because immigration didn't like the kinds of posts that his friends had made on his social media account.

These are the kinds of things that have maybe been thought of as outliers, like the use of facial recognition and predictive policing, but they will move to the center if we don’t organize and legislate to protect our civil and human rights. I think there are a variety of digital technologies that are not household names that are deeply implicated in widening social inequality and undermining democracy.

And racial inequality?

Well, certainly in Los Angeles, groups like Stop LAPD Spying Coalition have helped us understand the ways in which LAPD uses its past arrest data to inform which neighborhoods it should send more police to. Of course, if you live in a Black or Latino or a low-income neighborhood in Los Angeles, you're more likely to have encountered police, or are more likely to have been arrested, even if you haven’t been convicted. So, if past arrest data informs the future, then we're in a vicious feedback loop.

Vulnerable people and people of color are often beta tests for the models and logics for predictive analytic projects. This is why we have to pay incredible attention to them. What's even more pernicious and dangerous about predictive analytics is that they are packaged and bundled and sold to governments and the public as if they are more fair than the decisions human beings would make. And, of course, we know that's absolutely not the case.
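
The feedback loop described in this answer can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any real police department or product: the neighborhood names, the starting arrest counts, the function name, and the rule that extra patrols always go to the neighborhood with the most recorded arrests are all hypothetical. Both neighborhoods are assumed to have identical underlying offense rates, so any shift in the recorded arrest share comes only from where police are sent.

```python
def simulate_feedback_loop(past_arrests, rounds=10, patrols_per_round=10,
                           arrests_per_patrol=1.0):
    """Toy model of arrest data driving patrol allocation.

    Assumption: every neighborhood has the SAME true offense rate, so each
    patrol produces the same number of recorded arrests wherever it goes.
    Returns a list with each neighborhood's share of recorded arrests
    after every round.
    """
    counts = dict(past_arrests)  # copy so the caller's data is not mutated
    history = []
    for _ in range(rounds):
        # "Predictive" step: send the extra patrols to whichever
        # neighborhood has the most arrests on record so far.
        target = max(counts, key=counts.get)
        # Arrests are only recorded where police are actually present.
        counts[target] += patrols_per_round * arrests_per_patrol
        total = sum(counts.values())
        history.append({name: n / total for name, n in counts.items()})
    return history


if __name__ == "__main__":
    # Hypothetical starting point: neighborhood A was policed more heavily
    # in the past, so it begins with 60% of recorded arrests.
    shares = simulate_feedback_loop({"A": 60, "B": 40}, rounds=10)
    for round_number, share in enumerate(shares, start=1):
        print(f"round {round_number}: A = {share['A']:.0%}, B = {share['B']:.0%}")
```

In this sketch, neighborhood A starts with 60% of recorded arrests and ends with 80% after ten rounds, even though the two neighborhoods are identical by construction. The data appears to confirm the original allocation because the allocation generated the data.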

What can we do, both in terms of big change (policy, etc.) and everyday consumption habits, to effect change?

A lot of the challenges we're talking about need public policy intervention. I think many can be worked through in local ways while we work on state, national and international frameworks and legislation. We should be talking about these problems in our homes, in our neighborhoods and in our community organizations. When the school board decides to adopt predictive analytics for your school system, parents should show up and voice their opposition.

Also, we can look for and back candidates who have a sophisticated, critical technology agenda. And by “critical,” I mean an understanding of the whole host of attendant power issues associated with these technologies and a sense of how to protect the public from the extractive models foisted upon us by Big Tech. And when we see tech workers trying to organize and unionize to push back against the pernicious decisions and policies their companies are making, we should support them and we should fund them.

There are just hundreds of ways that we can all be engaged. And I do not believe that there is one pathway to change; there are many, and we can work from a shared awareness and shared goals. We see people effectively responding to, resisting, and shutting down some of the dangerous technologies that widen the gap of racial, economic and gender injustices. We need more of that.

Safiya U. Noble, Ph.D., is an associate professor and co-director of the UCLA Center for Critical Internet Inquiry. She holds a Ph.D. and M.S. in Library & Information Science from the University of Illinois at Urbana-Champaign. You can follow her on Twitter @safiyanoble