Taking Bias out of Big Data

By Laurie Riedman, a Xerox contributor


Big data converts have long proclaimed that "the truth is in the data" and that data-driven decisions are freer of human bias than conventional decision-making. Yet, as a recent Harvard Business Review article points out, data is not as objective as we think.

Xerox scientists are conducting studies that reveal that no matter how deep you dive into the data, there is no escaping our human conditioning. This is a big factor in dealing with big data, because it means human bias is often implicitly built into the data during collection or analysis.

For example, data science is being used to automate hiring: resumes of successful past hires are used to train a system to predict which applicants are a better fit for the company and the job. This sounds like a great idea, except when the data is already biased, such as when the company has historically hired more men than women. That bias is automatically encoded into the algorithm and quietly influences future hiring decisions.
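To make the mechanism concrete, here is a minimal sketch in Python. The data, features, and model are entirely invented for illustration (a standard logistic regression on synthetic applicants, not Xerox's or any real company's system); it shows how a classifier trained on historically skewed hiring labels reproduces that skew for new applicants of identical skill.

```python
# A minimal sketch, assuming hypothetical synthetic data -- not a real hiring
# system. Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
gender = rng.integers(0, 2, n)        # 0 = female, 1 = male (invented encoding)
skill = rng.normal(0, 1, n)           # the signal hiring *should* be based on

# Biased historical labels: at the same skill level, men were hired more often.
hired = (skill + 1.5 * gender + rng.normal(0, 1, n)) > 1.0

# Train on the biased history, with gender as an explicit feature.
X = np.column_stack([skill, gender])
model = LogisticRegression().fit(X, hired)

# Two applicants with identical skill, differing only in gender:
# the model assigns the male applicant a higher hire probability.
same_skill = np.array([[0.5, 0], [0.5, 1]])
print(model.predict_proba(same_skill)[:, 1])
```

Note that simply dropping the gender column does not solve the problem in general: other features correlated with gender can act as proxies and carry the same bias.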

Matthias Galle and Will Radford, data scientists at the Xerox Research Centre Europe, are exploring data's intersection with social science. The goal of this research is to teach computers how the world works: to create algorithms that recognize human biases.

They are part of a team of scientists developing new technologies that can recognize and process all sorts of data: photographs, videos, text and numbers. Compounding the problems inherent in these types of data are the sources of the data, which range from relatively "clean" massive corporate data sets to noisy digital traces of human activity, like web chat messages with customer service reps or social media chatter. Ultimately, the knowledge gained from the data will help customer care representatives solve problems faster and more accurately. That's important to Xerox, which is in the customer care business in a big way: the company handles 2.5 million customer interactions every day for clients.

Biased Data in the Movies?

The team explored this idea of bias in data sets by looking at gender issues in the entertainment industry. They extracted data from IMDb, a film and television database containing more than 15 million performer-role data points entered by users all over the world. Galle and Radford's team wanted to test the strengths and limitations of the data and gain insights into how gender is represented onscreen.
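The core of this kind of analysis is counting: tallying which roles are played by which gender, and how those tallies shift over time. Here is a minimal sketch with an invented record format (the paper's actual pipeline and IMDb's real schema may differ):

```python
# A minimal sketch, assuming a hypothetical (year, gender, role) record format
# derived from cast lists -- not the paper's actual pipeline. Standard library only.
from collections import Counter

# Toy records as might be extracted from IMDb cast lists.
records = [
    (1952, "F", "Nurse"),
    (1955, "M", "Doctor"),
    (2011, "F", "Doctor"),
    (2012, "M", "Nurse"),
]

# Count how often each role is played by each gender, bucketed by decade.
counts = Counter()
for year, gender, role in records:
    decade = (year // 10) * 10
    counts[(decade, gender, role)] += 1

for (decade, gender, role), n in sorted(counts.items()):
    print(f"{decade}s: {role!r} played by {gender} x{n}")
```

At IMDb scale (15+ million data points), the same counting logic would run over a database dump rather than an in-memory list, but the distributions it produces are what let researchers compare onscreen roles against off-screen reality.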

They wanted to explore whether art does indeed imitate life, asking:

  • What are viewers likely to learn about gender roles from onscreen media?
  • Do we see different roles over time, and how do these relate to gender?
  • How do onscreen gender roles relate to the off-screen world?

The data revealed that when it comes to gender roles on film and television, perhaps art doesn't imitate life. The results are published in the paper titled "Roles for the boys? Mining cast lists for gender and role distributions over time." (Get a free copy here.) The researchers plan to present their work at the International World Wide Web Conference, May 18-22, 2015.


Editor’s Note: Laurie Riedman owns Riedman Communications, a strategic marketing and public relations firm. You can reach her at laurie@riedmancomm.com. Laurie filed this content as a paid contributor to Xerox. Some of the content is the author’s opinion and does not necessarily reflect all views of Xerox.
