By Andrea Gao, Maximiliano Santinelli, and Steve Mills

This is the second article in a series on data bias and identification. For additional background, please refer to the first article, "Getting to the Root of Data Bias in AI."

In our first article on data bias, we noted that bias can appear at any stage in the dataset lifecycle: during creation, design, sampling, collection, and processing. We also offered a few general ways to reduce such bias. In this article, we explore in greater depth techniques for mitigating historical data bias. In…

By Maximiliano Santinelli, Sean Singer, and Andrea Gao

Paradoxically, data is both a company’s most important digital asset and its most problematic.

Consider the booming business of DNA testing. For consumers interested in their ancestry and health, such tests promise to uncover vital information. Yet data bias has raised major questions about the accuracy and credibility of these tests. Some customers have received very different findings from different companies. Other customers, particularly those from ethnic and racial minorities, have found the results lack the kinds of details they are looking for.

These issues are largely due to the fact that…

Andrea Gao (she/her)

BCG GAMMA Senior Data Scientist

