Modern businesses want to use machine learning and data mining to make decisions based on what their data tells them, but the very nature of that inquiry is discriminatory: the whole point of a model is to tell cases apart. Well-intentioned organizations try to rectify or overcompensate for this by eliminating bias from their machine learning models. What they don't realize is that doing so can make things worse. Why? Once you start removing data categories, other components, characteristics, or traits sneak in to take their place.
Suppose, for example, you uncover that income is biasing your model, but income also correlates with where someone comes from (wages vary by geography). The moment you put income into the model, you effectively put origin in as well, so you have to de-discriminate for that too. It's extremely hard to be sure there is nothing discriminatory left in the model. If you take out where someone comes from, how much they earn, where they live, and maybe what their education is, there isn't much left to tell one person from another. And even then, there could be some remaining bias you haven't thought about.
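To make the proxy effect concrete, here is a minimal sketch on synthetic data (the feature names and numbers are hypothetical, not from the article). It shows that even if you drop income from your features, a correlated feature like region lets a simple scikit-learn model reconstruct income almost perfectly, so decisions driven by region are still effectively conditioned on income.

```python
# Minimal sketch: a correlated proxy re-introduces a feature you removed.
# Synthetic data and feature names are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5_000

# Wages vary by region, so income is strongly correlated with region.
region = rng.integers(0, 5, size=n)
regional_wage = np.array([30_000, 45_000, 55_000, 70_000, 90_000])
income = regional_wage[region] + rng.normal(0, 5_000, size=n)

# Suppose we "de-bias" by dropping income but keep region (one-hot encoded).
region_onehot = np.eye(5)[region]

# A model trained only on region recovers income with high accuracy,
# so any downstream decision still carries the income signal.
proxy_model = LinearRegression().fit(region_onehot, income)
r2 = proxy_model.score(region_onehot, income)
print(f"R^2 of income predicted from region alone: {r2:.2f}")  # close to 1.0
```

The same pattern holds for other correlated attributes (education, neighborhood, and so on), which is why removing features one at a time rarely removes the underlying bias.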
Read the full article at Venture Beat