Over the past decade or so, deep neural networks have achieved impressive results on a variety of tasks, including image recognition. Despite these successes, the networks are highly complex, which makes it difficult, and sometimes impossible, to interpret what they have learned or to trace the reasoning behind their predictions. This lack of interpretability makes deep neural networks hard to trust.
Researchers from the Prediction Analysis Lab at Duke University, led by Professor Cynthia Rudin, have recently devised a technique that could improve the interpretability of deep neural networks. This approach, called concept whitening (CW), was first introduced in a paper published in Nature Machine Intelligence.
“Rather than conducting a post hoc analysis to see inside the hidden layers of NNs, we directly alter the NN to disentangle the latent space so that the axes are aligned with known concepts,” Zhi Chen, one of the researchers who carried out the study, told Tech Xplore. “Such disentanglement can provide us with a much clearer understanding of how the network gradually learns concepts over layers. It also focuses all the information about one concept (e.g., ‘lamp,’ ‘bed,’ or ‘person’) to go through only one neuron; this is what is meant by disentanglement.”
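The idea described above can be illustrated with a small numerical sketch. The snippet below is not the authors' implementation; it only shows the two ingredients the name "concept whitening" refers to: first whitening a layer's activations so they are decorrelated with unit variance, then rotating the whitened space so that one axis lines up with a concept direction. The concept direction here is a made-up vector; in CW it would be learned from labeled concept examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated synthetic "activations" standing in for a hidden layer's outputs.
X = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0.5, 0.0],
                                           [0.0, 1.0, 0.3],
                                           [0.0, 0.0, 0.5]])

# Step 1: whiten -- decorrelate and normalize (ZCA whitening).
mu = X.mean(axis=0)
cov = np.cov(X - mu, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # W = cov^{-1/2}, symmetric
Z = (X - mu) @ W                            # whitened: covariance ~ identity

# Step 2: rotate so the first axis spans a hypothetical "concept direction"
# (in CW this rotation is optimized so each axis matches a known concept).
c = np.array([1.0, 1.0, 0.0])
c /= np.linalg.norm(c)
# Build an orthonormal basis whose first column spans c, via QR.
Q, _ = np.linalg.qr(np.column_stack([c, np.eye(3)[:, 1:]]))
Z_aligned = Z @ Q   # column 0 of Z_aligned now measures the "concept"
```

After whitening, every direction in the latent space carries unit variance and no correlation with the others, so the subsequent rotation is free to dedicate each axis to one concept without losing information, which is why a single neuron can end up carrying all the signal for, say, "lamp."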