Related papers: B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable


URL: http://arxiv.org/abs/2411.00715v1
Date: Fri, 01 Nov 2024 16:28:11 GMT
Title: B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable
Authors: Shreyash Arya, Sukrut Rao, Moritz Böhle, Bernt Schiele
Abstract summary: 'B-cosification' is a novel approach to transform existing pre-trained models to become inherently interpretable. We find that B-cosification can yield models that are on par with B-cos models trained from scratch in terms of interpretability.
Score: 53.848005910548565
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: B-cos Networks have been shown to be effective for obtaining highly human interpretable explanations of model decisions by architecturally enforcing stronger alignment between inputs and weights. B-cos variants of convolutional networks (CNNs) and vision transformers (ViTs), which primarily replace linear layers with B-cos transformations, perform competitively with their respective standard variants while also yielding explanations that are faithful by design. However, it has so far been necessary to train these models from scratch, which is increasingly infeasible in the era of large, pre-trained foundation models. In this work, inspired by the architectural similarities in standard DNNs and B-cos networks, we propose 'B-cosification', a novel approach to transform existing pre-trained models to become inherently interpretable. We perform a thorough study of design choices to perform this conversion, both for convolutional neural networks and vision transformers. We find that B-cosification can yield models that are on par with B-cos models trained from scratch in terms of interpretability, while often outperforming them in terms of classification performance at a fraction of the training cost. Subsequently, we apply B-cosification to a pre-trained CLIP model, and show that, even with limited data and compute cost, we obtain a B-cosified version that is highly interpretable and competitive on zero-shot performance across a variety of datasets. We release our code and pre-trained model weights at https://github.com/shrebox/B-cosification.
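For readers unfamiliar with the "B-cos transformation" that the abstract says replaces linear layers, the following is a minimal sketch of a single B-cos unit as defined in the B-cos Networks paper listed below: the linear response with a unit-norm weight, scaled by |cos(x, w)|^(B-1). The function name `bcos_unit` and its parameters are hypothetical and chosen for illustration; this is not the authors' released code, and it does not cover the B-cosification procedure itself.

```python
import math


def bcos_unit(x, w, b=2.0):
    """Response of one B-cos unit with weight vector w to input x.

    Illustrative sketch only: computes |cos(x, w)|^(b-1) * (w_hat . x),
    where w_hat = w / ||w||. With b = 1 this reduces to a plain linear
    unit with a unit-norm weight.
    """
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    if w_norm == 0.0 or x_norm == 0.0:
        return 0.0  # cosine is undefined for zero vectors; return 0 by convention
    linear = sum(wi * xi for wi, xi in zip(w, x)) / w_norm  # w_hat . x
    cos = linear / x_norm  # cosine similarity between x and w
    return abs(cos) ** (b - 1.0) * linear


if __name__ == "__main__":
    print(bcos_unit([3.0, 4.0], [3.0, 4.0]))   # input aligned with the weight
    print(bcos_unit([4.0, -3.0], [3.0, 4.0]))  # input orthogonal to the weight
    print(bcos_unit([1.0, 0.0], [3.0, 4.0]))   # partially aligned input
```

Larger values of B suppress the response to inputs that are poorly aligned with the weight, which is the alignment pressure the abstract refers to.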

Related papers

B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability [21.480463138209483]
Post-hoc explanation methods for black-box models often struggle with faithfulness and human interpretability. We introduce B-cos LMs, i.e., B-cos networks empowered for NLP tasks. Our approach directly transforms pre-trained language models into B-cos LMs by combining B-cos conversion and task fine-tuning.
arXiv Detail & Related papers (2025-02-18T16:13:08Z)
Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network). After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers [97.75725574963197]
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training. We show that a sequence of such transformations induces a single linear transformation that faithfully summarises the full model computations. We show that the resulting explanations are of high visual quality and perform well under quantitative interpretability metrics.
arXiv Detail & Related papers (2023-06-19T12:54:28Z)
State-driven Implicit Modeling for Sparsity and Robustness in Neural Networks [3.604879434384177]
We present a new approach to training implicit models, called State-driven Implicit Modeling (SIM). SIM constrains the internal states and outputs to match those of a baseline model, circumventing costly backward computations. We demonstrate how the SIM approach can be applied to significantly improve sparsity and robustness of baseline models trained on datasets.
arXiv Detail & Related papers (2022-09-19T23:58:48Z)
B-cos Networks: Alignment is All We Need for Interpretability [136.27303006772294]
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training. A B-cos transform induces a single linear transform that faithfully summarises the full model computations. We show that it can easily be integrated into common models such as VGGs, ResNets, InceptionNets, and DenseNets.
arXiv Detail & Related papers (2022-05-20T16:03:29Z)
Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images. We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks [14.540226579203207]
We present two methods to reduce the complexity of Bayesian network (BN) classifiers. First, we introduce quantization-aware training using the straight-through gradient estimator to quantize the parameters of BNs to a few bits. Second, we extend a recently proposed differentiable tree-augmented naive Bayes (TAN) structure learning approach by also considering the model size.
arXiv Detail & Related papers (2020-10-22T14:47:55Z)
Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs). The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
