Where is the Model Looking At?--Concentrate and Explain the Network Attention
- URL: http://arxiv.org/abs/2009.13862v1
- Date: Tue, 29 Sep 2020 08:36:18 GMT
- Title: Where is the Model Looking At?--Concentrate and Explain the Network Attention
- Authors: Wenjia Xu, Jiuniu Wang, Yang Wang, Guangluan Xu, Wei Dai, Yirong Wu
- Abstract summary: We propose an Explainable Attribute-based Multi-task (EAT) framework to concentrate the model attention on the discriminative image area.
We generate attribute-based textual explanations for the network and ground the attributes on the image to show visual explanations.
Results indicate that the EAT framework can give multi-modal explanations that interpret the network decision.
- Score: 21.037241523836553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image classification models have achieved satisfactory performance on many
datasets, sometimes even surpassing humans. However, the model attention is
unclear due to the lack of interpretability. This paper investigates the
fidelity and interpretability of model attention. We propose an Explainable
Attribute-based Multi-task (EAT) framework to concentrate the model attention
on the discriminative image area and make the attention interpretable. We
introduce attribute prediction into the multi-task learning network, helping the
network concentrate its attention on the foreground objects. We generate
attribute-based textual explanations for the network and ground the attributes
on the image to provide visual explanations. The multi-modal explanation can not
only improve user trust but also help to find weaknesses in the network and the
dataset. Our framework can be generalized to any basic model. We perform
experiments on three datasets and five basic models. Results indicate that the
EAT framework can give multi-modal explanations that interpret the network
decision. The performance of several recognition approaches is improved by
guiding the network attention.
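To make the multi-task idea concrete, here is a minimal sketch (not the authors' released code) of a network in which an auxiliary attribute-prediction head shares a backbone with the classification head, so the joint loss pushes the shared features toward attribute-bearing foreground regions. The class names, the ResNet-50 backbone choice, and the loss weight lambda_attr are illustrative assumptions; attribute labels are assumed to come from the dataset's annotations.

```python
# Sketch of a multi-task classifier with an auxiliary attribute head.
# Illustrative only: the EAT paper's exact heads, backbone, and weighting
# may differ. Any "basic model" could replace the ResNet-50 backbone.
import torch
import torch.nn as nn
import torchvision.models as models


class AttributeMultiTaskNet(nn.Module):
    """Shared backbone feeding a class head and an attribute head."""

    def __init__(self, num_classes: int, num_attributes: int):
        super().__init__()
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # expose pooled features
        self.backbone = backbone
        self.class_head = nn.Linear(feat_dim, num_classes)    # category logits
        self.attr_head = nn.Linear(feat_dim, num_attributes)  # per-attribute logits

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)
        return self.class_head(feats), self.attr_head(feats)


def multitask_loss(class_logits, attr_logits, labels, attributes, lambda_attr=1.0):
    """Cross-entropy on the class label plus a binary attribute loss;
    lambda_attr (an assumed hyperparameter) balances the two tasks."""
    cls_loss = nn.functional.cross_entropy(class_logits, labels)
    attr_loss = nn.functional.binary_cross_entropy_with_logits(
        attr_logits, attributes.float())
    return cls_loss + lambda_attr * attr_loss
```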
Related papers
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases the model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to adaptively reweight the training samples during model learning.
Our framework then promotes model learning by paying closer attention to those training samples with a large difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z) - Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework that adopts both self-supervised and supervised visual pretext tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z) - Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z) - Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis [20.316056261749946]
We propose an end-to-end vision and language model incorporating explicit knowledge graphs.
We also introduce an interactive out-of-distribution layer using an implicit network operator.
In practice, we apply our model on several vision and language downstream tasks including visual question answering, visual reasoning, and image-text retrieval.
arXiv Detail & Related papers (2023-02-11T05:46:21Z) - MEGAN: Multi-Explanation Graph Attention Network [1.1470070927586016]
We propose a multi-explanation graph attention network (MEGAN).
Unlike existing graph explainability methods, our network can produce node and edge attributional explanations along multiple channels.
Our attention-based network is fully differentiable and explanations can actively be trained in an explanation-supervised manner.
arXiv Detail & Related papers (2022-11-23T16:10:13Z) - Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision [38.22842778742829]
Discriminative self-supervised learning allows training models on any random group of internet images.
We train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn.
We extensively study and validate our model performance on over 50 benchmarks, including fairness, robustness to distribution shift, geographical diversity, fine-grained recognition, image copy detection, and many image classification datasets.
arXiv Detail & Related papers (2022-02-16T22:26:47Z) - Object-Centric Diagnosis of Visual Reasoning [118.36750454795428]
This paper presents a systematic object-centric diagnosis of visual reasoning on grounding and robustness.
We develop a diagnostic model, namely the Graph Reasoning Machine.
Our model replaces the purely symbolic visual representation with a probabilistic scene graph and then applies teacher-forcing training to the visual reasoning module.
arXiv Detail & Related papers (2020-12-21T18:59:28Z) - Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering [41.73026155036886]
This paper proposes an explainable, evidence-based memory network architecture.
It learns to summarize the dataset and extract supporting evidence to make its decision.
Our model achieves state-of-the-art performance on two popular question answering datasets.
arXiv Detail & Related papers (2020-11-05T21:18:21Z) - Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z) - Focus Longer to See Better: Recursively Refined Attention for Fine-Grained Image Classification [148.4492675737644]
Deep neural networks have made great strides in the coarse-grained image classification task.
In this paper, we focus on the marginal differences between fine-grained classes to extract more representative features.
Our network repetitively focuses on parts of images to spot small discriminative parts among the classes.
arXiv Detail & Related papers (2020-05-22T03:14:18Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets; a sketch of such a gradient-alignment auxiliary term appears after this list.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
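As referenced above, the last entry pairs counterfactual examples with gradient supervision. Below is an illustrative sketch, under assumed names and not the paper's actual code, of an auxiliary loss that aligns the input gradient of the classification loss with the direction separating a counterfactual pair.

```python
# Sketch of a gradient-supervision-style auxiliary loss: for a pair of
# minimally-different inputs (x, x_cf) with different labels, encourage the
# input gradient of the loss on x to point along the pair difference.
# Function and variable names are illustrative assumptions.
import torch
import torch.nn.functional as F


def gradient_supervision_loss(model, x, x_cf, y):
    """x / x_cf: a counterfactual pair; y: label of x.
    Returns an auxiliary term aligning the input gradient with (x_cf - x)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # keep the graph so the auxiliary term can be backpropagated
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    target_dir = (x_cf - x).detach()
    # cosine similarity between flattened gradient and pair difference
    cos = F.cosine_similarity(grad.flatten(1), target_dir.flatten(1), dim=1)
    return (1.0 - cos).mean()
```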