Fill in the blanks: Rethinking Interpretability in vision
- URL: http://arxiv.org/abs/2411.10273v1
- Date: Fri, 15 Nov 2024 15:31:06 GMT
- Title: Fill in the blanks: Rethinking Interpretability in vision
- Authors: Pathirage N. Deelaka, Tharindu Wickremasinghe, Devin Y. De Silva, Lisara N. Gajaweera
- Abstract summary: We re-think vision-model explainability from a novel perspective, to probe the general input structure that a model has learnt during its training.
Experiments on standard vision datasets and pre-trained models reveal consistent patterns that could be integrated as an additional model-agnostic explainability tool.
- Abstract: Model interpretability is a key challenge that has yet to align with the advancements observed in contemporary state-of-the-art deep learning models. In particular, deep-learning-aided vision tasks require interpretability for adoption in more specialized domains such as medical imaging. Although the field of explainable AI (XAI) developed methods for interpreting vision models alongside early convolutional neural networks, recent XAI research has mainly focused on assigning attributes via saliency maps. As such, these methods are restricted to providing explanations at a sample level, and many explainability methods suffer from low adaptability across a wide range of vision models. In our work, we re-think vision-model explainability from a novel perspective: probing the general input structure that a model has learnt during its training. To this end, we ask the question: "How would a vision model fill in a masked image?" Experiments on standard vision datasets and pre-trained models reveal consistent patterns that could be integrated as an additional model-agnostic explainability tool in modern machine-learning platforms. The code will be available at https://github.com/BoTZ-TND/FillingTheBlanks.git
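As a rough illustration of the probing idea, the sketch below masks random patches of an image, asks a model to fill them in, and scores the reconstruction per patch. The `model(masked_image, pixel_mask)` interface and the function name are assumptions made for illustration, not the authors' released API:

```python
import torch

def fill_in_probe(model, image, patch=16, mask_ratio=0.5, generator=None):
    """Hypothetical probe: hide random patches of `image` (C, H, W, values
    in [0, 1]), let `model` fill them in, and report per-patch L2 error.
    `model` is any callable (masked_image, pixel_mask) -> reconstruction."""
    c, h, w = image.shape
    gh, gw = h // patch, w // patch
    # Pick a random subset of patches to mask out.
    n_masked = int(gh * gw * mask_ratio)
    idx = torch.randperm(gh * gw, generator=generator)[:n_masked]
    mask = torch.zeros(gh * gw, dtype=torch.bool)
    mask[idx] = True
    # Expand the patch-level mask to pixel resolution.
    pixel_mask = (mask.view(gh, gw)
                      .repeat_interleave(patch, 0)
                      .repeat_interleave(patch, 1))
    masked_image = image.clone()
    masked_image[:, pixel_mask] = 0.0
    recon = model(masked_image, pixel_mask)
    # Mean squared error per patch, reported only where pixels were hidden.
    err = ((recon - image) ** 2).mean(0)
    patch_err = err.view(gh, patch, gw, patch).mean(dim=(1, 3))
    return patch_err.masked_fill(~mask.view(gh, gw), float("nan"))
```

Here `model` could wrap any inpainting-capable network (a masked autoencoder, a diffusion inpainter, and so on); consistently low error on structured regions would suggest the model has internalised that structure during training.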
Related papers
- Autoregressive Models in Vision: A Survey [119.23742136065307]
This survey comprehensively examines the literature on autoregressive models applied to vision.
We divide visual autoregressive models into three general sub-categories, including pixel-based, token-based, and scale-based models.
We present a multi-faceted categorization of autoregressive models in computer vision, including image generation, video generation, 3D generation, and multi-modal generation.
arXiv Detail & Related papers (2024-11-08T17:15:12Z)
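All three sub-categories share the chain-rule factorization p(x) = prod_t p(x_t | x_<t) and differ mainly in what a unit x_t is: a pixel intensity, a discrete token, or an entire scale. A toy sampler, with a uniform stub standing in for a trained network, just to make the shared factorization concrete:

```python
import torch

@torch.no_grad()
def autoregressive_sample(next_unit_logits, length, generator=None):
    """Toy ancestral sampler: `next_unit_logits(prefix) -> (vocab,)` scores
    the next unit given everything generated so far. Pixel-, token-, and
    scale-based models differ mainly in what a 'unit' is."""
    prefix = []
    for _ in range(length):
        logits = next_unit_logits(torch.tensor(prefix, dtype=torch.long))
        probs = torch.softmax(logits, dim=-1)
        prefix.append(torch.multinomial(probs, 1, generator=generator).item())
    return prefix

# Uniform stub over 256 pixel intensities, standing in for a real model.
pixels = autoregressive_sample(lambda prefix: torch.zeros(256), length=16)
```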
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks while significantly reducing its parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
arXiv Detail & Related papers (2023-12-01T18:59:57Z)
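One way to picture the "visual sentence" format (the class below is illustrative, not the paper's code): every source, whether raw frames, video, or annotated pairs, is tokenised per image and flattened into a single stream that a plain sequence model can consume:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VisualSentence:
    """Illustrative container: a 'sentence' is an ordered list of per-image
    token sequences, so raw images, video frames, and (image, annotation)
    pairs all reduce to one flat token stream."""
    image_tokens: List[List[int]]  # one token list per image or frame

    def flatten(self, sep: int = -1) -> List[int]:
        stream: List[int] = []
        for tokens in self.image_tokens:
            stream.extend(tokens)
            stream.append(sep)  # boundary marker between images
        return stream

# e.g. a two-frame clip whose frames were VQ-tokenised elsewhere
clip = VisualSentence(image_tokens=[[5, 9, 3], [7, 2, 8]])
print(clip.flatten())  # [5, 9, 3, -1, 7, 2, 8, -1]
```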
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, asking questions about an image or video scene in an interactive dialogue, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z)
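A rough sketch of such a language bottleneck, with hypothetical stand-ins for the captioner and the text classifier (neither name comes from the paper): the classifier only ever sees text, never raw pixels.

```python
from typing import Callable

def language_bottleneck_classify(
    image: object,
    describe: Callable[[object], str],    # image -> short text description
    classify_text: Callable[[str], int],  # text -> class index
) -> int:
    """Force the image through a natural-language bottleneck: the label is
    predicted from the description alone."""
    caption = describe(image)
    return classify_text(caption)

# Toy stubs, just to exercise the interface.
label = language_bottleneck_classify(
    image=None,
    describe=lambda img: "a small brown dog on grass",
    classify_text=lambda text: 0 if "dog" in text else 1,
)
print(label)  # 0
```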
- Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z)
- Greybox XAI: a Neural-Symbolic learning framework to produce interpretable predictions for image classification [6.940242990198]
Greybox XAI is a framework that composes a DNN and a transparent model through the use of a symbolic Knowledge Base (KB).
We address the problem of the lack of universal criteria for XAI by formalizing what an explanation is.
We show how this new architecture is accurate and explainable in several datasets.
arXiv Detail & Related papers (2022-09-26T08:55:31Z)
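A minimal neuro-symbolic sketch in the spirit of that composition (the concepts and rules are invented here for illustration; the actual framework formalizes explanations far more carefully): a DNN's concept detectors feed a fully inspectable rule base.

```python
from typing import Dict, List, Tuple

# Transparent knowledge base: each class lists the concepts it requires.
KB: Dict[str, List[str]] = {
    "zebra": ["four_legs", "stripes"],
    "horse": ["four_legs", "mane"],
}

def greybox_predict(concept_scores: Dict[str, float],
                    threshold: float = 0.5) -> Tuple[str, str]:
    """`concept_scores` would come from a DNN's concept head; the symbolic
    step below is transparent, so the decision path is human-readable."""
    present = {c for c, s in concept_scores.items() if s >= threshold}
    for label, required in KB.items():
        if set(required) <= present:
            return label, f"{label}: detected {', '.join(required)}"
    return "unknown", "no rule in the KB matched the detected concepts"

print(greybox_predict({"four_legs": 0.9, "stripes": 0.8, "mane": 0.1}))
```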
- Perception Visualization: Seeing Through the Eyes of a DNN [5.9557391359320375]
We develop a new form of explanation that is radically different in nature from current explanation methods, such as Grad-CAM.
Perception visualization provides a visual representation of what the DNN perceives in the input image by depicting what visual patterns the latent representation corresponds to.
Results of our user study demonstrate that humans can better understand and predict the system's decisions when perception visualizations are available.
arXiv Detail & Related papers (2022-04-21T07:18:55Z)
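Loosely, depicting what a latent representation corresponds to resembles feature inversion; a generic gradient-based sketch (not the paper's exact procedure, which its abstract distinguishes from attribution methods like Grad-CAM) would optimise an image until its latent matches a target:

```python
import torch

def invert_latent(encoder, target_latent, shape=(3, 224, 224),
                  steps=200, lr=0.05):
    """Optimise an input image so the encoder's latent matches
    `target_latent`, visualising what that latent corresponds to.
    A generic feature-inversion sketch, not the paper's method."""
    x = torch.zeros(1, *shape, requires_grad=True)
    optimiser = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimiser.zero_grad()
        loss = torch.nn.functional.mse_loss(encoder(x), target_latent)
        loss.backward()
        optimiser.step()
        x.data.clamp_(0.0, 1.0)  # keep pixels in a displayable range
    return x.detach()
```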
- Towards Learning a Vocabulary of Visual Concepts and Operators using Deep Neural Networks [0.0]
We analyze the learned feature maps of trained models using MNIST images for achieving more explainable predictions.
We illustrate the idea by generating visual concepts from a Variational Autoencoder trained using MNIST images.
We were able to reduce the reconstruction loss (mean square error) from an initial value of 120 without augmentation to 60 with augmentation.
arXiv Detail & Related papers (2021-09-01T16:34:57Z)
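A compact VAE of the kind that entry describes (the layer sizes and 16-dimensional latent are guesses for illustration); decoding samples from the latent space is one way to surface "visual concepts", and the loss below includes the mean-square reconstruction term that the abstract reports augmentation driving from 120 down to 60.

```python
import torch
from torch import nn

class TinyVAE(nn.Module):
    """Minimal MNIST-scale VAE, a stand-in for the paper's model; decoding
    latent samples yields candidate 'visual concepts'."""
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 784), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterise
        return self.decoder(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Mean-square reconstruction error (the quantity the abstract reports)
    # plus the standard KL regulariser.
    mse = ((recon - x.flatten(1)) ** 2).sum(dim=1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return mse + kl
```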
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.