Learning Interpretable Logic Rules from Deep Vision Models
- URL: http://arxiv.org/abs/2503.10547v1
- Date: Thu, 13 Mar 2025 17:04:04 GMT
- Title: Learning Interpretable Logic Rules from Deep Vision Models
- Authors: Chuqin Geng, Yuhe Jiang, Ziyu Zhao, Haolin Ye, Zhaoyue Wang, Xujie Si
- Abstract summary: VisionLogic is a framework to extract interpretable logic rules from deep vision models. It provides local explanations for single images and global explanations for specific classes. VisionLogic also facilitates the study of visual concepts encoded by predicates.
- Score: 6.854329442341952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a general framework called VisionLogic to extract interpretable logic rules from deep vision models, with a focus on image classification tasks. Given any deep vision model that uses a fully connected layer as the output head, VisionLogic transforms neurons in the last layer into predicates and grounds them into vision concepts using causal validation. In this way, VisionLogic can provide local explanations for single images and global explanations for specific classes in the form of logic rules. Compared to existing interpretable visualization tools such as saliency maps, VisionLogic addresses several key challenges, including the lack of causal explanations, overconfidence in visualizations, and ambiguity in interpretation. VisionLogic also facilitates the study of visual concepts encoded by predicates, particularly how they behave under perturbation -- an area that remains underexplored in the field of hidden semantics. Apart from providing better visual explanations and insights into the visual concepts learned by the model, we show that VisionLogic retains most of the neural network's discriminative power in an interpretable and transparent manner. We envision it as a bridge between complex model behavior and human-understandable explanations, providing trustworthy and actionable insights for real-world applications.
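The abstract describes the mechanism only at a high level. As a rough, hypothetical illustration of how last-layer neurons might be read as predicates, the PyTorch sketch below keeps the top-contributing penultimate-layer activations of an off-the-shelf ResNet-50 and emits a conjunctive rule for the predicted class. The `explain` helper, the top-k selection, and the contribution heuristic are illustrative assumptions; VisionLogic's causal-validation step for grounding predicates into visual concepts is not reproduced here.

```python
import torch
import torchvision.models as models

# Illustrative sketch only: read penultimate-layer neurons as boolean
# "predicates" and emit a conjunctive rule for the predicted class.
# The top-k contribution heuristic stands in for the paper's causal
# validation, which is not reproduced here.

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Split the network into a feature extractor and the fully connected head.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
fc_head = model.fc  # weight shape: [num_classes, num_features]

def explain(image: torch.Tensor, top_k: int = 5) -> str:
    """Return a rule 'p_i(x) AND p_j(x) AND ... -> class_c(x)' for one preprocessed image."""
    with torch.no_grad():
        feats = feature_extractor(image.unsqueeze(0)).flatten(1)  # [1, num_features]
        pred = fc_head(feats).argmax(dim=1).item()

    # Contribution of each penultimate neuron to the predicted class logit.
    contributions = feats[0] * fc_head.weight[pred]
    top_neurons = contributions.topk(top_k).indices.tolist()

    # Each selected neuron becomes a predicate; the local rule is their conjunction.
    predicates = [f"p_{i}(x)" for i in top_neurons]
    return " AND ".join(predicates) + f" -> class_{pred}(x)"
```

Calling `explain` on a normalized 3x224x224 image tensor would return a string such as `p_12(x) AND p_87(x) AND ... -> class_281(x)`; the neuron indices shown are purely illustrative.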
Related papers
- A Cognitive Paradigm Approach to Probe the Perception-Reasoning Interface in VLMs [3.2228025627337864]
This paper introduces a structured evaluation framework using Bongard Problems (BPs) to dissect the perception-reasoning interface in Vision-Language Models (VLMs).
We propose three distinct evaluation paradigms, mirroring human problem-solving strategies.
Our framework provides a valuable diagnostic tool, highlighting the need to enhance visual processing fidelity for achieving more robust and human-like visual intelligence in AI.
arXiv Detail & Related papers (2025-01-23T12:42:42Z) - DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests [69.00444996464662]
We present DrivingVQA, a new benchmark derived from driving theory tests to evaluate visual chain-of-thought reasoning in complex real-world scenarios. Our experiments reveal that open-source and proprietary LVLMs struggle with visual chain-of-thought reasoning under zero-shot settings. We investigate training strategies that leverage relevant entities to improve visual reasoning.
arXiv Detail & Related papers (2025-01-08T18:31:16Z) - Do Vision-Language Models Really Understand Visual Language? [43.893398898373995]
Diagrams are a typical example of a visual language depicting complex concepts and their relationships in the form of an image. Recent studies suggest that Large Vision-Language Models (LVLMs) can even tackle complex reasoning tasks involving diagrams. This paper develops a comprehensive test suite to evaluate the diagram comprehension capability of LVLMs.
arXiv Detail & Related papers (2024-09-30T19:45:11Z) - What Makes a Maze Look Like a Maze? [92.80800000328277]
We introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning. At the core of DSG are schemas: dependency graph descriptions of abstract concepts that decompose them into more primitive-level symbols. We show that DSG significantly improves the abstract visual reasoning performance of vision-language models.
arXiv Detail & Related papers (2024-09-12T16:41:47Z) - Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training.
This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z) - LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning [73.98142349171552]
LOGICSEG is a holistic visual semantic parser that integrates neural inductive learning and logic reasoning with both rich data and symbolic knowledge.
During fuzzy logic-based continuous relaxation, logical formulae are grounded onto data and neural computational graphs, hence enabling logic-induced network training (a generic sketch of this style of relaxation appears after this list).
These designs together make LOGICSEG a general and compact neural-logic machine that is readily integrated into existing segmentation models.
arXiv Detail & Related papers (2023-09-24T05:43:19Z) - DeViL: Decoding Vision features into Language [53.88202366696955]
Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks.
In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned.
We train a transformer network to translate individual image features of any vision layer into a prompt that a separate off-the-shelf language model decodes into natural language.
arXiv Detail & Related papers (2023-09-04T13:59:55Z) - Does Visual Pretraining Help End-to-End Reasoning? [81.4707017038019]
We investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks.
We propose a simple and general self-supervised framework which "compresses" each video frame into a small set of tokens.
We observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning.
arXiv Detail & Related papers (2023-07-17T14:08:38Z) - Formal Conceptual Views in Neural Networks [0.0]
We introduce two notions for conceptual views of a neural network, specifically a many-valued and a symbolic view.
We test the conceptual expressivity of our novel views through different experiments on the ImageNet and Fruit-360 data sets.
We demonstrate how conceptual views can be applied for abductive learning of human comprehensible rules from neurons.
arXiv Detail & Related papers (2022-09-27T16:38:24Z) - Visual Probing: Cognitive Framework for Explaining Self-Supervised Image Representations [12.485001250777248]
Recently introduced self-supervised methods for image representation learning provide results on par with, or superior to, those of their fully supervised competitors.
Motivated by this observation, we introduce a novel visual probing framework for explaining self-supervised models, based on vision analogs of natural language concepts.
We show the effectiveness and applicability of these analogs in the context of explaining self-supervised representations.
arXiv Detail & Related papers (2021-06-21T12:40:31Z) - Interpretable Visual Reasoning via Induced Symbolic Space [75.95241948390472]
We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images.
We first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features.
We then come up with a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words.
arXiv Detail & Related papers (2020-11-23T18:21:49Z) - Self-supervised Learning from a Multi-view Perspective [121.63655399591681]
We show that self-supervised representations can extract task-relevant information and discard task-irrelevant information.
Our theoretical framework paves the way to a larger space of self-supervised learning objective design.
arXiv Detail & Related papers (2020-06-10T00:21:35Z)
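Referring back to the LOGICSEG entry above: the sketch below is a generic, hypothetical illustration of a fuzzy-logic continuous relaxation, turning a single implication rule into a differentiable penalty that can be added to a task loss. The probabilistic-sum connectives and the `implication_loss` helper are assumptions for illustration, not LOGICSEG's actual formulation.

```python
import torch

# Generic fuzzy-logic relaxation of the rule "cat(x) -> animal(x)",
# rewritten as NOT cat(x) OR animal(x) and evaluated on soft predictions.

def fuzzy_not(p: torch.Tensor) -> torch.Tensor:
    return 1.0 - p                      # NOT p

def fuzzy_or(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    return p + q - p * q                # probabilistic sum for p OR q

def implication_loss(p_cat: torch.Tensor, p_animal: torch.Tensor) -> torch.Tensor:
    """Penalize violations of cat(x) -> animal(x); zero when the rule fully holds."""
    truth = fuzzy_or(fuzzy_not(p_cat), p_animal)
    return (1.0 - truth).mean()

# Usage sketch: add the rule penalty to the usual task loss so that
# gradients from the logical constraint also reach the network.
p_cat = torch.sigmoid(torch.randn(8, requires_grad=True))      # stand-in predictions
p_animal = torch.sigmoid(torch.randn(8, requires_grad=True))
loss = implication_loss(p_cat, p_animal)
loss.backward()
```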