Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models
- URL: http://arxiv.org/abs/2511.11751v1
- Date: Thu, 13 Nov 2025 18:13:56 GMT
- Title: Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models
- Authors: Sanchit Sinha, Guangzhi Xiong, Zhenghao He, Aidong Zhang
- Abstract summary: Concept-RuleNet is a multi-agent system that reinstates visual grounding while retaining transparent reasoning. It improves over state-of-the-art neurosymbolic baselines by an average of 5% while reducing the occurrence of hallucinated symbols in rules by up to 50%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern vision-language models (VLMs) deliver impressive predictive accuracy yet offer little insight into 'why' a decision is reached, and they frequently hallucinate facts, particularly when encountering out-of-distribution data. Neurosymbolic frameworks address this by pairing black-box perception with interpretable symbolic reasoning, but current methods extract their symbols solely from task labels, leaving them weakly grounded in the underlying visual data. In this paper, we introduce Concept-RuleNet, a multi-agent system that reinstates visual grounding while retaining transparent reasoning. Specifically, a multimodal concept generator first mines discriminative visual concepts directly from a representative subset of training images. Next, these visual concepts condition symbol discovery, anchoring the generated symbols in real image statistics and mitigating label bias. Subsequently, a large language model reasoner agent composes the symbols into executable first-order rules, yielding interpretable neurosymbolic rules. Finally, during inference, a vision verifier agent quantifies the degree to which each symbol is present and triggers rule execution in tandem with the outputs of black-box neural models, producing predictions with explicit reasoning pathways. Experiments on five benchmarks, including two challenging medical-imaging tasks and three underrepresented natural-image datasets, show that our system improves over state-of-the-art neurosymbolic baselines by an average of 5% while also reducing the occurrence of hallucinated symbols in rules by up to 50%.
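The inference step described in the abstract lends itself to a compact illustration. The sketch below is a hypothetical Python rendering of that step, not the authors' released code: `Rule`, `rule_confidence`, `predict`, the min t-norm for rule firing, and the mixing weight `alpha` are all illustrative assumptions layered on the abstract's description of verifier symbol scores being combined with black-box model outputs.

```python
# Hypothetical sketch of Concept-RuleNet-style inference; all names and the
# score-combination scheme are assumptions, not the paper's actual API.
from dataclasses import dataclass


@dataclass
class Rule:
    """A first-order-style rule: IF all antecedent symbols hold THEN predict label."""
    symbols: list[str]  # antecedent symbols, e.g. ["has_wings", "hooked_beak"]
    label: str          # consequent class label


def rule_confidence(rule: Rule, symbol_scores: dict[str, float]) -> float:
    # Fuzzy AND over antecedents (min t-norm): the rule fires to the degree
    # that its least-supported symbol is present in the image.
    return min(symbol_scores.get(s, 0.0) for s in rule.symbols)


def predict(rules: list[Rule],
            symbol_scores: dict[str, float],
            neural_probs: dict[str, float],
            alpha: float = 0.5) -> tuple[str, list[tuple[Rule, float]]]:
    """Combine symbolic rule activations with black-box model outputs.

    symbol_scores: degree of presence of each symbol from the vision
                   verifier agent, in [0, 1].
    neural_probs:  class probabilities from the black-box neural model.
    alpha:         assumed mixing weight between the two signals.
    Returns the predicted label plus the fired rules, which serve as an
    explicit reasoning pathway for the prediction.
    """
    fired = [(r, rule_confidence(r, symbol_scores)) for r in rules]
    scores = {}
    for label, p in neural_probs.items():
        rule_score = max((c for r, c in fired if r.label == label), default=0.0)
        scores[label] = alpha * rule_score + (1 - alpha) * p
    best = max(scores, key=scores.get)
    pathway = [(r, c) for r, c in fired if r.label == best and c > 0.0]
    return best, pathway


# Toy usage: two rules, verifier symbol scores, and neural-model probabilities.
rules = [Rule(["has_wings", "hooked_beak"], "eagle"),
         Rule(["has_wings", "webbed_feet"], "duck")]
verifier = {"has_wings": 0.9, "hooked_beak": 0.8, "webbed_feet": 0.1}
neural = {"eagle": 0.6, "duck": 0.4}
print(predict(rules, verifier, neural))  # -> ('eagle', [(Rule(...), 0.8)])
```

A min t-norm makes a rule only as confident as its weakest antecedent; a product t-norm would be an equally plausible reading of "degree of presence," and the actual combination rule in the paper may differ.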
Related papers
- BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain [33.91441575463702]
We present a large-scale, automated framework for discovering and explaining visual representations across the human cortex. Our method comprises two main stages. First, we discover candidate interpretable patterns in fMRI activity through unsupervised, data-driven decomposition methods. Next, we explain each pattern by identifying the set of natural images that most strongly elicit it and generating a natural-language description of their shared visual meaning.
arXiv Detail & Related papers (2025-12-09T13:01:17Z) - CLMN: Concept based Language Models via Neural Symbolic Reasoning [27.255064617527328]
Concept Language Model Network (CLMN) is a neural-symbolic framework that preserves both performance and interpretability. CLMN represents concepts as continuous, human-readable embeddings. The model augments original text features with concept-aware representations and automatically induces interpretable logic rules.
arXiv Detail & Related papers (2025-10-11T06:58:44Z) - Concept-Guided Interpretability via Neural Chunking [64.6429903327095]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract recurring chunks on a neural population level. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z) - Symbolic Rule Extraction from Attention-Guided Sparse Representations in Vision Transformers [1.3812010983144802]
Recent neuro-symbolic approaches have successfully extracted symbolic rule-sets from CNN-based models to enhance interpretability. We propose a framework for symbolic rule extraction from Vision Transformers (ViTs) by introducing a sparse concept layer inspired by Sparse Autoencoders (SAEs). Our method achieves 5.14% higher classification accuracy than the standard ViT while enabling symbolic reasoning.
arXiv Detail & Related papers (2025-05-10T19:45:15Z) - Neural Language of Thought Models [18.930227757853313]
We introduce the Neural Language of Thought Model (NLoTM), a novel approach for unsupervised learning of LoTH-inspired representation and generation.
NLoTM comprises two key components: (1) the Semantic Vector-Quantized Variational Autoencoder, which learns hierarchical, composable discrete representations aligned with objects and their properties, and (2) the Autoregressive LoT Prior, an autoregressive transformer that learns to generate semantic concept tokens compositionally.
We evaluate NLoTM on several 2D and 3D image datasets, demonstrating superior performance in downstream tasks, out-of-distribution generalization, and image generation.
arXiv Detail & Related papers (2024-02-02T08:13:18Z) - Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning [58.5857133154749]
We propose a new symbolic system with broad-coverage symbols and rational rules.
We leverage recent advances in LLMs as an approximation of the two ideal properties.
Our method shows superiority in extensive activity understanding tasks.
arXiv Detail & Related papers (2023-11-29T05:27:14Z) - Generating by Understanding: Neural Visual Generation with Logical Symbol Groundings [23.85885099230917]
We propose the Abductive visual Generation (AbdGen) approach to build such logic-integrated models. We experimentally show that our approach can be utilized to integrate various neural generative models with logical reasoning systems.
arXiv Detail & Related papers (2023-10-26T15:00:21Z) - LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning [73.98142349171552]
LOGICSEG is a holistic visual semantic parser that integrates neural inductive learning and logic reasoning with both rich data and symbolic knowledge.
During fuzzy logic-based continuous relaxation, logical formulae are grounded onto data and neural computational graphs, hence enabling logic-induced network training.
These designs together make LOGICSEG a general and compact neural-logic machine that is readily integrated into existing segmentation models.
arXiv Detail & Related papers (2023-09-24T05:43:19Z) - Bridging Neural and Symbolic Representations with Transitional Dictionary Learning [4.326886488307076]
This paper introduces a novel Transitional Dictionary Learning (TDL) framework that can implicitly learn symbolic knowledge. We propose a game-theoretic diffusion model to decompose the input into visual parts using the dictionaries learned by the Expectation Maximization (EM) algorithm. Experiments are conducted on three abstract compositional visual object datasets.
arXiv Detail & Related papers (2023-08-03T19:29:35Z) - Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker [72.09076317574238]
ToM is a plug-and-play approach to investigate the belief states of characters in reading comprehension.
We show that ToM enhances off-the-shelf neural networks' theory-of-mind reasoning in a zero-shot setting while showing robust out-of-distribution performance compared to supervised baselines.
arXiv Detail & Related papers (2023-06-01T17:24:35Z) - Expressive Explanations of DNNs by Combining Concept Analysis with ILP [0.3867363075280543]
We use inherent features learned by the network to build a global, expressive, verbal explanation of the rationale of a feed-forward convolutional deep neural network (DNN).
We show that our explanation is faithful to the original black-box model.
arXiv Detail & Related papers (2021-05-16T07:00:27Z) - A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)