Neuro-Symbolic Spatial Reasoning in Segmentation
- URL: http://arxiv.org/abs/2510.15841v1
- Date: Fri, 17 Oct 2025 17:35:34 GMT
- Title: Neuro-Symbolic Spatial Reasoning in Segmentation
- Authors: Jiayi Lin, Jiabo Huang, Shaogang Gong
- Abstract summary: Open-Vocabulary Semantic Segmentation (OVSS) assigns pixel-level labels from an open set of categories, requiring generalization to unseen and unlabelled objects. We introduce neuro-symbolic (NeSy) spatial reasoning in OVSS. This is the first attempt to explore NeSy spatial reasoning in OVSS.
- Score: 27.7231614319754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-Vocabulary Semantic Segmentation (OVSS) assigns pixel-level labels from an open set of categories, requiring generalization to unseen and unlabelled objects. Using vision-language models (VLMs) to correlate local image patches with potential unseen object categories suffers from a lack of understanding of spatial relations of objects in a scene. To solve this problem, we introduce neuro-symbolic (NeSy) spatial reasoning in OVSS. In contrast to contemporary VLM correlation-based approaches, we propose Relational Segmentor (RelateSeg) to impose explicit spatial relational constraints by first order logic (FOL) formulated in a neural network architecture. This is the first attempt to explore NeSy spatial reasoning in OVSS. Specifically, RelateSeg automatically extracts spatial relations, e.g., <cat, to-right-of, person>, and encodes them as first-order logic formulas using our proposed pseudo categories. Each pixel learns to predict both a semantic category (e.g., "cat") and a spatial pseudo category (e.g., "right of person") simultaneously, enforcing relational constraints (e.g., a "cat" pixel must lie to the right of a "person"). Finally, these logic constraints are formulated in a deep network architecture by fuzzy logic relaxation, enabling end-to-end learning of spatial-relationally consistent segmentation. RelateSeg achieves state-of-the-art performance in terms of average mIoU across four benchmark datasets and particularly shows clear advantages on images containing multiple categories, with the cost of only introducing a single auxiliary loss function and no additional parameters, validating the effectiveness of NeSy spatial reasoning in OVSS.
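The fuzzy-logic relaxation described in the abstract can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the Łukasiewicz implication, the tensor shapes, and the class indices are all assumptions.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relational_consistency_loss(sem_logits, rel_logits, cat_idx, rel_idx):
    """Fuzzy relaxation of the per-pixel FOL constraint
    cat(p) -> right_of_person(p).

    sem_logits: (num_classes, H, W) semantic-category logits
    rel_logits: (num_pseudo,  H, W) spatial pseudo-category logits

    Softmax scores act as fuzzy truth values; the Lukasiewicz implication
    a -> b := min(1, 1 - a + b) equals 1 when the constraint is satisfied,
    so (1 - implication) measures the violation, averaged over pixels.
    """
    p_cat = softmax(sem_logits, axis=0)[cat_idx]  # truth of "cat" per pixel
    p_rel = softmax(rel_logits, axis=0)[rel_idx]  # truth of "right of person"
    implication = np.minimum(1.0, 1.0 - p_cat + p_rel)
    return float((1.0 - implication).mean())
```

Because the loss is built only from differentiable operations (softmax, min, mean), the same formula can be written in an autodiff framework and added to the segmentation objective without extra parameters, which is consistent with the abstract's claim of a single auxiliary loss function.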
Related papers
- SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models [41.99844472131922]
We introduce SOM-VQ, a tokenization method that combines vector quantization with Self-Organizing Maps to learn discrete codebooks. SOM-VQ produces more learnable token sequences while providing an explicit geometry in code space. We focus on human motion generation, a domain where kinematic structure, smooth temporal continuity, and interactive use cases make topology-aware control especially natural.
arXiv Detail & Related papers (2026-02-24T17:29:04Z) - LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning [73.98142349171552]
LOGICSEG is a holistic visual semantic parser that integrates neural inductive learning and logic reasoning with both rich data and symbolic knowledge.
During fuzzy logic-based continuous relaxation, logical formulae are grounded onto data and neural computational graphs, hence enabling logic-induced network training.
These designs together make LOGICSEG a general and compact neural-logic machine that is readily integrated into existing segmentation models.
arXiv Detail & Related papers (2023-09-24T05:43:19Z) - SimNP: Learning Self-Similarity Priors Between Neural Points [52.4201466988562]
SimNP is a method to learn category-level self-similarities.
We show that SimNP is able to outperform previous methods in reconstructing symmetric unseen object regions.
arXiv Detail & Related papers (2023-09-07T16:02:40Z) - Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z) - Spatial Correspondence between Graph Neural Network-Segmented Images [1.807691213023136]
Graph neural networks (GNNs) have been proposed for medical image segmentation.
This work explores the potential of these GNNs with common topology for establishing spatial correspondence.
With an example application of registering local vertebral sub-regions found in CT images, our experimental results showed that the GNN-based segmentation is capable of accurate and reliable localization.
arXiv Detail & Related papers (2023-03-12T03:25:01Z) - Automated Feature-Topic Pairing: Aligning Semantic and Embedding Spaces in Spatial Representation Learning [28.211312371895]
This paper formulates a new problem: feature-topic pairing, and proposes a novel Particle Swarm Optimization (PSO) based deep learning framework.
Specifically, we formulate the problem into an automated alignment task between 1) a latent embedding feature space and 2) a semantic topic space.
We design a PSO based solver to simultaneously select an optimal set of topics and learn corresponding features based on the selected topics.
arXiv Detail & Related papers (2021-09-22T21:55:36Z) - Learning Spatial Context with Graph Neural Network for Multi-Person Pose Grouping [71.59494156155309]
Bottom-up approaches for image-based multi-person pose estimation consist of two stages: keypoint detection and grouping.
In this work, we formulate the grouping task as a graph partitioning problem, where we learn the affinity matrix with a Graph Neural Network (GNN).
The learned geometry-based affinity is further fused with appearance-based affinity to achieve robust keypoint association.
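As a simplified stand-in for the affinity fusion and graph partitioning this entry describes, one can fuse the two affinities linearly and partition the keypoint graph spectrally. The fusion weight, the spectral embedding, and the farthest-point seeding are illustrative assumptions, not the paper's learned GNN partitioner.

```python
import numpy as np

def group_keypoints(geo_aff, app_aff, n_groups, alpha=0.5):
    """Fuse a geometry affinity with an appearance affinity, then partition
    the keypoint graph via a spectral embedding of the normalized Laplacian.
    Both affinity matrices are assumed symmetric with entries in [0, 1];
    alpha is an illustrative fusion weight."""
    affinity = alpha * geo_aff + (1.0 - alpha) * app_aff
    deg = affinity.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    laplacian = np.eye(len(affinity)) - d_inv_sqrt @ affinity @ d_inv_sqrt
    _, vecs = np.linalg.eigh(laplacian)      # eigenvalues in ascending order
    embedding = vecs[:, :n_groups]           # smallest-eigenvalue subspace
    # Farthest-point seeding plus nearest-seed assignment (a k-means stand-in)
    seeds_idx = [0]
    for _ in range(n_groups - 1):
        d2 = ((embedding[:, None] - embedding[seeds_idx][None]) ** 2).sum(-1).min(1)
        seeds_idx.append(int(d2.argmax()))
    dists = ((embedding[:, None] - embedding[seeds_idx][None]) ** 2).sum(-1)
    return dists.argmin(axis=1)
```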
arXiv Detail & Related papers (2021-04-06T09:21:14Z) - Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
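A minimal sketch of the "squeeze" step this entry describes; the choice of average pooling here is an assumption, and the reasoning applied to the resulting vector is omitted.

```python
import numpy as np

def squeeze_to_channel_vector(feat):
    """Collapse the H x W spatial map of a (C, H, W) feature tensor into a
    single C-dimensional global vector by average pooling, so that subsequent
    reasoning operates per channel rather than per pixel."""
    c, h, w = feat.shape
    return feat.reshape(c, h * w).mean(axis=1)  # shape (C,)
```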
arXiv Detail & Related papers (2020-11-06T12:17:01Z) - High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state-of-the-art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-01-30T15:30:31Z) - Optimized Feature Space Learning for Generating Efficient Binary Codes for Image Retrieval [9.470008343329892]
We propose an approach for learning a low-dimensional optimized feature space with minimum intra-class variance and maximum inter-class variance.
We binarize our generated feature vectors with the popular Iterative Quantization (ITQ) approach and also propose an ensemble network to generate binary codes of desired bit length for image retrieval.
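The ITQ binarization step mentioned here can be sketched as follows. This is a textbook-style sketch assuming zero-centered features already projected to the code length, not the paper's ensemble network.

```python
import numpy as np

def itq_codes(v, n_iter=50, seed=0):
    """Iterative Quantization (ITQ) sketch: alternate between assigning
    binary codes B = sign(V R) and solving the orthogonal Procrustes problem
    for the rotation R that minimizes the quantization loss ||B - V R||_F.

    v: (n_samples, code_len) zero-centered, PCA-reduced features.
    Returns the binary codes and the learned rotation."""
    rng = np.random.default_rng(seed)
    # Random orthogonal initialization via QR decomposition
    r, _ = np.linalg.qr(rng.standard_normal((v.shape[1], v.shape[1])))
    for _ in range(n_iter):
        b = np.sign(v @ r)                  # fix R, update the codes
        u, _, wt = np.linalg.svd(v.T @ b)   # fix B, update the rotation
        r = u @ wt                          # Procrustes solution
    return np.sign(v @ r), r
```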
arXiv Detail & Related papers (2020-01-30T15:30:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.