Reasoning Segmentation for Images and Videos: A Survey
- URL: http://arxiv.org/abs/2505.18816v1
- Date: Sat, 24 May 2025 18:23:14 GMT
- Title: Reasoning Segmentation for Images and Videos: A Survey
- Authors: Yiqing Shen, Chenjia Li, Fei Xiong, Jeong-O Jeong, Tianpeng Wang, Michael Latman, Mathias Unberath
- Abstract summary: Reasoning Segmentation (RS) aims to delineate objects based on implicit text queries. RS bridges the gap between visual perception and human-like reasoning capabilities.
- Score: 8.73974749874605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reasoning Segmentation (RS) aims to delineate objects based on implicit text queries, the interpretation of which requires reasoning and knowledge integration. Unlike the traditional formulation of segmentation problems that relies on fixed semantic categories or explicit prompting, RS bridges the gap between visual perception and human-like reasoning capabilities, facilitating more intuitive human-AI interaction through natural language. Our work presents the first comprehensive survey of RS for image and video processing, examining 26 state-of-the-art methods together with a review of the corresponding evaluation metrics, as well as 29 datasets and benchmarks. We also explore existing applications of RS across diverse domains and identify their potential extensions. Finally, we identify current research gaps and highlight promising future directions.
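To make the abstract's distinction concrete, the minimal Python sketch below contrasts a conventional segmentation request (an explicit, fixed category) with a reasoning segmentation request (an implicit query whose target must be inferred). The data structures, the toy `targets_from_query` lookup, and the example labels are hypothetical illustrations only, not an API of any surveyed method; real RS systems typically couple a multimodal LLM with a mask decoder rather than a keyword table.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical structures for illustration; not drawn from any specific RS method.

@dataclass
class SegmentationRequest:
    """Conventional formulation: the target is an explicit category label."""
    image_path: str
    category: str            # e.g. "refrigerator", drawn from a closed vocabulary


@dataclass
class ReasoningSegmentationRequest:
    """RS formulation: the target is implied by a natural-language query."""
    image_path: str
    query: str               # e.g. "the object that keeps the food cold"


def targets_from_query(query: str, candidate_labels: List[str]) -> List[str]:
    """Toy stand-in for the reasoning step: map an implicit query to target labels.

    This keyword lookup only marks where reasoning and knowledge integration
    would sit; actual methods perform this step with a multimodal LLM.
    """
    knowledge = {
        "keeps the food cold": ["refrigerator"],
        "used to cut bread": ["knife"],
    }
    for cue, labels in knowledge.items():
        if cue in query.lower():
            return [label for label in labels if label in candidate_labels]
    return []


if __name__ == "__main__":
    req = ReasoningSegmentationRequest("kitchen.jpg",
                                       "the object that keeps the food cold")
    print(targets_from_query(req.query, ["refrigerator", "oven", "sink"]))
    # -> ['refrigerator']; a mask decoder would then segment this inferred target.
```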
Related papers
- EgoPrompt: Prompt Learning for Egocentric Action Recognition [49.12318087940015]
EgoPrompt is a prompt learning-based framework for egocentric action recognition. EgoPrompt achieves state-of-the-art performance across within-dataset, cross-dataset, and base-to-novel generalization benchmarks.
arXiv Detail & Related papers (2025-08-05T09:47:07Z) - Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation [50.81551581148339]
We introduce Relevant Reasoning Segmentation (R$^2$S), a reasoning-based segmentation framework. We also introduce 3D ReasonSeg, a reasoning-based segmentation dataset. Experiments demonstrate that R$^2$S and 3D ReasonSeg effectively endow 3D point cloud perception with stronger spatial reasoning capabilities.
arXiv Detail & Related papers (2025-06-29T06:58:08Z) - Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation [7.564378015102302]
We present a novel benchmark specifically designed to evaluate both visual and textual prompts for semantic segmentation. We evaluate 5 open-vocabulary methods and 4 visual reference prompt approaches, adapting the latter to handle multi-class segmentation. Our experiments reveal that open-vocabulary methods excel with common concepts easily described by text but struggle with complex domains like tools.
arXiv Detail & Related papers (2025-05-06T20:15:30Z) - On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey [82.49623756124357]
Zero-shot image recognition (ZSIR) aims to recognize and reason in unseen domains by learning generalized knowledge from limited data. This paper thoroughly investigates recent advances in element-wise ZSIR and provides a basis for its future development.
arXiv Detail & Related papers (2024-08-09T05:49:21Z) - Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z) - Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey [0.10923877073891446]
We present the first comprehensive survey on XAI in semantic image segmentation.
This work focuses on techniques that were either introduced specifically for dense prediction tasks or adapted to them from existing classification methods.
arXiv Detail & Related papers (2024-05-02T18:00:25Z) - Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z) - Guiding Computational Stance Detection with Expanded Stance Triangle Framework [25.2980607215715]
Stance detection determines whether the author of a piece of text is in favor of, against, or neutral towards a specified target.
We decompose the stance detection task from a linguistic perspective, and investigate key components and inference paths in this task.
arXiv Detail & Related papers (2023-05-31T13:33:29Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - Term-community-based topic detection with variable resolution [0.0]
Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models.
We present a method that is especially designed with the requirements of domain experts in mind.
We demonstrate the application of our method with a widely used corpus of general news articles and show the results of detailed social-sciences expert evaluations.
arXiv Detail & Related papers (2021-03-25T01:29:39Z)