SegLLM: Multi-round Reasoning Segmentation
- URL: http://arxiv.org/abs/2410.18923v2
- Date: Thu, 31 Oct 2024 19:44:05 GMT
- Title: SegLLM: Multi-round Reasoning Segmentation
- Authors: XuDong Wang, Shaolun Zhang, Shufan Li, Konstantinos Kallidromitis, Kehan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
- Abstract summary: We present SegLLM, a novel multi-round interactive reasoning segmentation model.
SegLLM re-integrates previous segmentation results into its input stream.
It responds to visual and text queries in a chat-like manner.
- Abstract: We present SegLLM, a novel multi-round interactive reasoning segmentation model that enhances LLM-based segmentation by exploiting conversational memory of both visual and textual outputs. By leveraging a mask-aware multimodal LLM, SegLLM re-integrates previous segmentation results into its input stream, enabling it to reason about complex user intentions and segment objects in relation to previously identified entities, including positional, interactional, and hierarchical relationships, across multiple interactions. This capability allows SegLLM to respond to visual and text queries in a chat-like manner. Evaluated on the newly curated MRSeg benchmark, SegLLM outperforms existing methods in multi-round interactive reasoning segmentation by over 20%. Additionally, we observed that training on multi-round reasoning segmentation data enhances performance on standard single-round referring segmentation and localization tasks, resulting in a 5.5% increase in cIoU for referring expression segmentation and a 4.5% improvement in Acc@0.5 for referring expression localization.
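To make the mask re-integration idea concrete, below is a minimal Python sketch of a multi-round loop in which each predicted mask is encoded and appended to the conversational history, so later queries can refer back to earlier segmentations. All interfaces here (`model.segment`, `mask_encoder.encode`, `Turn`, `ConversationState`) are hypothetical placeholders for illustration, not SegLLM's actual API.

```python
# Sketch of a multi-round reasoning-segmentation loop in the spirit of
# SegLLM's mask-aware design. The model and mask-encoder interfaces are
# assumed, not the authors' real implementation.
from dataclasses import dataclass, field

@dataclass
class Turn:
    query: str        # the user's text query for this round
    mask_tokens: list # mask embedding re-injected into later rounds' input

@dataclass
class ConversationState:
    history: list = field(default_factory=list)  # past turns, incl. mask outputs

def multi_round_segment(model, mask_encoder, image, queries):
    """Run chat-like rounds; each round conditions on earlier masks."""
    state = ConversationState()
    masks = []
    for query in queries:
        # The key idea mirrored here: previous segmentation results are part
        # of the input stream, alongside the image and the running dialogue.
        mask = model.segment(image, query, state.history)
        masks.append(mask)
        # Encode the predicted mask so a later query such as
        # "the cup held by the person I just selected" can reference it.
        state.history.append(Turn(query, mask_encoder.encode(mask)))
    return masks
```

The design choice this sketch tries to capture is that masks serve as both outputs and inputs: feeding each predicted mask back into the dialogue history is what lets later queries express positional, interactional, or hierarchical references to earlier results.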
Related papers
- LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model
We introduce LISA++, an update to the existing LISA model, focusing on improving core functionalities while keeping the base architecture intact.
The instance segmentation ability has been added, providing a more detailed scene analysis along with the existing multi-region semantic segmentation.
These improvements are achieved by curating existing samples from generic segmentation datasets, aimed specifically at enhancing segmentation and conversational skills without structural changes or additional data sources.
arXiv Detail & Related papers (2023-12-28T18:58:33Z) - Segment Everything Everywhere All at Once [124.90835636901096]
We present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks.
We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks.
arXiv Detail & Related papers (2023-04-13T17:59:40Z) - Semantics-Aware Dynamic Localization and Refinement for Referring Image
Segmentation [102.25240608024063]
Referring image segmentation segments an image region described by a natural language expression.
We develop an algorithm that shifts from being localization-centric to segmentation-centric.
Compared to its counterparts, our method is more versatile yet effective.
arXiv Detail & Related papers (2023-03-11T08:42:40Z) - Framework-agnostic Semantically-aware Global Reasoning for Segmentation [29.69187816377079]
We propose a component that learns to project image features into latent representations and reason between them.
Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint.
Our latent tokens are semantically interpretable and diverse and provide a rich set of features that can be transferred to downstream tasks.
arXiv Detail & Related papers (2022-12-06T21:42:05Z) - Open-world Semantic Segmentation via Contrasting and Clustering
Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotation effort.
Our method can directly segment objects of arbitrary categories, and on three benchmark datasets it outperforms zero-shot segmentation methods that require data labeling.
arXiv Detail & Related papers (2022-07-18T09:20:04Z) - UCP-Net: Unstructured Contour Points for Instance Segmentation [2.105564340986074]
We propose a novel approach to interactive segmentation based on unconstrained contour clicks for initial segmentation and segmentation refinement.
Our method is class-agnostic and produces accurate segmentation masks (IoU > 85%) with fewer user interactions than state-of-the-art methods on popular segmentation datasets.
arXiv Detail & Related papers (2021-09-15T22:03:37Z) - Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims to segment the foreground masks of the entities that match the description given in a natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address this challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.