LISA++: An Improved Baseline for Reasoning Segmentation with Large
Language Model
- URL: http://arxiv.org/abs/2312.17240v3
- Date: Mon, 22 Jan 2024 06:53:23 GMT
- Title: LISA++: An Improved Baseline for Reasoning Segmentation with Large
Language Model
- Authors: Senqiao Yang and Tianyuan Qu and Xin Lai and Zhuotao Tian and Bohao
Peng and Shu Liu and Jiaya Jia
- Abstract summary: We introduce LISA++, an update to the existing LISA model, focusing on improving core functionalities while keeping the base architecture intact.
The instance segmentation ability has been added, providing a more detailed scene analysis along with the existing multi-region semantic segmentation.
These improvements are achieved by curating existing samples of generic segmentation datasets aimed specifically at enhancing the segmentation and conversational skills without structural change and additional data sources.
- Score: 54.850048630298495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While LISA effectively bridges the gap between segmentation and large
language models to enable reasoning segmentation, it poses certain limitations:
unable to distinguish different instances of the target region, and constrained
by the pre-defined textual response formats. In this work, we introduce LISA++,
an update to the existing LISA model, focusing on improving core
functionalities while keeping the base architecture intact. The main
enhancements in LISA++ include: \textbf{1) Enhanced Segmentation}: The instance
segmentation ability has been added, providing a more detailed scene analysis
along with the existing multi-region semantic segmentation. \textbf{2) More
Natural Conversation}: Improved capability for multi-turn dialogue, with the
ability to incorporate segmentation results directly into text responses, i.e.,
Segmentation in Dialogue (SiD). These improvements are achieved by curating the
existing samples of generic segmentation datasets, aimed specifically at
enhancing the segmentation and conversational skills without structural change
and additional data sources. Comparative analysis with the original LISA model
shows significant advancements in these areas, positioning LISA++ as a notable
upgrade in visual understanding and interaction. LISA++'s adaptability and
improved features highlight the versatility of the mask-as-embedding paradigm
proposed by LISA, and the potential as a foundational model for diverse
applications.
Related papers
- Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z) - Unified Language-driven Zero-shot Domain Adaptation [55.64088594551629]
Unified Language-driven Zero-shot Domain Adaptation (ULDA) is a novel task setting.
It enables a single model to adapt to diverse target domains without explicit domain-ID knowledge.
arXiv Detail & Related papers (2024-04-10T16:44:11Z) - Cross-domain Multi-modal Few-shot Object Detection via Rich Text [21.36633828492347]
Cross-modal feature extraction and integration have led to steady performance improvements in few-shot learning tasks.
We study the Cross-Domain few-shot generalization of MM-OD (CDMM-FSOD) and propose a meta-learning based multi-modal few-shot object detection method.
arXiv Detail & Related papers (2024-03-24T15:10:22Z) - From Text Segmentation to Smart Chaptering: A Novel Benchmark for
Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark YTSeg focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce an efficient hierarchical segmentation model MiniSeg, that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z) - LISA: Reasoning Segmentation via Large Language Model [68.24075852136761]
We propose a new segmentation task -- reasoning segmentation.
The task is designed to output a segmentation mask given a complex and implicit query text.
We present LISA: large Language Instructed Assistant, which inherits the language generation capabilities of multimodal Large Language Models.
arXiv Detail & Related papers (2023-08-01T17:50:17Z) - Incorporating Linguistic Knowledge for Abstractive Multi-document
Summarization [20.572283625521784]
We develop a neural network based abstractive multi-document summarization (MDS) model.
We process the dependency information into the linguistic-guided attention mechanism.
With the help of linguistic signals, sentence-level relations can be correctly captured.
arXiv Detail & Related papers (2021-09-23T08:13:35Z) - Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims at segmenting the foreground masks of the entities that can well match the description given in the natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.