Explicit Visual Prompting for Low-Level Structure Segmentations
- URL: http://arxiv.org/abs/2303.10883v2
- Date: Tue, 21 Mar 2023 07:25:09 GMT
- Title: Explicit Visual Prompting for Low-Level Structure Segmentations
- Authors: Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun
- Abstract summary: We propose a new visual prompting model, named Explicit Visual Prompting (EVP).
EVP significantly outperforms other parameter-efficient tuning protocols with the same number of tunable parameters.
EVP also achieves state-of-the-art performance on diverse low-level structure segmentation tasks.
- Score: 55.51869354956533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the generic problem of detecting low-level structures in images,
which includes segmenting the manipulated parts, identifying out-of-focus
pixels, separating shadow regions, and detecting concealed objects. Whereas
each such topic has been typically addressed with a domain-specific solution,
we show that a unified approach performs well across all of them. We take
inspiration from the widely-used pre-training and then prompt tuning protocols
in NLP and propose a new visual prompting model, named Explicit Visual
Prompting (EVP). Unlike previous visual prompting, which typically learns a
dataset-level implicit embedding, our key insight is to make the tunable
parameters focus on the explicit visual content of each individual image,
i.e., the features from frozen patch embeddings and the input's
high-frequency components. The proposed EVP significantly outperforms other
parameter-efficient tuning protocols with the same number of tunable
parameters (5.7% extra trainable parameters per task). EVP also achieves
state-of-the-art performance on diverse low-level structure segmentation
tasks compared to task-specific solutions. Our code is available at:
https://github.com/NiFangBaAGe/Explicit-Visual-Prompt.
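The two prompt sources named in the abstract are concrete enough to sketch. Below is a minimal, illustrative PyTorch sketch, not the authors' implementation (see the linked repository for that): high-frequency content is obtained here by masking low frequencies in the Fourier domain, and a small trainable adapter combines it with the frozen patch embeddings while the backbone stays frozen. The module names, bottleneck size, and masking ratio are placeholders.

```python
import torch
import torch.nn as nn


def high_frequency_components(x: torch.Tensor, mask_ratio: float = 0.25) -> torch.Tensor:
    """Keep only the high-frequency content of an image batch (B, C, H, W) by
    zeroing a centered low-frequency square in the Fourier domain. The exact
    extraction used by EVP may differ; this is one simple variant."""
    freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"))
    _, _, H, W = x.shape
    h, w = int(H * mask_ratio), int(W * mask_ratio)
    cy, cx = H // 2, W // 2
    freq[..., cy - h // 2 : cy + h // 2, cx - w // 2 : cx + w // 2] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(freq), norm="ortho").real


class PromptAdapter(nn.Module):
    """Tiny trainable adapter that turns the frozen patch embeddings and the
    high-frequency patch embeddings into an additive prompt for the backbone."""

    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # the only trainable parameters
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, patch_embed: torch.Tensor, hf_embed: torch.Tensor) -> torch.Tensor:
        # The prompt depends on the content of each individual image,
        # not on a dataset-level learned embedding.
        return self.up(self.act(self.down(patch_embed + hf_embed)))
```

In use, hf_embed would be the (frozen) patch embedding of high_frequency_components(x), the backbone would consume patch_embed plus the adapter output at selected stages, and only the adapter weights would be trained, which is what keeps the per-task overhead to a few percent of the model.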
Related papers
- LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation [41.77434289193232]
We propose a novel visual prompt design, introducing Low-Rank matrix multiplication for Visual Prompting (LoR-VP)
LoR-VP enables shared and patch-specific information across rows and columns of image pixels.
Experiments demonstrate significant improvements in both performance and efficiency compared to state-of-the-art visual prompting methods.
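The low-rank construction mentioned above can be illustrated in a few lines. The sketch below is only a plausible reading of the idea, a trainable rank-r pattern factored into row and column parameters and added to the input image, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn


class LowRankVisualPrompt(nn.Module):
    """Illustrative low-rank visual prompt: a per-channel H x W additive
    pattern factored as col @ row, so rows and columns of pixels share
    parameters. Rank r keeps the count at r * (H + W) parameters per channel."""

    def __init__(self, channels: int = 3, height: int = 224, width: int = 224, rank: int = 4):
        super().__init__()
        self.col = nn.Parameter(torch.randn(channels, height, rank) * 0.01)
        self.row = nn.Parameter(torch.randn(channels, rank, width) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        prompt = torch.matmul(self.col, self.row)  # (C, H, W) low-rank pattern
        return x + prompt.unsqueeze(0)             # broadcast over the batch
```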
arXiv Detail & Related papers (2025-02-02T20:10:48Z)
- Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation [15.941958367737408]
We present Seg-TTO, a framework for zero-shot, open-vocabulary semantic segmentation (OVSS)
We focus on segmentation-specific test-time optimization to address this gap.
We integrate Seg-TTO with three state-of-the-art OVSS approaches and evaluate across 22 challenging OVSS tasks covering a range of specialized domains.
arXiv Detail & Related papers (2025-01-08T18:58:24Z)
- LESS: Label-Efficient and Single-Stage Referring 3D Segmentation [55.06002976797879]
Referring 3D is a visual-language task that segments all points of a specified object in a 3D point cloud, given a query sentence.
We propose a novel label-efficient and single-stage Referring 3D pipeline, dubbed LESS, which is supervised only by efficient binary masks.
We achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU using only binary labels.
arXiv Detail & Related papers (2024-10-17T07:47:41Z)
- Optimal Transport Aggregation for Visual Place Recognition [9.192660643226372]
We introduce SALAD, which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem.
In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative.
Our single-stage method not only surpasses single-stage baselines on public VPR datasets, but also outperforms two-stage methods that add re-ranking at significantly higher cost.
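To make the optimal-transport reading concrete, the snippet below sketches a generic Sinkhorn-style soft assignment of local features to clusters with an extra 'dustbin' column that can absorb non-informative features. It illustrates the idea described in the summary rather than SALAD's actual implementation; the iteration count and normalization scheme are simplifying assumptions.

```python
import torch


def sinkhorn_assignment(scores: torch.Tensor, iters: int = 5) -> torch.Tensor:
    """scores: (N, K+1) affinities of N local features to K clusters plus a
    final 'dustbin' column that absorbs non-informative features. Returns a
    soft-assignment matrix after alternating row/column normalization."""
    log_p = scores.log_softmax(dim=-1)
    for _ in range(iters):
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)   # normalize each cluster column
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)  # normalize each feature row
    return log_p.exp()


def aggregate(features: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """Aggregate N local features (N, D) into K cluster descriptors (K, D),
    dropping the dustbin column so discarded features do not contribute."""
    assign = sinkhorn_assignment(scores)[:, :-1]        # (N, K)
    return torch.einsum("nk,nd->kd", assign, features)  # weighted sums per cluster
```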
arXiv Detail & Related papers (2023-11-27T15:46:19Z)
- Visual In-Context Prompting [100.93587329049848]
In this paper, we introduce a universal visual in-context prompting framework for vision tasks such as open-set segmentation and detection.
We build on top of an encoder-decoder architecture, and develop a versatile prompt encoder to support a variety of prompts like strokes, boxes, and points.
Our extensive explorations show that the proposed visual in-context prompting elicits extraordinary referring and generic segmentation capabilities.
arXiv Detail & Related papers (2023-11-22T18:59:48Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
- Explicit Visual Prompting for Universal Foreground Segmentations [55.51869354956533]
We present a unified framework for a number of foreground segmentation tasks without any task-specific designs.
We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP.
Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters.
arXiv Detail & Related papers (2023-05-29T11:05:01Z)
- Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction [22.868597464136787]
We propose a novel Sparse Visual Domain Prompts (SVDP) approach, which keeps the trainable parameters of the image-level prompt minimal while preserving more of the input's spatial information.
Our proposed method achieves state-of-the-art performance in both semantic segmentation and depth estimation tasks.
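One plausible reading of a sparse image-level prompt is sketched below: trainable offsets placed at a small set of pixel locations, leaving the rest of the input untouched. The placement strategy (random here) and the parameterization are placeholders rather than the paper's method.

```python
import torch
import torch.nn as nn


class SparseVisualPrompt(nn.Module):
    """Illustrative sparse prompt: learnable offsets at a fixed, randomly chosen
    set of pixel positions; every other pixel passes through unchanged."""

    def __init__(self, channels: int = 3, height: int = 512, width: int = 512,
                 num_points: int = 256):
        super().__init__()
        idx = torch.randperm(height * width)[:num_points]
        self.register_buffer("ys", idx.div(width, rounding_mode="floor"))
        self.register_buffer("xs", idx.remainder(width))
        # num_points * channels trainable values, far fewer than a dense H x W prompt.
        self.values = nn.Parameter(torch.zeros(num_points, channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); add the learned offsets only at the sparse locations.
        x = x.clone()
        x[:, :, self.ys, self.xs] = x[:, :, self.ys, self.xs] + self.values.t()
        return x
```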
arXiv Detail & Related papers (2023-03-17T06:26:55Z)
- SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor [51.298760338410624]
We propose a SceneEncoder module that imposes scene-aware guidance to enhance the effect of global information.
The module predicts a scene descriptor, which learns to represent the categories of objects present in the scene.
We also design a region similarity loss to propagate distinguishing features to neighboring points with the same label.
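One way to read the scene descriptor is as a global multi-label prediction of which object categories appear in the scene, used to re-weight per-point features. The sketch below follows that reading; the pooling, gating, and layer sizes are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn


class SceneDescriptorHead(nn.Module):
    """Illustrative scene descriptor: pool per-point features (B, N, D) into a
    global vector, predict which categories are present in the scene, and use
    that prediction to re-weight the per-point features."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)  # multi-label presence logits
        self.project = nn.Linear(num_classes, dim)     # map presence back to feature space

    def forward(self, point_feats: torch.Tensor):
        global_feat = point_feats.max(dim=1).values            # (B, D) global pooling
        scene_logits = self.classifier(global_feat)            # (B, num_classes)
        gate = torch.sigmoid(self.project(torch.sigmoid(scene_logits)))  # (B, D)
        # Scene-aware guidance: per-point features are modulated by the
        # predicted scene content; scene_logits can be supervised with BCE.
        return point_feats * gate.unsqueeze(1), scene_logits
```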
arXiv Detail & Related papers (2020-01-24T16:53:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.