Explicit Visual Prompting for Low-Level Structure Segmentations
- URL: http://arxiv.org/abs/2303.10883v2
- Date: Tue, 21 Mar 2023 07:25:09 GMT
- Title: Explicit Visual Prompting for Low-Level Structure Segmentations
- Authors: Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun
- Abstract summary: We propose a new visual prompting model, named Explicit Visual Prompting (EVP).
EVP significantly outperforms other parameter-efficient tuning protocols with the same amount of tunable parameters.
EVP also achieves state-of-the-art performance on diverse low-level structure segmentation tasks.
- Score: 55.51869354956533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the generic problem of detecting low-level structures in images,
which includes segmenting the manipulated parts, identifying out-of-focus
pixels, separating shadow regions, and detecting concealed objects. Whereas
each such topic has been typically addressed with a domain-specific solution,
we show that a unified approach performs well across all of them. We take
inspiration from the widely-used pre-training and then prompt tuning protocols
in NLP and propose a new visual prompting model, named Explicit Visual
Prompting (EVP). Unlike previous visual prompting, which typically learns a
dataset-level implicit embedding, our key insight is to make the tunable
parameters focus on the explicit visual content of each individual image,
i.e., the features from frozen patch embeddings and the input's
high-frequency components. The proposed EVP significantly outperforms other
parameter-efficient tuning protocols with the same amount of tunable
parameters (5.7% extra trainable parameters per task). EVP also achieves
state-of-the-art performance on diverse low-level structure segmentation tasks
compared to task-specific solutions. Our code is available at:
https://github.com/NiFangBaAGe/Explicit-Visual-Prompt.
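The two prompt sources named in the abstract are the features from the frozen patch embeddings and the input's high-frequency components. As a rough illustration of the second ingredient, the sketch below extracts a high-frequency map by zeroing a centered low-frequency region in the Fourier domain; the mask ratio, tensor shapes, and the fusion step noted in the trailing comment are illustrative assumptions, not the paper's exact configuration.

```python
import torch


def high_frequency_components(image: torch.Tensor, mask_ratio: float = 0.25) -> torch.Tensor:
    """Keep only the high frequencies of an image by zeroing a centered
    low-frequency square in the Fourier domain.

    image: (B, C, H, W); mask_ratio sets the size of the suppressed
    low-frequency region (an illustrative choice).
    """
    _, _, H, W = image.shape
    freq = torch.fft.fftshift(torch.fft.fft2(image, norm="ortho"))

    # Zero out a centered square of low frequencies.
    h, w = int(H * mask_ratio), int(W * mask_ratio)
    cy, cx = H // 2, W // 2
    mask = torch.ones_like(freq)
    mask[..., cy - h // 2 : cy + h // 2, cx - w // 2 : cx + w // 2] = 0

    hfc = torch.fft.ifft2(torch.fft.ifftshift(freq * mask), norm="ortho")
    return hfc.real


# The resulting high-frequency map would then be patch-embedded and fused with
# the frozen backbone's patch embeddings through small tunable adapters (the
# fusion details are assumptions, not the paper's exact design).
x = torch.randn(2, 3, 224, 224)
print(high_frequency_components(x).shape)  # torch.Size([2, 3, 224, 224])
```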
Related papers
- LESS: Label-Efficient and Single-Stage Referring 3D Segmentation [55.06002976797879]
Referring 3D segmentation is a visual-language task that segments all points of the object specified by a query sentence from a 3D point cloud.
We propose a novel referring 3D pipeline, Label-Efficient and Single-Stage, dubbed LESS, which is supervised only by efficient binary masks.
We achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU using only binary labels.
arXiv Detail & Related papers (2024-10-17T07:47:41Z)
- MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Semantic Segmentation [33.67313662538398]
We propose a multi-resolution training framework for open-vocabulary semantic segmentation with a single pretrained CLIP backbone.
MROVSeg uses sliding windows to slice the high-resolution input into uniform patches, each matching the input size of the well-trained image encoder.
We demonstrate the superiority of MROVSeg on well-established open-vocabulary semantic segmentation benchmarks.
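A minimal sketch of the sliding-window slicing described in this summary, assuming non-overlapping windows, zero padding, and a 336-pixel encoder input size; MROVSeg's actual window size, stride, and padding policy may differ.

```python
import torch
import torch.nn.functional as F


def slice_into_windows(image: torch.Tensor, window: int = 336, stride: int = 336) -> torch.Tensor:
    """Slice a high-resolution image (C, H, W) into uniform window-sized
    crops that each match a pretrained encoder's expected input size.
    Returns a tensor of shape (num_windows, C, window, window).
    """
    C, H, W = image.shape
    # Zero-pad so height and width are divisible by the stride.
    image = F.pad(image, (0, (-W) % stride, 0, (-H) % stride))

    # Unfold height and width into non-overlapping windows.
    patches = image.unfold(1, window, stride).unfold(2, window, stride)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, C, window, window)


crops = slice_into_windows(torch.randn(3, 1024, 1536))
print(crops.shape)  # torch.Size([20, 3, 336, 336]) for this input
```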
arXiv Detail & Related papers (2024-08-27T04:45:53Z)
- Optimal Transport Aggregation for Visual Place Recognition [9.192660643226372]
We introduce SALAD, which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem.
In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative.
Our single-stage method not only surpasses single-stage baselines on public VPR datasets, but also surpasses two-stage methods that add re-ranking at significantly higher cost.
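A generic log-domain Sinkhorn sketch of the optimal-transport assignment with a dustbin column that this summary describes; the uniform marginals, constant dustbin score, and iteration count are assumptions rather than SALAD's exact formulation.

```python
import math

import torch


def sinkhorn_assignment(scores: torch.Tensor, n_iters: int = 10, eps: float = 0.05) -> torch.Tensor:
    """Soft-assign N local features to K clusters plus a 'dustbin' column
    using entropic-regularized optimal transport (log-domain Sinkhorn).

    scores: (N, K) feature-to-cluster similarities.
    Returns an (N, K + 1) transport plan whose last column is the dustbin.
    """
    N, K = scores.shape
    # Append a dustbin column; a constant score stands in for a learnable one.
    log_p = torch.cat([scores, torch.zeros(N, 1)], dim=1) / eps

    # Uniform marginals: each feature carries mass 1/N, each column 1/(K+1).
    log_a = torch.full((N,), -math.log(N))
    log_b = torch.full((K + 1,), -math.log(K + 1))
    u = torch.zeros(N)
    v = torch.zeros(K + 1)
    for _ in range(n_iters):
        u = log_a - torch.logsumexp(log_p + v[None, :], dim=1)
        v = log_b - torch.logsumexp(log_p + u[:, None], dim=0)

    return torch.exp(log_p + u[:, None] + v[None, :])


plan = sinkhorn_assignment(torch.randn(100, 64))
print(plan.shape, plan.sum().item())  # torch.Size([100, 65]), total mass 1.0
```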
arXiv Detail & Related papers (2023-11-27T15:46:19Z)
- Visual In-Context Prompting [100.93587329049848]
In this paper, we introduce a universal visual in-context prompting framework for vision tasks such as open-set segmentation and detection.
We build on top of an encoder-decoder architecture, and develop a versatile prompt encoder to support a variety of prompts like strokes, boxes, and points.
Our extensive explorations show that the proposed visual in-context prompting elicits extraordinary referring and generic segmentation capabilities.
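A hedged sketch of the idea of a versatile prompt encoder: points, boxes, and strokes are reduced to normalized coordinates and mapped into one embedding space with a learned type embedding. The class name, pooling, and layer sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn


class VersatilePromptEncoder(nn.Module):
    """Illustrative prompt encoder (hypothetical design): each prompt is
    reduced to normalized 2D coordinates, projected by a shared MLP, and
    tagged with a learned prompt-type embedding.
    """

    TYPES = {"point": 0, "box": 1, "stroke": 2}

    def __init__(self, dim: int = 256):
        super().__init__()
        self.coord_proj = nn.Sequential(nn.Linear(2, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.type_embed = nn.Embedding(len(self.TYPES), dim)

    def forward(self, coords: torch.Tensor, prompt_type: str) -> torch.Tensor:
        # coords: (num_points, 2) in [0, 1]; a box contributes its two
        # corners, a stroke a sampled polyline, a point a single pair.
        tokens = self.coord_proj(coords)
        tokens = tokens + self.type_embed.weight[self.TYPES[prompt_type]]
        return tokens.mean(dim=0, keepdim=True)  # one token per prompt


enc = VersatilePromptEncoder()
box = torch.tensor([[0.1, 0.2], [0.6, 0.8]])  # normalized (x1, y1), (x2, y2)
print(enc(box, "box").shape)  # torch.Size([1, 256])
```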
arXiv Detail & Related papers (2023-11-22T18:59:48Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
- Explicit Visual Prompting for Universal Foreground Segmentations [55.51869354956533]
We present a unified framework for a number of foreground segmentation tasks without any task-specific designs.
We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP.
Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters.
arXiv Detail & Related papers (2023-05-29T11:05:01Z)
- Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction [22.868597464136787]
We propose a novel Sparse Visual Domain Prompts (SVDP) approach, which holds minimal trainable parameters in the image-level prompt and preserves more spatial information of the input.
Our proposed method achieves state-of-the-art performance in both semantic segmentation and depth estimation tasks.
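A hedged sketch of an image-level prompt that keeps trainable values only at sparse spatial locations, in the spirit of the summary above; the sparsity ratio, random placement, and additive application are assumptions, not SVDP's exact design.

```python
import torch
import torch.nn as nn


class SparseVisualPrompt(nn.Module):
    """Illustrative sparse image-level prompt: trainable values live only at
    a small, fixed random subset of pixel locations, so few parameters are
    added while the input's spatial detail is kept (assumed design).
    """

    def __init__(self, channels: int = 3, height: int = 512, width: int = 512,
                 sparsity: float = 0.01):
        super().__init__()
        num = max(1, int(height * width * sparsity))
        self.register_buffer("idx", torch.randperm(height * width)[:num])  # prompt locations
        self.values = nn.Parameter(torch.zeros(channels, num))             # trainable values

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, C, H, W); add the sparse prompt at its fixed locations.
        _, C, H, W = image.shape
        prompt = torch.zeros(C, H * W, device=image.device, dtype=image.dtype)
        prompt[:, self.idx] = self.values
        return image + prompt.view(1, C, H, W)


svdp = SparseVisualPrompt()
out = svdp(torch.randn(2, 3, 512, 512))
print(sum(p.numel() for p in svdp.parameters()))  # 7863 trainable prompt values
```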
arXiv Detail & Related papers (2023-03-17T06:26:55Z)
- SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor [51.298760338410624]
We propose a SceneEncoder module that imposes scene-aware guidance to enhance the effect of global information.
The module predicts a scene descriptor, which learns to represent the categories of objects existing in the scene.
We also design a region similarity loss to propagate distinguishing features to neighboring points with the same label.
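A hedged sketch of a scene-descriptor head in the spirit of this summary: per-point features are pooled into a global vector that predicts, as a multi-label target, which categories exist in the scene. The pooling choice, layer sizes, and BCE supervision are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn


class SceneDescriptorHead(nn.Module):
    """Illustrative scene-descriptor head: pools per-point features and
    predicts multi-label logits over the categories present in the scene.
    """

    def __init__(self, feat_dim: int = 128, num_classes: int = 20):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, num_classes))

    def forward(self, point_feats: torch.Tensor) -> torch.Tensor:
        # point_feats: (B, N, feat_dim) -> scene logits: (B, num_classes)
        return self.mlp(point_feats.max(dim=1).values)


# Supervision: a scene-level multi-hot target built from the point labels.
B, N, num_classes = 2, 4096, 20
feats = torch.randn(B, N, 128)
point_labels = torch.randint(0, num_classes, (B, N))
target = torch.zeros(B, num_classes).scatter_(1, point_labels, 1.0)  # categories present

head = SceneDescriptorHead()
loss = nn.functional.binary_cross_entropy_with_logits(head(feats), target)
print(loss.item())
```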
arXiv Detail & Related papers (2020-01-24T16:53:30Z)