Related papers: Explicit Visual Prompting for Universal Foreground Segmentations

Explicit Visual Prompting for Universal Foreground Segmentations

URL: http://arxiv.org/abs/2305.18476v1
Date: Mon, 29 May 2023 11:05:01 GMT
Title: Explicit Visual Prompting for Universal Foreground Segmentations
Authors: Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun
Abstract summary: We present a unified framework for a number of foreground segmentation tasks without any task-specific designs. We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP. Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters.
Score: 55.51869354956533
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Foreground segmentation is a fundamental problem in computer vision, which includes salient object detection, forgery detection, defocus blur detection, shadow detection, and camouflage object detection. Previous works have typically relied on domain-specific solutions to address accuracy and robustness issues in those applications. In this paper, we present a unified framework for a number of foreground segmentation tasks without any task-specific designs. We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP). Different from the previous visual prompting which is typically a dataset-level implicit embedding, our key insight is to enforce the tunable parameters focusing on the explicit visual content from each individual image, i.e., the features from frozen patch embeddings and high-frequency components. Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters. Despite introducing only a small number of tunable parameters, EVP achieves superior performance than full fine-tuning and other parameter-efficient fine-tuning methods. Experiments in fourteen datasets across five tasks show the proposed method outperforms other task-specific methods while being considerably simple. The proposed method demonstrates the scalability in different architectures, pre-trained weights, and tasks. The code is available at: https://github.com/NiFangBaAGe/Explicit-Visual-Prompt.

Related papers

LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation [41.77434289193232]
We propose a novel visual prompt design, introducing Low-Rank matrix multiplication for Visual Prompting (LoR-VP) LoR-VP enables shared and patch-specific information across rows and columns of image pixels. Experiments demonstrate significant improvements in both performance and efficiency compared to state-of-the-art visual prompting methods.
arXiv Detail & Related papers (2025-02-02T20:10:48Z)
Just a Few Glances: Open-Set Visual Perception with Image Prompt Paradigm [22.407887601771026]
Open-Set Object Detection (OSOD) and Open-Set Object (OSS) have attracted a surge of interest from researchers. Mainstream OSOD and OSS methods generally utilize text as a prompt, achieving remarkable performance. We propose a novel prompt paradigm in OSOD and OSS, that is, textbfImage Prompt Paradigm. In this framework, high-quality image prompts are automatically encoded, selected and fused, achieving the single-stage and non-interactive inference.
arXiv Detail & Related papers (2024-12-14T07:23:14Z)
Learning A Low-Level Vision Generalist via Visual Task Prompt [43.54563263106761]
We propose a Visual task Prompt-based Image Processing (VPIP) framework to overcome these challenges. VPIP employs visual task prompts to manage tasks with different input-target domains and allows flexible selection of backbone network. Based on the VPIP framework, we train a low-level vision generalist model, namely GenLV, on 30 diverse tasks.
arXiv Detail & Related papers (2024-08-16T08:37:56Z)
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition [34.88916568947695]
We propose a simple but effective task-specific adaptation method (Task-Adapter) for few-shot action recognition. By introducing the proposed Task-Adapter into the last several layers of the backbone, we mitigate the overfitting problem caused by full fine-tuning. Experimental results consistently demonstrate the effectiveness of our proposed Task-Adapter on four standard few-shot action recognition datasets.
arXiv Detail & Related papers (2024-08-01T03:06:56Z)
Aligning and Prompting Everything All at Once for Universal Visual Perception [79.96124061108728]
APE is a universal visual perception model for aligning and prompting everything all at once in an image to perform diverse tasks. APE advances the convergence of detection and grounding by reformulating language-guided grounding as open-vocabulary detection. Experiments on over 160 datasets demonstrate that APE outperforms state-of-the-art models.
arXiv Detail & Related papers (2023-12-04T18:59:50Z)
ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer [91.43066633305662]
We propose a novel underlineComPlementary underlinetransformer, textbfComPtr, for diverse bi-source dense prediction tasks. ComPtr treats different inputs equally and builds an efficient dense interaction model in the form of sequence-to-sequence on top of the transformer.
arXiv Detail & Related papers (2023-07-23T15:17:45Z)
A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks. These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation. Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
Explicit Visual Prompting for Low-Level Structure Segmentations [55.51869354956533]
We propose a new visual prompting model, named Explicit Visual Prompting (EVP) EVP significantly outperforms other parameter-efficient tuning protocols under the same amount of tunable parameters. EVP also achieves state-of-the-art performances on diverse low-level structure segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:01:53Z)
Disambiguation of One-Shot Visual Classification Tasks: A Simplex-Based Approach [8.436437583394998]
We present a strategy which aims at detecting the presence of multiple objects in a given shot. This strategy is based on identifying the corners of a simplex in a high dimensional space. We show the ability of the proposed method to slightly, yet statistically significantly, improve accuracy in extreme settings.
arXiv Detail & Related papers (2023-01-16T11:37:05Z)
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals [78.12377360145078]
We introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings. This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering. In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU.
arXiv Detail & Related papers (2021-02-11T18:54:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.