FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic
Upsampling
- URL: http://arxiv.org/abs/2207.10392v1
- Date: Thu, 21 Jul 2022 10:06:01 GMT
- Title: FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic
Upsampling
- Authors: Hao Lu, Wenze Liu, Hongtao Fu, Zhiguo Cao
- Abstract summary: We present FADE, a novel, plug-and-play, and task-agnostic upsampling operator.
We first study the upsampling properties of FADE on toy data and then evaluate it on large-scale semantic segmentation and image matting.
- Score: 21.590872272491033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of task-agnostic feature upsampling in dense
prediction where an upsampling operator is required to facilitate both
region-sensitive tasks like semantic segmentation and detail-sensitive tasks
such as image matting. Existing upsampling operators can often work well on
either type of task, but not on both. In this work, we present FADE, a novel,
plug-and-play, and task-agnostic upsampling operator. FADE benefits from three
design choices: i) considering encoder and decoder features jointly in
upsampling kernel generation; ii) an efficient semi-shift convolutional
operator that enables granular control over how each feature point contributes
to upsampling kernels; iii) a decoder-dependent gating mechanism for enhanced
detail delineation. We first study the upsampling properties of FADE on toy
data and then evaluate it on large-scale semantic segmentation and image
matting. In particular, FADE demonstrates its effectiveness and task-agnostic
nature by consistently outperforming recent dynamic upsampling operators on
different tasks. It also generalizes well across convolutional and
transformer architectures with little computational overhead. Our work
additionally provides thoughtful insights on what makes for task-agnostic
upsampling. Code is available at: http://lnkiy.in/fade_in
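To make the three design choices concrete, the following is a minimal, illustrative PyTorch sketch of a FADE-style upsampler. It is not the authors' implementation (see the repository linked above): the efficient semi-shift convolution is approximated here by two plain convolutions whose outputs are aligned via pixel shuffling, the gating unit is reduced to a single sigmoid-activated convolution, and the encoder and decoder features are assumed to share one channel count.

```python
import torch.nn as nn
import torch.nn.functional as F

class FADESketch(nn.Module):
    """Illustrative sketch of FADE's three design choices, not the authors'
    code (see the repository linked above). The efficient semi-shift
    convolution is approximated by two plain convolutions, and encoder and
    decoder features are assumed to have the same channel count."""

    def __init__(self, channels, k=5, scale=2):
        super().__init__()
        self.k, self.scale = k, scale
        # i) upsampling kernels are generated from encoder AND decoder features
        self.enc_branch = nn.Conv2d(channels, k * k, 3, padding=1)
        self.dec_branch = nn.Conv2d(channels, scale * scale * k * k, 3, padding=1)
        # iii) a decoder-dependent gate controls where encoder detail is injected
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, enc, dec):
        # enc: (B, C, sH, sW) encoder feature; dec: (B, C, H, W) decoder feature
        B, C, H, W = dec.shape
        s, k = self.scale, self.k
        # ii) joint kernel generation; the paper's semi-shift convolution fuses
        # the two resolutions directly, without the explicit pixel_shuffle here
        logits = self.enc_branch(enc) + F.pixel_shuffle(self.dec_branch(dec), s)
        kernels = F.softmax(logits, dim=1)                    # (B, k*k, sH, sW)
        # content-aware reassembly: each output pixel is a weighted sum over
        # the k x k decoder neighborhood around its source location
        nbr = F.unfold(dec, k, padding=k // 2).view(B, C * k * k, H, W)
        nbr = F.interpolate(nbr, scale_factor=s, mode="nearest")
        nbr = nbr.view(B, C, k * k, s * H, s * W)
        up = (nbr * kernels.unsqueeze(1)).sum(dim=2)          # (B, C, sH, sW)
        # iii) gated fusion of high-res encoder detail with the upsampled result
        g = F.interpolate(self.gate(dec), scale_factor=s, mode="nearest")
        return g * enc + (1 - g) * up
```

As a quick shape check, with enc of shape (1, 64, 64, 64) and dec of shape (1, 64, 32, 32), FADESketch(64)(enc, dec) returns a (1, 64, 64, 64) tensor. The naive unfold-and-interpolate route above materializes every k x k neighborhood at the output resolution; the paper's semi-shift convolution exists precisely to avoid that overhead.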
Related papers
- FADE: A Task-Agnostic Upsampling Operator for Encoder-Decoder Architectures [18.17019371324024]
FADE is a novel, plug-and-play, lightweight, and task-agnostic upsampling operator.
We show that FADE yields consistent performance improvements on a number of dense prediction tasks.
For the first time, we demonstrate robust feature upsampling on both region- and detail-sensitive tasks.
arXiv Detail & Related papers (2024-07-18T13:32:36Z) - EffiPerception: an Efficient Framework for Various Perception Tasks [6.1522068855729755]
EffiPerception is a framework that explores common learning patterns across perception tasks and improves module robustness.
It achieves strong accuracy and robustness with relatively low memory cost on several perception tasks.
EffiPerception shows an overall accuracy-speed-memory improvement across four detection and segmentation tasks.
arXiv Detail & Related papers (2024-03-18T23:22:37Z) - Multi-task Learning with 3D-Aware Regularization [55.97507478913053]
We propose a structured 3D-aware regularizer which interfaces multiple tasks through the projection of features extracted from an image encoder to a shared 3D feature space.
We show that the proposed method is architecture agnostic and can be plugged into various prior multi-task backbones to improve their performance.
arXiv Detail & Related papers (2023-10-02T08:49:56Z) - ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive
Sparse Anchor Generation [50.01244854344167]
We bridge the performance gap between sparse and dense detectors by proposing the Adaptive Sparse Anchor Generator (ASAG).
ASAG predicts dynamic anchors on patches rather than grids in a sparse way, which alleviates the feature conflict problem.
Our method outperforms dense-initialized ones and achieves a better speed-accuracy trade-off.
arXiv Detail & Related papers (2023-08-18T02:06:49Z) - A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Feature Completion Transformer for Occluded Person Re-identification [25.159974510754992]
Occluded person re-identification (Re-ID) is a challenging problem because occluders destroy discriminative appearance information.
We propose a Feature Completion Transformer (FCFormer) to implicitly complement the semantic information of occluded parts in the feature space.
FCFormer achieves superior performance and outperforms the state-of-the-art methods by significant margins on occluded datasets.
arXiv Detail & Related papers (2023-03-03T01:12:57Z) - MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are
Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z) - Hyperdecoders: Instance-specific decoders for multi-task NLP [9.244884318445413]
We investigate input-conditioned hypernetworks for multi-tasking in NLP.
We generate parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder (a minimal sketch of this idea appears after the list below).
arXiv Detail & Related papers (2022-03-15T22:39:53Z) - Learning Affinity-Aware Upsampling for Deep Image Matting [83.02806488958399]
We show that learning affinity in upsampling provides an effective and efficient approach to exploit pairwise interactions in deep networks.
In particular, results on the Composition-1k matting dataset show that A2U achieves a 14% relative improvement in the SAD metric against a strong baseline.
Compared with the state-of-the-art matting network, we achieve 8% higher performance with only 40% model complexity.
arXiv Detail & Related papers (2020-11-29T05:09:43Z)
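As a concrete illustration of the input-conditioned hypernetwork idea in the Hyperdecoders entry above, here is a minimal, hypothetical PyTorch sketch: a small network maps a pooled encoder representation to the weights of a per-instance bottleneck adapter applied inside the decoder. The module name, dimensions, and mean-pooling choice are assumptions for illustration, not the paper's API.

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    """Per-instance bottleneck adapter whose weights come from a hypernetwork
    conditioned on the encoder output. Hypothetical sketch in the spirit of
    the Hyperdecoders entry above; all names and sizes are assumptions."""

    def __init__(self, d_model=512, d_bottleneck=32):
        super().__init__()
        self.d, self.r = d_model, d_bottleneck
        # hypernetwork: pooled encoder summary -> flattened adapter weights
        self.hyper = nn.Linear(d_model, 2 * d_model * d_bottleneck)

    def forward(self, dec_hidden, enc_states):
        # dec_hidden: (B, T, d) decoder activations; enc_states: (B, S, d)
        ctx = enc_states.mean(dim=1)                   # (B, d) encoder summary
        w = self.hyper(ctx)                            # (B, 2*d*r)
        w_down = w[:, : self.d * self.r].view(-1, self.d, self.r)
        w_up = w[:, self.d * self.r :].view(-1, self.r, self.d)
        # input-conditioned bottleneck adapter with a residual connection
        return dec_hidden + torch.relu(dec_hidden @ w_down) @ w_up
```

Because the adapter weights depend on each input's encoder summary, the decoder adaptation varies per instance while all trainable parameters live in the hypernetwork.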
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all of its information) and is not responsible for any consequences of its use.