CARAFE++: Unified Content-Aware ReAssembly of FEatures
- URL: http://arxiv.org/abs/2012.04733v1
- Date: Mon, 7 Dec 2020 07:34:57 GMT
- Title: CARAFE++: Unified Content-Aware ReAssembly of FEatures
- Authors: Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin
- Abstract summary: We propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight and highly effective operator for feature reassembly.
CARAFE++ generates adaptive kernels on-the-fly to enable instance-specific content-aware handling.
It shows consistent and substantial gains across all the tasks with negligible computational overhead.
- Score: 132.49582482421246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature reassembly, i.e. feature downsampling and upsampling, is a key
operation in a number of modern convolutional network architectures, e.g.,
residual networks and feature pyramids. Its design is critical for dense
prediction tasks such as object detection and semantic/instance segmentation.
In this work, we propose unified Content-Aware ReAssembly of FEatures
(CARAFE++), a universal, lightweight and highly effective operator to fulfill
this goal. CARAFE++ has several appealing properties: (1) Unlike conventional
methods such as pooling and interpolation that only exploit sub-pixel
neighborhood, CARAFE++ aggregates contextual information within a large
receptive field. (2) Instead of using a fixed kernel for all samples (e.g.
convolution and deconvolution), CARAFE++ generates adaptive kernels on-the-fly
to enable instance-specific content-aware handling. (3) CARAFE++ introduces
little computational overhead and can be readily integrated into modern network
architectures. We conduct comprehensive evaluations on standard benchmarks in
object detection, instance/semantic segmentation and image inpainting. CARAFE++
shows consistent and substantial gains across all the tasks (2.5% AP^box, 2.1%
AP^mask, 1.94% mIoU, 1.35 dB respectively) with negligible computational
overhead. It shows great potential to serve as a strong building block for
modern deep networks.
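To make the mechanism concrete, here is a minimal PyTorch sketch of content-aware upsampling in the CARAFE style described above: a light kernel-prediction branch generates a normalized reassembly kernel for every output location, and each output pixel is a weighted sum over a k x k input neighborhood. The channel widths (c_mid), encoder kernel size (k_enc), and the nearest-neighbor neighborhood expansion are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CarafeUpsample(nn.Module):
    """Minimal sketch of CARAFE-style content-aware upsampling."""

    def __init__(self, channels, scale=2, k_up=5, k_enc=3, c_mid=64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        # Channel compressor keeps kernel prediction cheap.
        self.compress = nn.Conv2d(channels, c_mid, 1)
        # Content encoder predicts a k_up*k_up kernel for each output pixel.
        self.encode = nn.Conv2d(c_mid, scale * scale * k_up * k_up,
                                k_enc, padding=k_enc // 2)

    def forward(self, x):
        n, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # 1. Predict reassembly kernels from the content of x.
        kernels = self.encode(self.compress(x))          # (n, s*s*k*k, h, w)
        kernels = F.pixel_shuffle(kernels, s)            # (n, k*k, s*h, s*w)
        kernels = F.softmax(kernels, dim=1)              # normalize each kernel
        # 2. Gather the k x k neighborhood around every input pixel.
        patches = F.unfold(x, k, padding=k // 2)         # (n, c*k*k, h*w)
        patches = patches.view(n, c * k * k, h, w)
        # Expand neighborhoods to the output grid (nearest neighbor).
        patches = F.interpolate(patches, scale_factor=s, mode='nearest')
        patches = patches.view(n, c, k * k, s * h, s * w)
        # 3. Reassemble: weighted sum of each neighborhood.
        return torch.einsum('nckhw,nkhw->nchw', patches, kernels)

For example, CarafeUpsample(256)(torch.randn(1, 256, 32, 32)) yields a (1, 256, 64, 64) tensor; CARAFE++'s downsampling counterpart follows the same predict-then-reassemble recipe on a strided output grid.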
Related papers
- MacFormer: Semantic Segmentation with Fine Object Boundaries [38.430631361558426]
We introduce a new semantic segmentation architecture, MacFormer, which features two key components.
Firstly, using learnable agent tokens, a Mutual Agent Cross-Attention (MACA) mechanism effectively facilitates the bidirectional integration of features across encoder and decoder layers.
Secondly, a Frequency Enhancement Module (FEM) in the decoder leverages high-frequency and low-frequency components to boost features in the frequency domain (a generic sketch follows this entry).
MacFormer is demonstrated to be compatible with various network architectures and outperforms existing methods in both accuracy and efficiency on the benchmark datasets ADE20K and Cityscapes.
arXiv Detail & Related papers (2024-08-11T05:36:10Z)
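The blurb does not specify how FEM works, so the following is only a generic, hedged sketch of one way to re-weight low- and high-frequency components of a feature map with a radial mask in the Fourier domain; frequency_enhance, cutoff, hi_gain, and lo_gain are all invented for illustration and are not MacFormer's actual module.

import torch

def frequency_enhance(feat, cutoff=0.25, hi_gain=1.5, lo_gain=1.0):
    # feat: (n, c, h, w) real-valued feature map.
    n, c, h, w = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    # Radial mask splitting low from high frequencies (normalized coords).
    yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, h),
                            torch.linspace(-0.5, 0.5, w), indexing='ij')
    low = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).to(feat.dtype)
    gain = low * lo_gain + (1.0 - low) * hi_gain
    out = torch.fft.ifft2(torch.fft.ifftshift(spec * gain, dim=(-2, -1)))
    return out.real  # imaginary part is ~0 for a real-valued input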
- General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously (a speculative sketch follows this entry).
arXiv Detail & Related papers (2023-07-07T04:58:34Z)
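The module itself is not detailed in the blurb, so the sketch below only illustrates the general idea of mixing modalities with a 3D convolution by stacking per-modality feature maps along a depth axis; the shapes, modality count, and kernel layout are assumptions, not the paper's design.

import torch
import torch.nn as nn

# Stack M per-modality feature maps along a depth axis and mix them with a
# single 3D convolution; kernel depth 3 with no depth padding collapses the
# three modalities into one fused map.
m_feats = [torch.randn(1, 64, 32, 32) for _ in range(3)]   # e.g. RGB, DSM, IR
vol = torch.stack(m_feats, dim=2)                          # (n, c, M, h, w)
conv3d = nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1))
fused = conv3d(vol).squeeze(2)                             # (n, 64, h, w)
print(fused.shape)                                         # torch.Size([1, 64, 32, 32])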
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Since we live in a dynamic world, humans tend to mine objects by learning from a group of images or several frames of video.
Previous approaches design separate networks for these similar tasks, making them difficult to apply to one another.
We introduce a unified framework, termed UFO (Unified Framework for Co-Object segmentation), to tackle these issues.
arXiv Detail & Related papers (2022-03-09T13:35:19Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation poses some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- Boundary-Aware Segmentation Network for Mobile and Web Applications [60.815545591314915]
Boundary-Aware Segmentation Network (BASNet) combines a predict-refine architecture with a hybrid loss for highly accurate image segmentation (a simplified sketch of such a loss follows this entry).
BASNet runs at over 70 fps on a single GPU, which benefits many potential real applications.
Based on BASNet, we further developed two (close to) commercial applications: AR COPY & PASTE, which combines BASNet with augmented reality to "COPY" and "PASTE" real-world objects, and OBJECT CUT, a web-based tool for automatic object background removal.
arXiv Detail & Related papers (2021-01-12T19:20:26Z)
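BASNet's hybrid loss combines binary cross-entropy, SSIM, and IoU terms. The sketch below is a simplified stand-in that replaces the paper's patch-wise SSIM with a global one, so the constants and structure are illustrative rather than the paper's exact formulation.

import torch
import torch.nn.functional as F

def hybrid_loss(pred, target, eps=1e-6):
    # pred, target: (n, 1, h, w); pred already passed through a sigmoid.
    bce = F.binary_cross_entropy(pred, target)
    # Global (unwindowed) SSIM, a simplification of the patch-wise version.
    mu_p, mu_t = pred.mean(), target.mean()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (pred.var() + target.var() + c2))
    # Soft IoU loss.
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return bce + (1.0 - ssim) + (1.0 - (inter + eps) / (union + eps))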
- A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection [74.88284082187462]
One common strategy is to adopt dilated convolutions in the backbone network to extract high-resolution feature maps (a minimal illustration follows this entry).
We propose a novel holistically-guided decoder to obtain high-resolution, semantic-rich feature maps.
arXiv Detail & Related papers (2020-12-18T10:51:49Z)
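As a quick illustration of the dilated-convolution strategy mentioned in the entry above (not of this paper's decoder): with the same 3x3 parameter count, increasing the dilation enlarges the receptive field while preserving spatial resolution.

import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)
dense   = nn.Conv2d(64, 64, 3, padding=1)              # 3x3 receptive field
dilated = nn.Conv2d(64, 64, 3, padding=4, dilation=4)  # 9x9 receptive field
print(dense(x).shape, dilated(x).shape)  # both torch.Size([1, 64, 32, 32])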
- Multi Receptive Field Network for Semantic Segmentation [8.06045579589765]
We propose a new Multi-Receptive Field Module (MRFM) for semantic segmentation.
We also design an edge-aware loss that is effective in distinguishing object/stuff boundaries (a generic sketch follows this entry).
Specifically, we achieve a mean IoU of 83.0 on the Cityscapes dataset and 88.4 mean IoU on the Pascal VOC2012 dataset.
arXiv Detail & Related papers (2020-11-17T11:52:23Z)
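The edge-aware loss above is named but not specified, so this is a generic, hedged sketch: a per-pixel cross-entropy up-weighted wherever the label differs from a right or downward neighbor. edge_aware_ce, edge_weight, and the boundary definition are assumptions for illustration, not the paper's loss.

import torch
import torch.nn.functional as F

def edge_aware_ce(logits, target, edge_weight=2.0, ignore_index=255):
    # logits: (n, classes, h, w); target: (n, h, w) integer labels.
    ce = F.cross_entropy(logits, target, ignore_index=ignore_index,
                         reduction='none')
    # Boundary pixels: label changes toward the right or downward neighbor.
    edge = torch.zeros_like(target, dtype=torch.bool)
    edge[:, :, :-1] |= target[:, :, :-1] != target[:, :, 1:]
    edge[:, :-1, :] |= target[:, :-1, :] != target[:, 1:, :]
    weight = 1.0 + (edge_weight - 1.0) * edge.float()
    valid = (target != ignore_index).float()
    return (ce * weight * valid).sum() / valid.sum().clamp(min=1.0)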
This list is automatically generated from the titles and abstracts of the papers in this site.