Revisiting Context Aggregation for Image Matting
- URL: http://arxiv.org/abs/2304.01171v2
- Date: Wed, 15 May 2024 02:24:58 GMT
- Title: Revisiting Context Aggregation for Image Matting
- Authors: Qinglin Liu, Xiaoqian Lv, Quanling Meng, Zonglin Li, Xiangyuan Lan, Shuo Yang, Shengping Zhang, Liqiang Nie
- Abstract summary: We present AEMatter, a matting network that is straightforward yet very effective.
AEMatter adopts a Hybrid-Transformer backbone with appearance-enhanced axis-wise learning (AEAL) blocks to build a basic network with strong context aggregation learning capability.
Extensive experiments on five popular matting datasets demonstrate that the proposed AEMatter outperforms state-of-the-art matting methods by a large margin.
- Score: 57.90127743270313
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditional studies emphasize the significance of context information in improving matting performance. Consequently, deep learning-based matting methods delve into designing pooling or affinity-based context aggregation modules to achieve superior results. However, these modules cannot well handle the context scale shift caused by the difference in image size during training and inference, resulting in matting performance degradation. In this paper, we revisit the context aggregation mechanisms of matting networks and find that a basic encoder-decoder network without any context aggregation modules can actually learn more universal context aggregation, thereby achieving higher matting performance compared to existing methods. Building on this insight, we present AEMatter, a matting network that is straightforward yet very effective. AEMatter adopts a Hybrid-Transformer backbone with appearance-enhanced axis-wise learning (AEAL) blocks to build a basic network with strong context aggregation learning capability. Furthermore, AEMatter leverages a large image training strategy to assist the network in learning context aggregation from data. Extensive experiments on five popular matting datasets demonstrate that the proposed AEMatter outperforms state-of-the-art matting methods by a large margin.
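The abstract describes AEMatter only at a high level. As a rough illustration of the axis-wise context aggregation it refers to, the sketch below implements a generic row/column self-attention block in PyTorch; the class name, head count, and normalization placement are assumptions made for illustration and do not reproduce the authors' AEAL design or its appearance-enhancement branch.

```python
# Minimal sketch (not the authors' code): axis-wise self-attention in the spirit of
# AEMatter's AEAL blocks, which aggregate context along image axes. All design
# details below (heads, norms, residuals) are illustrative assumptions.
import torch
import torch.nn as nn


class AxisWiseAttention(nn.Module):
    """Multi-head self-attention applied along the width axis, then the height axis."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the encoder
        b, c, h, w = x.shape

        # Attend along the width axis: each of the B*H rows is a sequence of length W.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        q = self.norm1(rows)
        rows = rows + self.row_attn(q, q, q, need_weights=False)[0]
        x = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)

        # Attend along the height axis: each of the B*W columns is a sequence of length H.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        q = self.norm2(cols)
        cols = cols + self.col_attn(q, q, q, need_weights=False)[0]
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)


if __name__ == "__main__":
    block = AxisWiseAttention(dim=64)
    feats = torch.randn(1, 64, 32, 48)
    print(block(feats).shape)  # torch.Size([1, 64, 32, 48])
```

Such a block would sit between encoder and decoder stages; the authors' network additionally relies on a Hybrid-Transformer backbone and a large-image training strategy, neither of which is reflected in this sketch.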
Related papers
- Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition [36.59116507158687]
We introduce RCMSTR, a unified framework of Contrastive Learning and Masked Image Modeling for scene text recognition (STR).
The proposed RCMSTR demonstrates superior performance in various STR-related downstream tasks, outperforming the existing state-of-the-art self-supervised STR techniques.
arXiv Detail & Related papers (2024-11-18T01:11:47Z) - Deep ContourFlow: Advancing Active Contours with Deep Learning [3.9948520633731026]
We present a framework for both unsupervised and one-shot image segmentation.
It is capable of capturing complex object boundaries without the need for extensive labeled training data.
This is particularly valuable in histology, a field facing a significant shortage of annotations.
arXiv Detail & Related papers (2024-07-15T13:12:34Z) - A Deep Unrolling Model with Hybrid Optimization Structure for Hyperspectral Image Deconvolution [50.13564338607482]
We propose a novel optimization framework for the hyperspectral deconvolution problem, called DeepMix. It consists of three distinct modules: a data consistency module, a module that enforces the effect of the handcrafted regularizers, and a denoising module. This work proposes a context-aware denoising module designed to sustain the advancements achieved by the cooperative efforts of the other modules.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z) - Prior-Induced Information Alignment for Image Matting [28.90998570043986]
We propose a novel network named Prior-Induced Information Alignment Matting Network (PIIAMatting).
It can efficiently model the distinction of pixel-wise response maps and the correlation of layer-wise feature maps.
PIIAMatting performs favourably against state-of-the-art image matting methods on the Alphamatting.com, Composition-1K and Distinctions-646 datasets.
arXiv Detail & Related papers (2021-06-28T07:46:59Z) - Deep Video Matting via Spatio-Temporal Alignment and Aggregation [63.6870051909004]
We propose a deep learning-based video matting framework that employs a novel spatio-temporal feature aggregation module (ST-FAM).
To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced.
Our framework significantly outperforms conventional video matting and deep image matting methods.
arXiv Detail & Related papers (2021-04-22T17:42:08Z) - Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation [53.49821324597837]
Weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years.
We present a Context Decoupling Augmentation (CDA) method to change the inherent context in which the objects appear.
Extensive experiments on the PASCAL VOC 2012 dataset with several alternative network architectures demonstrate that CDA boosts various popular WSSS methods to a new state of the art by a large margin.
arXiv Detail & Related papers (2021-03-02T15:05:09Z) - Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a data augmentation technique that linearly interpolates pairs of input examples and their corresponding labels (a minimal sketch of this interpolation appears after this list).
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z) - FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning [64.32306537419498]
We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations.
These transformations also use information from both within-class and across-class representations that we extract through clustering.
We demonstrate that our method is comparable to the current state of the art on smaller datasets while being able to scale up to larger datasets.
arXiv Detail & Related papers (2020-07-16T17:55:31Z) - Improving Learning Effectiveness For Object Detection and Classification in Cluttered Backgrounds [6.729108277517129]
This paper develops a framework that autonomously generates a training dataset in heterogeneous cluttered backgrounds.
The learning effectiveness of the proposed framework is expected to improve in complex and heterogeneous environments.
The performance of the proposed framework is investigated through empirical tests and compared with that of the model trained with the COCO dataset.
arXiv Detail & Related papers (2020-02-27T22:28:48Z)
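For the Mixup-Transformer entry above, the following minimal sketch shows the interpolation it refers to: a convex combination of two examples and their one-hot labels. The Beta parameter, the random in-batch pairing, and the use of embeddings rather than raw text are illustrative assumptions, not the paper's exact setup.

```python
# Minimal mixup sketch: convex combination of paired inputs and labels.
# alpha, batch layout, and the embedding-level mixing are assumptions.
import torch


def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Return mixed inputs/labels: x~ = lam*x_i + (1-lam)*x_j, same for one-hot y."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))            # pair each example with a random partner
    x_mixed = lam * x + (1.0 - lam) * x[perm]   # interpolate inputs (or hidden features)
    y_mixed = lam * y + (1.0 - lam) * y[perm]   # interpolate one-hot labels
    return x_mixed, y_mixed


if __name__ == "__main__":
    feats = torch.randn(8, 16)                  # e.g. sentence embeddings from a transformer
    labels = torch.nn.functional.one_hot(torch.randint(0, 3, (8,)), 3).float()
    xm, ym = mixup(feats, labels)
    print(xm.shape, ym.shape)                   # torch.Size([8, 16]) torch.Size([8, 3])
```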