Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement
- URL: http://arxiv.org/abs/2502.17093v1
- Date: Mon, 24 Feb 2025 12:16:28 GMT
- Title: Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement
- Authors: Rui Liu
- Abstract summary: Mask2Alpha is an iterative refinement framework designed to enhance semantic comprehension, instance awareness, and fine-detail recovery in image matting. Our framework leverages self-supervised Vision Transformer features as semantic priors, strengthening contextual understanding in complex scenarios. Mask2Alpha consistently achieves state-of-the-art results, showcasing its effectiveness in accurate and efficient image matting.
- Score: 4.006320049969407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world image matting is essential for applications in content creation and augmented reality. However, it remains challenging due to the complex nature of scenes and the scarcity of high-quality datasets. To address these limitations, we introduce Mask2Alpha, an iterative refinement framework designed to enhance semantic comprehension, instance awareness, and fine-detail recovery in image matting. Our framework leverages self-supervised Vision Transformer features as semantic priors, strengthening contextual understanding in complex scenarios. To further improve instance differentiation, we implement a mask-guided feature selection module, enabling precise targeting of objects in multi-instance settings. Additionally, a sparse convolution-based optimization scheme allows Mask2Alpha to recover high-resolution details through progressive refinement, from low-resolution semantic passes to high-resolution sparse reconstructions. Benchmarking across various real-world datasets, Mask2Alpha consistently achieves state-of-the-art results, showcasing its effectiveness in accurate and efficient image matting.
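The page carries no code, but the coarse-to-fine loop the abstract describes can be sketched. The snippet below is a minimal PyTorch illustration, not the authors' implementation: the module names are invented, plain dense convolutions stand in for the paper's sparse convolutions, and a simple residual update replaces the ViT semantic priors and mask-guided feature selection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IterativeMattingSketch(nn.Module):
    """Toy coarse-to-fine matting loop: a low-resolution semantic pass
    followed by mask-guided refinement at increasing resolutions.
    Dense convs stand in for the paper's sparse convolutions."""

    def __init__(self, feat_ch=32):
        super().__init__()
        # Coarse head: predicts a low-resolution alpha from image + guidance mask.
        self.coarse = nn.Sequential(
            nn.Conv2d(4, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 1, 3, padding=1),
        )
        # Refiner: corrects the upsampled alpha using the image at that scale.
        self.refine = nn.Sequential(
            nn.Conv2d(5, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 1, 3, padding=1),
        )

    def forward(self, image, mask, scales=(0.25, 0.5, 1.0)):
        # Low-resolution semantic pass, guided by the instance mask.
        x = F.interpolate(torch.cat([image, mask], dim=1), scale_factor=scales[0],
                          mode='bilinear', align_corners=False)
        alpha = torch.sigmoid(self.coarse(x))
        # Progressive refinement toward full resolution.
        for s in scales[1:]:
            img_s = F.interpolate(image, scale_factor=s, mode='bilinear', align_corners=False)
            mask_s = F.interpolate(mask, scale_factor=s, mode='bilinear', align_corners=False)
            alpha = F.interpolate(alpha, size=img_s.shape[-2:], mode='bilinear', align_corners=False)
            residual = self.refine(torch.cat([img_s, mask_s, alpha], dim=1))
            alpha = torch.clamp(alpha + residual, 0.0, 1.0)
        return alpha

model = IterativeMattingSketch()
alpha = model(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
print(alpha.shape)  # torch.Size([1, 1, 256, 256])
```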
Related papers
- MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation [5.130440339897479]
MaskAttn-UNet is a novel segmentation framework that enhances the traditional U-Net architecture via a mask attention mechanism.
Our model selectively emphasizes important regions while suppressing irrelevant backgrounds, thereby improving segmentation accuracy in cluttered and complex scenes.
Our results show that MaskAttn-UNet achieves accuracy comparable to state-of-the-art methods at significantly lower computational cost than transformer-based models.
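As a rough illustration of the mask-attention idea (emphasize important regions, suppress background), here is a toy PyTorch gate; the name `MaskAttentionGate` and the 1x1-conv-plus-sigmoid design are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn

class MaskAttentionGate(nn.Module):
    """Toy mask-attention gate: predicts a spatial mask from the features
    and uses it to reweight them, emphasizing foreground regions and
    suppressing irrelevant background."""

    def __init__(self, channels):
        super().__init__()
        self.to_mask = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),  # per-pixel attention weight in [0, 1]
        )

    def forward(self, feats):
        attn = self.to_mask(feats)  # (B, 1, H, W) spatial mask
        return feats * attn         # gated features

gate = MaskAttentionGate(64)
out = gate(torch.rand(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```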
arXiv Detail & Related papers (2025-03-11T22:43:26Z)
- Towards Fine-grained Interactive Segmentation in Images and Videos [21.22536962888316]
We present a SAM2Refiner framework built upon the SAM2 backbone. This architecture allows SAM2 to generate fine-grained segmentation masks for both images and videos. In addition, a mask refinement module is devised by employing a multi-scale cascaded structure to fuse mask features with hierarchical representations from the encoder.
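A hedged sketch of what a multi-scale cascaded mask refinement could look like; the class name and the concat-then-conv fusion are illustrative guesses, not SAM2Refiner's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedMaskRefiner(nn.Module):
    """Toy multi-scale cascade: fuse a coarse mask with hierarchical
    encoder features from coarse to fine, refining the mask at each level."""

    def __init__(self, feat_chs=(256, 128, 64)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Conv2d(c + 1, 1, 3, padding=1) for c in feat_chs
        )

    def forward(self, coarse_mask, pyramid):
        mask = coarse_mask
        for conv, feats in zip(self.stages, pyramid):  # coarse -> fine
            mask = F.interpolate(mask, size=feats.shape[-2:], mode='bilinear', align_corners=False)
            mask = torch.sigmoid(conv(torch.cat([feats, mask], dim=1)))
        return mask

refiner = CascadedMaskRefiner()
pyramid = [torch.rand(1, 256, 16, 16), torch.rand(1, 128, 32, 32), torch.rand(1, 64, 64, 64)]
out = refiner(torch.rand(1, 1, 16, 16), pyramid)
print(out.shape)  # torch.Size([1, 1, 64, 64])
```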
arXiv Detail & Related papers (2025-02-12T06:38:18Z)
- Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration [75.51789992466183]
TAMambaIR simultaneously perceives image textures and achieves a trade-off between performance and efficiency. Extensive experiments on benchmarks for image super-resolution, deraining, and low-light image enhancement demonstrate that TAMambaIR achieves state-of-the-art performance with significantly improved efficiency.
arXiv Detail & Related papers (2025-01-27T23:53:49Z)
- MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation [38.3201448852059]
Referring Image Segmentation (RIS) is an advanced vision-language task that involves identifying and segmenting objects within an image. We propose a novel training framework called Masked Referring Image Segmentation (MaskRIS). MaskRIS uses both image and text masking, followed by Contextual Learning to fully exploit the benefits of the masking strategy.
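The image-and-text masking augmentation can be pictured with two small helpers; the function names, patch size, and masking ratios below are illustrative assumptions, not MaskRIS's settings.

```python
import torch

def mask_image_patches(image, patch=16, ratio=0.3):
    """Randomly zero square patches of an image (B, C, H, W)."""
    b, _, h, w = image.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=image.device) > ratio).float()
    keep = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return image * keep

def mask_text_tokens(tokens, mask_id=0, ratio=0.15):
    """Randomly replace token ids (B, L) with a mask id."""
    drop = torch.rand(tokens.shape, device=tokens.device) < ratio
    return torch.where(drop, torch.full_like(tokens, mask_id), tokens)

img = mask_image_patches(torch.rand(2, 3, 64, 64))
txt = mask_text_tokens(torch.randint(1, 1000, (2, 12)))
```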
arXiv Detail & Related papers (2024-11-28T11:27:56Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
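As a toy stand-in for the adaptive feature mask generator, one can bias masking by a per-node significance score; using the feature norm as that score is purely an assumption for illustration, not UGMAE's learned criterion.

```python
import torch

def adaptive_node_mask(node_feats, mask_ratio=0.5):
    """Toy adaptive masking: score each node by feature norm (a stand-in
    for a learned significance) and preferentially mask lower-scoring nodes."""
    n = node_feats.size(0)
    scores = node_feats.norm(dim=1)                    # per-node 'significance'
    noise = torch.rand(n)                              # add randomness, break ties
    rank = (scores / scores.max() + noise).argsort()   # low score -> masked first
    num_mask = int(n * mask_ratio)
    mask = torch.zeros(n, dtype=torch.bool)
    mask[rank[:num_mask]] = True
    return mask

feats = torch.rand(10, 16)
mask = adaptive_node_mask(feats)
print(mask.sum())  # tensor(5)
```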
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- PRISM: Progressive Restoration for Scene Graph-based Image Manipulation [47.77003316561398]
PRISM is a novel multi-head image manipulation approach to improve the accuracy and quality of the manipulated regions in the scene.
Our results demonstrate the potential of our approach for enhancing the quality and precision of scene graph-based image manipulation.
arXiv Detail & Related papers (2023-11-03T21:30:34Z)
- MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
We propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points.
Our method can concentrate on modeling regional geometry and enjoy less ambiguity for masked reconstruction.
By combining informative-preserved reconstruction on masked areas and consistency self-distillation from unmasked areas, a unified framework called MM-3DScene is yielded.
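A crude sketch of selecting "representative structured points" via local statistics; the neighbor-spread score below is an invented proxy, not MM-3DScene's actual criterion.

```python
import torch

def preserve_structured_points(xyz, keep_ratio=0.2, k=8):
    """Toy 'informative-preserved' selection on an (N, 3) point cloud:
    score each point by the spread of its k nearest neighbors (a crude
    local statistic) and keep the most structured points unmasked."""
    dists = torch.cdist(xyz, xyz)                         # (N, N) pairwise distances
    knn = dists.topk(k + 1, largest=False).values[:, 1:]  # drop self-distance
    score = knn.std(dim=1)                                # local geometric spread
    num_keep = int(xyz.size(0) * keep_ratio)
    return score.topk(num_keep).indices

pts = torch.rand(100, 3)
print(preserve_structured_points(pts).shape)  # torch.Size([20])
```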
arXiv Detail & Related papers (2022-12-20T01:53:40Z)
- High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.
With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
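The crop-plus-full-image fusion can be sketched as averaging pasted-back crop logits with the full-image prediction; this is an assumed simplification of CropFormer's learned fusion.

```python
import torch

def fuse_crop_predictions(full_pred, crop_preds):
    """Toy fusion: average a full-image mask logit map with higher-detail
    crop predictions pasted back at their locations.
    full_pred: (H, W); crop_preds: list of (logits, (y0, x0)) tuples."""
    fused = full_pred.clone()
    weight = torch.ones_like(full_pred)
    for logits, (y0, x0) in crop_preds:
        h, w = logits.shape
        fused[y0:y0 + h, x0:x0 + w] += logits
        weight[y0:y0 + h, x0:x0 + w] += 1
    return fused / weight

full = torch.zeros(64, 64)
crops = [(torch.ones(32, 32), (0, 0)), (torch.ones(32, 32), (32, 32))]
print(fuse_crop_predictions(full, crops).max())  # tensor(0.5000)
```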
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
- Unsupervised Structure-Consistent Image-to-Image Translation [6.282068591820945]
The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation.
We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers.
The auxiliary module's loss forces the generator to learn to reconstruct an image with an all-zero texture code.
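A gradient reversal layer is a standard construct (identity on the forward pass, negated gradient on the backward pass); a minimal PyTorch version, with the scaling factor `lambd` as a conventional extra, looks like this.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity forward, negated (scaled) gradient backward."""

    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None  # no gradient for lambd

x = torch.rand(4, requires_grad=True)
y = GradReverse.apply(x, 0.5)
y.sum().backward()
print(x.grad)  # every entry is -0.5
```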
arXiv Detail & Related papers (2022-08-24T13:47:15Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
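SAWN's core idea, warping modulation parameters with a flow field before applying them, can be sketched with `grid_sample`; the SPADE-style scale-and-shift and the identity flow in the demo are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def warped_modulation(feats, gamma, beta, flow):
    """Toy spatially-adaptive warped normalization: warp per-pixel
    modulation maps by a flow field, then scale/shift the
    instance-normalized features.
    feats, gamma, beta: (B, C, H, W); flow: (B, H, W, 2) in [-1, 1]."""
    gamma_w = F.grid_sample(gamma, flow, align_corners=False)
    beta_w = F.grid_sample(beta, flow, align_corners=False)
    normed = F.instance_norm(feats)
    return normed * (1 + gamma_w) + beta_w

b, c, h, w = 1, 8, 16, 16
ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing='ij')
identity = torch.stack([xs, ys], dim=-1).unsqueeze(0)  # identity sampling grid
out = warped_modulation(torch.rand(b, c, h, w), torch.rand(b, c, h, w),
                        torch.rand(b, c, h, w), identity)
print(out.shape)  # torch.Size([1, 8, 16, 16])
```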
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- Semantic Layout Manipulation with High-Resolution Sparse Attention [106.59650698907953]
We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
We propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.
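One common way to make attention sparse at high resolution is to let each query attend only to its top-k keys; the sketch below shows that generic pattern, not the paper's specific module.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk=8):
    """Toy sparse attention: each query attends only to its top-k keys,
    keeping memory manageable at high resolution.
    q: (Nq, D); k, v: (Nk, D)."""
    scores = q @ k.t() / q.size(-1) ** 0.5     # (Nq, Nk) scaled dot products
    vals, idx = scores.topk(topk, dim=-1)      # keep k best keys per query
    attn = F.softmax(vals, dim=-1)             # softmax over the kept keys only
    return torch.einsum('qk,qkd->qd', attn, v[idx])

q = torch.rand(512, 32)
kv = torch.rand(1024, 32)
print(topk_sparse_attention(q, kv, kv).shape)  # torch.Size([512, 32])
```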
arXiv Detail & Related papers (2020-12-14T06:50:43Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
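GFM's shared-encoder, two-decoder layout and its collage of coarse segmentation with detail alpha can be sketched as follows; layer sizes and heads are toy assumptions.

```python
import torch
import torch.nn as nn

class GlanceFocusSketch(nn.Module):
    """Toy glance-and-focus layout: one shared encoder, a 'glance' decoder
    for coarse semantics (trimap-like) and a 'focus' decoder for detail,
    merged into the final alpha."""

    def __init__(self, ch=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.glance = nn.Conv2d(ch, 3, 3, padding=1)  # bg / unknown / fg logits
        self.focus = nn.Conv2d(ch, 1, 3, padding=1)   # detail alpha

    def forward(self, image):
        feats = self.encoder(image)
        seg = self.glance(feats).softmax(dim=1)       # (B, 3, H, W)
        detail = torch.sigmoid(self.focus(feats))     # (B, 1, H, W)
        # Collage: trust the glance branch where it is certain,
        # use the focus branch inside the unknown band.
        fg, unknown = seg[:, 2:3], seg[:, 1:2]
        return fg + unknown * detail

net = GlanceFocusSketch()
print(net(torch.rand(1, 3, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])
```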
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.