RAM++: Robust Representation Learning via Adaptive Mask for All-in-One Image Restoration
- URL: http://arxiv.org/abs/2509.12039v1
- Date: Mon, 15 Sep 2025 15:24:15 GMT
- Title: RAM++: Robust Representation Learning via Adaptive Mask for All-in-One Image Restoration
- Authors: Zilong Zhang, Chujie Qin, Chunle Guo, Yong Zhang, Chao Xue, Ming-Ming Cheng, Chongyi Li,
- Abstract summary: RAM++ is a two-stage framework for all-in-one image restoration.<n>It integrates high-level semantic understanding with low-level texture generation.<n>It addresses the limitations of existing degradation-oriented methods in extreme scenarios.
- Score: 94.49712266736141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents Robust Representation Learning via Adaptive Mask (RAM++), a two-stage framework for all-in-one image restoration. RAM++ integrates high-level semantic understanding with low-level texture generation to achieve content-oriented robust restoration. It addresses the limitations of existing degradation-oriented methods in extreme scenarios (e.g., degradations strongly coupled with image structures). RAM++ also mitigates common challenges such as unbalanced performance across tasks, overfitting to seen degradations, and weak generalization to unseen ones through three key designs: 1) Adaptive Semantic-Aware Mask (AdaSAM): a pretraining strategy that applies pixel-level masks to semantically rich and textured regions. This design enables the network to learn both generative priors and image content priors from various degradations. 2) Mask Attribute Conductance (MAC): a selective fine-tuning strategy that adjusts the layers with higher contributions to bridge the integrity gap between masked pretraining and full-image fine-tuning while retaining learned priors. 3) Robust Feature Regularization (RFR): a strategy that leverages DINOv2's semantically consistent and degradation-invariant representations, together with efficient feature fusion, to achieve faithful and semantically coherent restoration. With these designs, RAM++ achieves robust, well-balanced, and state-of-the-art performance across seen, unseen, extreme, and mixed degradations. Our code and model will be released at https://github.com/DragonisCV/RAM
Related papers
- ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation [64.84095852784714]
Residual Tokenizer (ResTok) is a 1D visual tokenizer that builds hierarchical residuals for both image tokens and latent tokens.<n>We show that restoring hierarchical residual priors in visual tokenization significantly improves AR image generation, achieving a gFID of 2.34 on ImageNet-256 with only 9 sampling steps.
arXiv Detail & Related papers (2026-01-07T14:09:18Z) - Plug-and-Play Context Feature Reuse for Efficient Masked Generation [36.563229330549284]
Masked generative models (MGMs) have emerged as a powerful framework for image synthesis.<n>We introduce ReCAP (Reused Context-Aware Prediction), a plug-and-play module that accelerates inference in MGMs.
arXiv Detail & Related papers (2025-05-25T10:57:35Z) - GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation [81.58846231702026]
We introduce GigaTok, the first approach to improve image reconstruction, generation, and representation learning when scaling visual tokenizers.<n>We identify the growing complexity of latent space as the key factor behind the reconstruction vs. generation dilemma.<n>By scaling to $bf3 space billion$ parameters, GigaTok achieves state-of-the-art performance in reconstruction, downstream AR generation, and downstream AR representation quality.
arXiv Detail & Related papers (2025-04-11T17:59:58Z) - Harmonizing Visual Representations for Unified Multimodal Understanding and Generation [53.01486796503091]
We present emphHarmon, a unified autoregressive framework that harmonizes understanding and generation tasks with a shared MAR encoder.<n>Harmon achieves state-of-the-art image generation results on the GenEval, MJHQ30K and WISE benchmarks.
arXiv Detail & Related papers (2025-03-27T20:50:38Z) - Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement [4.006320049969407]
Mask2Alpha is an iterative refinement framework designed to enhance semantic comprehension, instance awareness, and fine-detail recovery in image matting.<n>Our framework leverages self-supervised Vision Transformer features as semantic priors, strengthening contextual understanding in complex scenarios.<n>Mask2Alpha consistently achieves state-of-the-art results, showcasing its effectiveness in accurate and efficient image matting.
arXiv Detail & Related papers (2025-02-24T12:16:28Z) - Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration [35.3663995646582]
All-in-one image restoration aims to handle multiple degradation types using one model.
This paper proposes a simple pipeline for all-in-one blind image restoration to Restore Anything with Masks (RAM)
arXiv Detail & Related papers (2024-09-28T16:33:43Z) - UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z) - Not All Image Regions Matter: Masked Vector Quantization for
Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework, which consists of Masked Quantization VAE (MQ-VAE) Stack model from modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.