Related papers: DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

URL: http://arxiv.org/abs/2410.18666v2
Date: Tue, 29 Oct 2024 05:50:12 GMT
Title: DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Authors: Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han, Zhengyu Chen, Quanzeng You, Hongxia Yang
Abstract summary: We present a dual strategy: GenIR, an innovative data curation pipeline, and DreamClear, a cutting-edge Diffusion Transformer (DiT)-based image restoration model. GenIR, our pioneering contribution, is a dual-prompt learning pipeline that overcomes the limitations of existing datasets. DreamClear utilizes the generative priors of text-to-image (T2I) diffusion models and the robust perceptual capabilities of multi-modal large language models (MLLMs) to achieve photorealistic restoration.
Score: 46.22939360256696
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image restoration (IR) in real-world scenarios presents significant challenges due to the lack of high-capacity models and comprehensive datasets. To tackle these issues, we present a dual strategy: GenIR, an innovative data curation pipeline, and DreamClear, a cutting-edge Diffusion Transformer (DiT)-based image restoration model. GenIR, our pioneering contribution, is a dual-prompt learning pipeline that overcomes the limitations of existing datasets, which typically comprise only a few thousand images and thus offer limited generalizability for larger models. GenIR streamlines the process into three stages: image-text pair construction, dual-prompt based fine-tuning, and data generation & filtering. This approach circumvents the laborious data crawling process, ensuring copyright compliance and providing a cost-effective, privacy-safe solution for IR dataset construction. The result is a large-scale dataset of one million high-quality images. Our second contribution, DreamClear, is a DiT-based image restoration model. It utilizes the generative priors of text-to-image (T2I) diffusion models and the robust perceptual capabilities of multi-modal large language models (MLLMs) to achieve photorealistic restoration. To boost the model's adaptability to diverse real-world degradations, we introduce the Mixture of Adaptive Modulator (MoAM). It employs token-wise degradation priors to dynamically integrate various restoration experts, thereby expanding the range of degradations the model can address. Our exhaustive experiments confirm DreamClear's superior performance, underlining the efficacy of our dual strategy for real-world image restoration. Code and pre-trained models are available at: https://github.com/shallowdream204/DreamClear.

Related papers

DPMambaIR: All-in-One Image Restoration via Degradation-Aware Prompt State Space Model [36.979833523678614]
All-in-One image restoration aims to address multiple image degradation problems. Existing approaches rely on degradation-specific models or coarse-grained degradation prompts to guide image restoration. We propose DPMambaIR, a novel All-in-One image restoration framework.
arXiv Detail & Related papers (2025-04-24T16:46:32Z)
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration [25.65952375846516]
We find that a well-trained large T2I model (i.e., Flux) is able to produce a variety of high-quality images aligned with real-world distributions. A novel lightweight adapter (FluxIR) with squeeze-and-excitation layers is also carefully designed to control the large Diffusion Transformer (DiT)-based T2I model.
arXiv Detail & Related papers (2025-04-21T15:05:22Z)
Seedream 3.0 Technical Report [62.85849652170507]
Seedream 3.0 is a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0. Seedream 3.0 provides native high-resolution output (up to 2K), allowing it to generate images with high visual quality.
arXiv Detail & Related papers (2025-04-15T16:19:07Z)
Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling framework for image super-resolution (ISR) in the form of next-scale prediction. We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z)
FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration [66.61201445650323]
Existing methods suffer from a generalization bottleneck in real-world scenarios. We contribute a million-scale dataset with two notable advantages over existing training data. We propose a robust model, FoundIR, to better address a broader range of restoration tasks in real-world scenarios.
arXiv Detail & Related papers (2024-12-02T12:08:40Z)
Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding [67.57487747508179]
Multiple-in-one image restoration (IR) has made significant progress, aiming to handle all types of single-degradation image restoration with a single model. In this paper, we propose a novel multiple-in-one IR model that can effectively restore images with both single and mixed degradations.
arXiv Detail & Related papers (2024-11-25T09:26:34Z)
Training-Free Large Model Priors for Multiple-in-One Image Restoration [24.230376300759573]
We propose the Large Model Driven Image Restoration framework (LMDIR). Our architecture comprises a query-based prompt encoder and a degradation-aware transformer block that injects global degradation knowledge. This design facilitates a single-stage training paradigm to address various degradations while supporting both automatic and user-guided restoration.
arXiv Detail & Related papers (2024-07-18T05:40:32Z)
Gradient Inversion of Federated Diffusion Models [4.1355611383748005]
Diffusion models are becoming de facto generative models, which generate exceptionally high-resolution image data. In this paper, we study the privacy risk of gradient inversion attacks. We propose a triple-optimization GIDM+ that coordinates the optimization of the unknown data.
arXiv Detail & Related papers (2024-05-30T18:00:03Z)
Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models [14.25759541950917]
This work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). Our base diffusion model is the image restoration SDE (IR-SDE).
arXiv Detail & Related papers (2024-04-15T12:34:21Z)
Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks [50.822601495422916]
We propose to utilize exposure bracketing photography to unify image restoration and enhancement tasks. Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data. In particular, a temporally modulated recurrent network (TMRNet) and a self-supervised adaptation method are proposed.
arXiv Detail & Related papers (2024-01-01T14:14:35Z)
Multi-task Image Restoration Guided By Robust DINO Features [88.74005987908443]
We propose DINO-IR, a multi-task image restoration approach leveraging robust features extracted from DINOv2. We first propose a pixel-semantic fusion (PSF) module to dynamically fuse DINOv2's shallow features. By formulating these modules into a unified deep model, we propose a DINO perception contrastive loss to constrain the model training.
arXiv Detail & Related papers (2023-12-04T06:59:55Z)
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior [70.46245698746874]
We present DiffBIR, a general restoration pipeline that can handle different blind image restoration tasks. DiffBIR decouples the blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content. In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results. For the second stage, we propose IRControlNet, which leverages the generative ability of latent diffusion models to generate realistic details.
arXiv Detail & Related papers (2023-08-29T07:11:52Z)
Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images [33.70056950818641]
We propose a novel transformer-based framework that reconstructs two high fidelity hands from multi-view RGB images. We show that our framework is able to produce realistic two-hand reconstructions and demonstrate the generalisation of synthetic-trained models to real data.
arXiv Detail & Related papers (2023-08-21T20:07:02Z)
Refusion: Enabling Large-Size Realistic Image Restoration with Latent-Space Diffusion Models [9.245782611878752]
We enhance the diffusion model in several aspects such as network architecture, noise level, denoising steps, training image size, and perceptual/scheduler scores. We also propose a U-Net based latent diffusion model which performs diffusion in a low-resolution latent space while preserving high-resolution information from the original input for the decoding process. These modifications allow us to apply diffusion models to various image restoration tasks, including real-world shadow removal, HR non-homogeneous dehazing, stereo super-resolution, and bokeh effect transformation.
arXiv Detail & Related papers (2023-04-17T14:06:49Z)
Multi-Stage Progressive Image Restoration [167.6852235432918]
We propose a novel synergistic design that can optimally balance these competing goals. Our main proposal is a multi-stage architecture that progressively learns restoration functions for the degraded inputs. The resulting tightly interlinked multi-stage architecture, named MPRNet, delivers strong performance gains on ten datasets.
arXiv Detail & Related papers (2021-02-04T18:57:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.