Related papers: XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution

XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution

URL: http://arxiv.org/abs/2403.05049v2
Date: Fri, 19 Jul 2024 16:31:19 GMT
Title: XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
Authors: Yunpeng Qu, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou,
Abstract summary: Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution. It is challenging for ISR models to perceive the semantic and degradation information, resulting in restoration images with incorrect content or unrealistic artifacts. We propose a textitCross-modal Priors for Super-Resolution (XPSR) framework to acquire precise and comprehensive semantic conditions for the diffusion model.
Score: 14.935662351654601
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution (ISR) recently. However, as low-resolution (LR) images often undergo severe degradation, it is challenging for ISR models to perceive the semantic and degradation information, resulting in restoration images with incorrect content or unrealistic artifacts. To address these issues, we propose a \textit{Cross-modal Priors for Super-Resolution (XPSR)} framework. Within XPSR, to acquire precise and comprehensive semantic conditions for the diffusion model, cutting-edge Multimodal Large Language Models (MLLMs) are utilized. To facilitate better fusion of cross-modal priors, a \textit{Semantic-Fusion Attention} is raised. To distill semantic-preserved information instead of undesired degradations, a \textit{Degradation-Free Constraint} is attached between LR and its high-resolution (HR) counterpart. Quantitative and qualitative results show that XPSR is capable of generating high-fidelity and high-realism images across synthetic and real-world datasets. Codes are released at \url{https://github.com/qyp2000/XPSR}.

Related papers

Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling for ISR framework with the form of next-scale prediction. We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z)
CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution [21.843398350371867]
Convolutional Neural Networks (CNNs) have significantly advanced Image Super-Resolution (SR) Most CNN-based methods rely solely on pixel-based transformations, leading to artifacts and blurring. We propose a multi-modal semantic enhancement framework that integrates textual semantics with visual features.
arXiv Detail & Related papers (2024-12-16T09:50:09Z)
ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution [28.945663118445037]
Real-world image super-resolution (Real-ISR) aims at restoring high-quality (HQ) images from low-quality (LQ) inputs corrupted by unknown and complex degradations. We introduce ConsisSR to handle both semantic and pixel-level consistency.
arXiv Detail & Related papers (2024-10-17T17:41:52Z)
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution [19.33582308829547]
This paper proposes to leverage degradation-aligned language prompt for accurate, fine-grained, and high-fidelity image restoration. The proposed method achieves a new state-of-the-art perceptual quality level.
arXiv Detail & Related papers (2024-06-24T09:30:36Z)
SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution [16.815468458589635]
We present a semantics-aware approach to better preserve the semantic fidelity of generative real-world image super-resolution. First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation. The experiments show that our method can reproduce more realistic image details and hold better the semantics.
arXiv Detail & Related papers (2023-11-27T18:11:19Z)
CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images. We achieve this by marrying image appearance and language understanding to generate a cognitive embedding. To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention"
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images. In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z)
Implicit Diffusion Models for Continuous Super-Resolution [65.45848137914592]
This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution. IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework. The scaling factor regulates the resolution and accordingly modulates the proportion of the LR information and generated features in the final output.
arXiv Detail & Related papers (2023-03-29T07:02:20Z)
Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing. HSRNet achieves better quantitative and visual performance than other works, and remits the aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z)
Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task. We propose best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision.
arXiv Detail & Related papers (2021-03-29T02:58:27Z)
Deep Cyclic Generative Adversarial Residual Convolutional Networks for Real Image Super-Resolution [20.537597542144916]
We consider a deep cyclic network structure to maintain the domain consistency between the LR and HR data distributions. We propose the Super-Resolution Residual Cyclic Generative Adversarial Network (SRResCycGAN) by training with a generative adversarial network (GAN) framework for the LR to HR domain translation.
arXiv Detail & Related papers (2020-09-07T11:11:18Z)
DDet: Dual-path Dynamic Enhancement Network for Real-World Image Super-Resolution [69.2432352477966]
Real image super-resolution(Real-SR) focus on the relationship between real-world high-resolution(HR) and low-resolution(LR) image. In this article, we propose a Dual-path Dynamic Enhancement Network(DDet) for Real-SR. Unlike conventional methods which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pair.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.