MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution
- URL: http://arxiv.org/abs/2503.08096v1
- Date: Tue, 11 Mar 2025 07:00:20 GMT
- Title: MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution
- Authors: Xinrui Li, Jianlong Wu, Xinchuan Huang, Chong Chen, Weili Guan, Xian-Sheng Hua, Liqiang Nie,
- Abstract summary: MegaSR mines customized block-wise semantics and expressive guidance for diffusion-based ISR.<n>We experimentally identify HED edge maps, depth maps, and segmentation maps as the most expressive guidance.<n>Extensive experiments demonstrate the superiority of MegaSR in terms of semantic richness and structural consistency.
- Score: 76.30559905769859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pioneering text-to-image (T2I) diffusion models have ushered in a new era of real-world image super-resolution (Real-ISR), significantly enhancing the visual perception of reconstructed images. However, existing methods typically integrate uniform abstract textual semantics across all blocks, overlooking the distinct semantic requirements at different depths and the fine-grained, concrete semantics inherently present in the images themselves. Moreover, relying solely on a single type of guidance further disrupts the consistency of reconstruction. To address these issues, we propose MegaSR, a novel framework that mines customized block-wise semantics and expressive guidance for diffusion-based ISR. Compared to uniform textual semantics, MegaSR enables flexible adaptation to multi-granularity semantic awareness by dynamically incorporating image attributes at each block. Furthermore, we experimentally identify HED edge maps, depth maps, and segmentation maps as the most expressive guidance, and propose a multi-stage aggregation strategy to modulate them into the T2I models. Extensive experiments demonstrate the superiority of MegaSR in terms of semantic richness and structural consistency.
Related papers
- DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition [69.10628479553709]
We introduce DRC, a novel personalized image generation framework that enhances Large Multimodal Models (LMMs)
DRC explicitly extracts user style preferences and semantic intentions from history images and the reference image, respectively.
It involves two critical learning stages: 1) Disentanglement learning, which employs a dual-tower disentangler to explicitly separate style and semantic features, optimized via a reconstruction-driven paradigm with difficulty-aware importance sampling; and 2) Personalized modeling, which applies semantic-preserving augmentations to effectively adapt the disentangled representations for robust personalized generation.
arXiv Detail & Related papers (2025-04-24T08:10:10Z) - Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution [22.655127409294554]
Real-world image super-resolution (Real-ISR) has achieved a remarkable leap by leveraging large-scale text-to-image models.<n>We propose to consider semantic segmentation as an additional control condition into diffusion-based image super-resolution.
arXiv Detail & Related papers (2024-12-04T02:11:09Z) - HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior [62.04939047885834]
We present HoliSDiP, a framework that leverages semantic segmentation to provide both precise textual and spatial guidance for Real-ISR.<n>Our method employs semantic labels as concise text prompts while introducing dense semantic guidance through segmentation masks and our proposed spatial-CLIP Map.
arXiv Detail & Related papers (2024-11-27T15:22:44Z) - SeD: Semantic-Aware Discriminator for Image Super-Resolution [20.646975821512395]
Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks.
One discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner.
We propose the simple and effective Semantic-aware Discriminator ( SeD)
SeD encourages the SR network to learn the fine-grained distributions by introducing the semantics of images as a condition.
arXiv Detail & Related papers (2024-02-29T17:38:54Z) - Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images.
Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance.
We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z) - RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment [112.45442468794658]
We propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff.
In the coarse semantic re-alignment phase, a novel caption reward is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt.
The fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view.
arXiv Detail & Related papers (2023-05-31T06:59:21Z) - DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for
Text-to-Image Generation [71.87682778102236]
We propose a novel Dynamical Semantic Evolution GAN (DSE-GAN) to re-compose each stage's text features under a novel single adversarial multi-stage architecture.
DSE-GAN achieves 7.48% and 37.8% relative FID improvement on two widely used benchmarks.
arXiv Detail & Related papers (2022-09-03T06:13:26Z) - DDet: Dual-path Dynamic Enhancement Network for Real-World Image
Super-Resolution [69.2432352477966]
Real image super-resolution(Real-SR) focus on the relationship between real-world high-resolution(HR) and low-resolution(LR) image.
In this article, we propose a Dual-path Dynamic Enhancement Network(DDet) for Real-SR.
Unlike conventional methods which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pair.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.