DiffSteISR: Harnessing Diffusion Prior for Superior Real-world Stereo Image Super-Resolution
- URL: http://arxiv.org/abs/2408.07516v2
- Date: Thu, 15 Aug 2024 02:14:18 GMT
- Title: DiffSteISR: Harnessing Diffusion Prior for Superior Real-world Stereo Image Super-Resolution
- Authors: Yuanbo Zhou, Xinlin Zhang, Wei Deng, Tao Wang, Tao Tan, Qinquan Gao, Tong Tong
- Abstract summary: We introduce DiffSteISR, a pioneering framework for reconstructing real-world stereo images.
DiffSteISR utilizes the powerful prior knowledge embedded in a pre-trained text-to-image model to efficiently recover the lost texture details.
We show that DiffSteISR accurately reconstructs natural and precise textures from low-resolution stereo images.
- Score: 9.051054674138646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce DiffSteISR, a pioneering framework for reconstructing real-world stereo images. DiffSteISR utilizes the powerful prior knowledge embedded in a pre-trained text-to-image model to efficiently recover the lost texture details in low-resolution stereo images. Specifically, DiffSteISR implements a time-aware stereo cross attention with temperature adapter (TASCATA) to guide the diffusion process, ensuring that the generated left and right views exhibit high texture consistency, thereby reducing disparity error between the super-resolved images and the ground truth (GT) images. Additionally, a stereo omni attention control network (SOA ControlNet) is proposed to enhance the consistency of super-resolved images with GT images in the pixel, perceptual, and distribution space. Finally, DiffSteISR incorporates a stereo semantic extractor (SSE) to capture unique viewpoint soft semantic information and shared hard tag semantic information, thereby effectively improving the semantic accuracy and consistency of the generated left and right images. Extensive experimental results demonstrate that DiffSteISR accurately reconstructs natural and precise textures from low-resolution stereo images while maintaining a high consistency of semantic and texture between the left and right views.
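The abstract does not give the exact formulation of TASCATA, but the idea of cross-attending from one view to the other with a temperature that controls how sharply attention concentrates can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name and shapes are hypothetical.

```python
import numpy as np

def stereo_cross_attention(q_left, kv_right, temperature=1.0):
    """Cross-attention from left-view query tokens to right-view key/value
    tokens. The temperature scales the logits: values < 1 sharpen the
    attention map, values > 1 soften it (loosely analogous to the
    temperature adapter described for TASCATA)."""
    d = q_left.shape[-1]
    scores = q_left @ kv_right.T / (np.sqrt(d) * temperature)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv_right  # left tokens re-expressed via right view

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))    # 4 left-view tokens, dim 8
kv = rng.standard_normal((6, 8))   # 6 right-view tokens, dim 8
out = stereo_cross_attention(q, kv, temperature=0.5)
print(out.shape)  # (4, 8)
```

Sharing such an attention step between the two views is one way to encourage the texture consistency the paper reports.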
Related papers
- Reconstructive Visual Instruction Tuning [64.91373889600136]
Reconstructive visual instruction tuning (ROSS) is a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals.
It reconstructs latent representations of input images, avoiding directly regressing exact raw RGB values.
Empirically, ROSS consistently brings significant improvements across different visual encoders and language models.
arXiv Detail & Related papers (2024-10-12T15:54:29Z)
- WTCL-Dehaze: Rethinking Real-world Image Dehazing via Wavelet Transform and Contrastive Learning [17.129068060454255]
Single image dehazing is essential for applications such as autonomous driving and surveillance.
We propose an enhanced semi-supervised dehazing network that integrates Contrastive Loss and Discrete Wavelet Transform.
Our proposed algorithm achieves superior performance and improved robustness compared to state-of-the-art single image dehazing methods.
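The summary names the Discrete Wavelet Transform as a core component. A single level of the 2-D Haar DWT, the simplest such decomposition, splits an image into one low-frequency and three high-frequency sub-bands; wavelet-based dehazing and restoration methods operate on bands of this kind. A minimal sketch (not the paper's network):

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar wavelet transform. Returns the LL
    (low-frequency approximation) and LH, HL, HH (high-frequency detail)
    sub-bands, each half the height and width of the input.
    `img` must have even height and width."""
    a = img[0::2, 0::2]  # top-left of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

img = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)  # (2, 2)
```

Haze mostly contaminates the low-frequency band, which is one motivation for processing the sub-bands separately.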
arXiv Detail & Related papers (2024-10-07T05:36:11Z)
- Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution [47.29558685384506]
Artifact-free super-resolution (SR) aims to translate low-resolution images into their high-resolution counterparts with a strict integrity of the original content.
Traditional diffusion-based SR techniques are prone to artifact introduction during iterative procedures.
We propose Self-Adaptive Reality-Guided Diffusion to identify and mitigate the propagation of artifacts.
arXiv Detail & Related papers (2024-03-25T11:29:19Z)
- Toward Real World Stereo Image Super-Resolution via Hybrid Degradation Model and Discriminator for Implied Stereo Image Information [10.957275128743529]
Real-world stereo image super-resolution has a significant influence on enhancing the performance of computer vision systems.
Existing methods for single-image super-resolution can be applied to improve stereo images.
This paper proposes a novel approach that integrates an implicit stereo information discriminator and a hybrid degradation model.
arXiv Detail & Related papers (2023-12-13T07:24:50Z)
- CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images.
We achieve this by marrying image appearance and language understanding to generate a cognitive embedding.
To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention".
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
- Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images.
Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance.
We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z)
- Enhancing Low-Light Images in Real World via Cross-Image Disentanglement [58.754943762945864]
We propose a new low-light image enhancement dataset consisting of misaligned training images with real-world corruptions.
Our model achieves state-of-the-art performances on both the newly proposed dataset and other popular low-light datasets.
arXiv Detail & Related papers (2022-01-10T03:12:52Z)
- TWIST-GAN: Towards Wavelet Transform and Transferred GAN for Spatio-Temporal Single Image Super Resolution [4.622977798361014]
Single Image Super-resolution (SISR) produces high-resolution images with fine spatial resolutions from a remotely sensed image with low spatial resolution.
Deep learning and generative adversarial networks (GANs) have made breakthroughs for the challenging task of single image super-resolution (SISR).
arXiv Detail & Related papers (2021-04-20T22:12:38Z)
- Frequency Consistent Adaptation for Real World Super Resolution [64.91914552787668]
We propose a novel Frequency Consistent Adaptation (FCA) that ensures the frequency domain consistency when applying Super-Resolution (SR) methods to the real scene.
We estimate degradation kernels from unsupervised images and generate the corresponding Low-Resolution (LR) images.
Based on the domain-consistent LR-HR pairs, we train easy-implemented Convolutional Neural Network (CNN) SR models.
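The pipeline described above (estimate a degradation kernel, then synthesize matching LR images to train an SR model on) can be sketched as a blur-then-subsample step. The Gaussian kernel below is a stand-in for a kernel estimated from real images; the function names are illustrative, not from the paper.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    """Isotropic Gaussian blur kernel, normalized to sum to 1. In the FCA
    setting this would instead be a kernel estimated from real-world images."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade(hr, kernel, scale=2):
    """Blur the HR image with the kernel, then subsample by `scale` to
    produce a synthetic LR image consistent with that degradation."""
    pad = kernel.shape[0] // 2
    padded = np.pad(hr, pad, mode="edge")
    blurred = np.zeros_like(hr)
    h, w = hr.shape
    for i in range(h):
        for j in range(w):
            blurred[i, j] = np.sum(
                padded[i:i + kernel.shape[0], j:j + kernel.shape[1]] * kernel)
    return blurred[::scale, ::scale]

hr = np.random.default_rng(1).random((16, 16))
lr = degrade(hr, gaussian_kernel(), scale=2)
print(lr.shape)  # (8, 8)
```

Training on (lr, hr) pairs generated this way keeps the LR distribution consistent with the target domain, which is the point of the adaptation.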
arXiv Detail & Related papers (2020-12-18T08:25:39Z)
- Hyperspectral Image Super-resolution via Deep Progressive Zero-centric Residual Learning [62.52242684874278]
Cross-modality distribution of spatial and spectral information makes the problem challenging.
We propose a novel lightweight deep neural network-based framework, namely PZRes-Net.
Our framework learns a high-resolution and zero-centric residual image, which contains high-frequency spatial details of the scene.
arXiv Detail & Related papers (2020-06-18T06:32:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.