Multi-view Image Diffusion via Coordinate Noise and Fourier Attention
- URL: http://arxiv.org/abs/2412.03756v1
- Date: Wed, 04 Dec 2024 22:49:40 GMT
- Title: Multi-view Image Diffusion via Coordinate Noise and Fourier Attention
- Authors: Justin Theiss, Norman Müller, Daeil Kim, Aayush Prakash
- Abstract summary: We propose a diffusion process that attends to time-dependent spatial frequencies of features with a novel attention mechanism and cross-attention loss.
Our technique improves SOTA on several quantitative metrics with qualitatively better results when compared to other state-of-the-art approaches for multi-view consistency.
- Score: 5.251293630298169
- Abstract: Recently, text-to-image generation with diffusion models has made significant advancements in both fidelity and generalization capabilities compared to previous baselines. However, generating holistic multi-view consistent images from prompts remains an important and challenging task. To address this challenge, we propose a diffusion process that attends to time-dependent spatial frequencies of features with a novel attention mechanism, as well as a novel noise initialization technique and cross-attention loss. This Fourier-based attention block focuses on features from non-overlapping regions of the generated scene in order to better align the global appearance. Our noise initialization technique incorporates shared noise and low spatial frequency information derived from pixel coordinates and depth maps to induce noise correlations across views. The cross-attention loss further aligns features sharing the same prompt across the scene. Our technique improves SOTA on several quantitative metrics with qualitatively better results when compared to other state-of-the-art approaches for multi-view consistency.
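The abstract states that shared noise is used to induce noise correlations across views but gives no formula. The following is a minimal sketch of that one idea only, under assumptions that do not come from the paper: a variance-preserving blend of one shared Gaussian draw with per-view Gaussian draws. The function name `correlated_view_noise` and the parameter `shared_weight` are hypothetical, and the low-frequency term derived from pixel coordinates and depth maps is not modeled here.

```python
import math
import random


def correlated_view_noise(num_views, num_pixels, shared_weight=0.5, seed=0):
    """Return one noise vector per view, each blending a shared draw with its own draw.

    Assumption (not from the paper): noise_v = sqrt(w) * shared + sqrt(1 - w) * own_v.
    With unit-variance inputs this keeps unit variance per view, and the expected
    correlation between any two views equals w (shared_weight).
    """
    if not 0.0 <= shared_weight <= 1.0:
        raise ValueError("shared_weight must lie in [0, 1]")

    rng = random.Random(seed)
    # One draw that every view reuses.
    shared = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]

    a = math.sqrt(shared_weight)
    b = math.sqrt(1.0 - shared_weight)

    views = []
    for _ in range(num_views):
        # Independent draw for this view only.
        own = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
        views.append([a * s + b * o for s, o in zip(shared, own)])
    return views
```

Setting `shared_weight` to 0 gives fully independent noise per view and setting it to 1 gives identical noise in every view; values in between trade per-view diversity against cross-view correlation.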
Related papers
- CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis [8.386261591495103]
We introduce CoCoNO, a new algorithm that optimizes the initial latent by leveraging the complementary information within self-attention and cross-attention maps.
Our method introduces two new loss functions: the attention contrast loss, which minimizes undesirable overlap by ensuring each self-attention segment is exclusively linked to a specific subject's cross-attention map, and the attention complete loss, which maximizes the activation within these segments to guarantee that each subject is fully and distinctly represented.
arXiv Detail & Related papers (2024-11-25T08:20:14Z)
- Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning [86.99944014645322]
We introduce a novel framework, Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning.
We decompose each query image into its high-frequency and low-frequency components and incorporate them in parallel into the feature embedding network.
Our framework establishes new state-of-the-art results on multiple cross-domain few-shot learning benchmarks.
arXiv Detail & Related papers (2024-11-03T04:02:35Z)
- Robust Network Learning via Inverse Scale Variational Sparsification [55.64935887249435]
We introduce an inverse scale variational sparsification framework within a time-continuous inverse scale space formulation.
Unlike frequency-based methods, our approach not only removes noise by smoothing small-scale features.
We show the efficacy of our approach through enhanced robustness against various noise types.
arXiv Detail & Related papers (2024-09-27T03:17:35Z)
- Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z)
- Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction [18.014481087171657]
The correction of exposure-related issues is a pivotal component in enhancing the quality of images.
This paper proposes a novel methodology that leverages the frequency domain to improve and unify the handling of exposure correction tasks.
Our proposed method achieves state-of-the-art results, paving the way for more sophisticated and unified solutions in exposure correction.
arXiv Detail & Related papers (2023-09-03T14:09:14Z)
- Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement [75.25451566988565]
We propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images.
Detailed experimental analysis on five datasets validates our approach and sets a state-of-the-art for burst super-resolution, burst denoising, and low-light burst enhancement.
arXiv Detail & Related papers (2023-04-13T17:54:00Z)
- Towards Robust Image-in-Audio Deep Steganography [14.1081872409308]
This paper extends and enhances an existing image-in-audio deep steganography method by focusing on improving its robustness.
The proposed enhancements include modifications to the loss function, utilization of the Short-Time Fourier Transform (STFT), introduction of redundancy in the encoding process for error correction, and buffering of additional information in the pixel subconvolution operation.
arXiv Detail & Related papers (2023-03-09T03:16:04Z)
- Multi-Frequency-Aware Patch Adversarial Learning for Neural Point Cloud Rendering [7.522462414919854]
We present a neural point cloud rendering pipeline through a novel multi-frequency-aware patch adversarial learning framework.
The proposed approach aims to improve the rendering realness by minimizing the spectrum discrepancy between real and synthesized images.
Our method produces state-of-the-art results for neural point cloud rendering by a significant margin.
arXiv Detail & Related papers (2022-10-07T16:54:15Z)
- Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain [31.182376196295365]
CNNs tend to converge at a local optimum that is closely related to the high-frequency components of the training images.
A new perspective on data augmentation is designed by re-combining the phase spectrum of the current image and the amplitude spectrum of the distracter image.
arXiv Detail & Related papers (2021-08-19T04:04:41Z)
- Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain [88.7339322596758]
We present a novel Spatial-Phase Shallow Learning (SPSL) method, which combines spatial image and phase spectrum to capture the up-sampling artifacts of face forgery.
SPSL achieves state-of-the-art performance on cross-dataset evaluation and multi-class classification, and obtains comparable results on single-dataset evaluation.
arXiv Detail & Related papers (2021-03-02T16:45:08Z)
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.