Wavelet Transform-assisted Adaptive Generative Modeling for Colorization
- URL: http://arxiv.org/abs/2107.04261v1
- Date: Fri, 9 Jul 2021 07:12:39 GMT
- Title: Wavelet Transform-assisted Adaptive Generative Modeling for Colorization
- Authors: Jin Li, Wanyun Li, Zichen Xu, Yuhao Wang, Qiegen Liu
- Abstract summary: This study presents a novel scheme that exploits a score-based generative model in the wavelet domain to address the limits that the manifold hypothesis places on colorization performance.
By taking advantage of the multi-scale and multi-channel representation via wavelet transform, the proposed model learns the priors from stacked wavelet coefficient components.
Experiments demonstrated remarkable improvements of the proposed model on colorization quality, particularly on colorization robustness and diversity.
- Score: 15.814591440291652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised deep learning has recently demonstrated the promise to produce
high-quality samples. While it has tremendous potential to promote the image
colorization task, the performance is limited owing to the manifold hypothesis
in machine learning. This study presents a novel scheme that exploits a
score-based generative model in the wavelet domain to address the issue. By taking
advantage of the multi-scale and multi-channel representation via wavelet
transform, the proposed model learns the priors from stacked wavelet
coefficient components and thus learns the image characteristics under coarse and
detail frequency spectrums jointly and effectively. Moreover, such a highly
flexible generative model without adversarial optimization can execute
colorization tasks better under dual consistency terms in the wavelet domain,
namely data-consistency and structure-consistency. Specifically, in the
training phase, a set of multi-channel tensors consisting of wavelet
coefficients are used as the input to train the network by denoising score
matching. In the test phase, samples are iteratively generated via annealed
Langevin dynamics with data and structure consistencies. Experiments
demonstrated remarkable improvements of the proposed model on colorization
quality, particularly on colorization robustness and diversity.
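The test-phase procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a one-level Haar transform, a grayscale image defined as the mean of the R, G and B channels, and a toy analytic score function standing in for the trained network. The names `haar_dwt2`, `rgb_to_wavelet_stack`, `project_gray`, `toy_score` and `annealed_langevin` are hypothetical. Only a data-consistency step is shown; the structure-consistency term and the denoising score matching training are omitted.

```python
import numpy as np

def haar_dwt2(x):
    """One-level orthonormal 2-D Haar transform of an (H, W) array (H, W even)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2.0, (a + b - c - d) / 2.0,
            (a - b + c - d) / 2.0, (a - b - c + d) / 2.0)

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def rgb_to_wavelet_stack(img):
    """(H, W, 3) image -> (H/2, W/2, 12) tensor; channel index is 4*color + band."""
    bands = [band for ch in range(3) for band in haar_dwt2(img[..., ch])]
    return np.stack(bands, axis=-1)

def wavelet_stack_to_rgb(stack):
    """Inverse of rgb_to_wavelet_stack."""
    chans = [haar_idwt2(*(stack[..., 4 * ch + k] for k in range(4)))
             for ch in range(3)]
    return np.stack(chans, axis=-1)

def project_gray(x, gray_bands):
    """Assumed data-consistency step: shift each sub-band so that its mean
    over the three color channels equals the grayscale sub-band."""
    h, w = gray_bands.shape[:2]
    x4 = x.reshape(h, w, 3, 4)
    residual = gray_bands - x4.mean(axis=2)
    return (x4 + residual[:, :, None, :]).reshape(h, w, 12)

def toy_score(x, sigma):
    """Placeholder for a trained score network: the exact score of a
    standard Gaussian perturbed with noise of standard deviation sigma."""
    return -x / (1.0 + sigma ** 2)

def annealed_langevin(score_fn, gray, sigmas, n_steps=5, eps=2e-5, seed=0):
    """Annealed Langevin dynamics over decreasing noise levels, with the
    data-consistency projection applied after every update."""
    rng = np.random.default_rng(seed)
    gray_bands = np.stack(haar_dwt2(gray), axis=-1)   # (H/2, W/2, 4)
    x = rng.uniform(size=gray_bands.shape[:2] + (12,))
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2       # step size per noise level
        for _ in range(n_steps):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
            x = project_gray(x, gray_bands)
    return x

# Usage: "colorize" a random grayscale image with the toy score.
rng = np.random.default_rng(1)
img = rng.uniform(size=(8, 8, 3))
gray = img.mean(axis=-1)
sample = annealed_langevin(toy_score, gray, np.geomspace(1.0, 0.01, 10))
rgb = wavelet_stack_to_rgb(sample)
```

Because the Haar transform is linear, enforcing the channel mean on each sub-band is equivalent to enforcing that the reconstructed RGB image averages to the given grayscale input. With a real trained score network in place of `toy_score`, different random seeds would yield different plausible colorizations of the same input.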
Related papers
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z)
- Stage-by-stage Wavelet Optimization Refinement Diffusion Model for Sparse-View CT Reconstruction [14.037398189132468]
We present an innovative approach named the Stage-by-stage Wavelet Optimization Refinement Diffusion (SWORD) model for sparse-view CT reconstruction.
Specifically, we establish a unified mathematical model integrating low-frequency and high-frequency generative models, achieving the solution with an optimization procedure.
Our method is rooted in established optimization theory and comprises three distinct stages: low-frequency generation, high-frequency refinement, and domain transform.
arXiv Detail & Related papers (2023-08-30T10:48:53Z)
- Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis [19.422230767803246]
We propose Period VITS, a novel end-to-end text-to-speech model that incorporates an explicit periodicity generator.
In the proposed method, we introduce a frame pitch predictor that predicts prosodic features, such as pitch and voicing flags, from the input text.
From these features, the proposed periodicity generator produces a sample-level sinusoidal source that enables the waveform decoder to accurately reproduce the pitch.
arXiv Detail & Related papers (2022-10-28T07:52:30Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
- High-dimensional Assisted Generative Model for Color Image Restoration [12.459091135428885]
This work presents an unsupervised deep learning scheme that exploits a high-dimensional assisted score-based generative model for color image restoration tasks.
Considering the sample number and internal dimension in the score-based generative model, two high-dimensional transformations are proposed: the channel-copy transformation increases the sample number, and the pixel-scale transformation decreases the feasible dimension space.
To alleviate the difficulty of learning high-dimensional representation, a progressive strategy is proposed to leverage the performance.
arXiv Detail & Related papers (2021-08-14T04:05:29Z)
- Diverse Semantic Image Synthesis via Probability Distribution Modeling [103.88931623488088]
We propose a novel diverse semantic image synthesis framework.
Our method can achieve superior diversity and comparable quality compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-03-11T18:59:25Z)
- Joint Intensity-Gradient Guided Generative Modeling for Colorization [16.89777347891486]
This paper proposes an iterative generative model for solving the automatic colorization problem.
A joint intensity-gradient constraint in the data-fidelity term is proposed to limit the degrees of freedom within the generative model.
Experiments demonstrated that the system outperformed state-of-the-art methods in both quantitative comparisons and a user study.
arXiv Detail & Related papers (2020-12-28T07:52:55Z)
- WaveGrad: Estimating Gradients for Waveform Generation [55.405580817560754]
WaveGrad is a conditional model for waveform generation which estimates gradients of the data density.
It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram.
We find that it can generate high fidelity audio samples using as few as six iterations.
arXiv Detail & Related papers (2020-09-02T17:44:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.