Diffusion for Natural Image Matting
- URL: http://arxiv.org/abs/2312.05915v1
- Date: Sun, 10 Dec 2023 15:28:56 GMT
- Title: Diffusion for Natural Image Matting
- Authors: Yihan Hu, Yiheng Lin, Wei Wang, Yao Zhao, Yunchao Wei, Humphrey Shi
- Abstract summary: We present DiffMatte, a solution designed to overcome the challenges of image matting.
First, DiffMatte decouples the decoder from the intricately coupled matting network design, involving only one lightweight decoder in the iterations of the diffusion process.
Second, we employ a self-aligned training strategy with uniform time intervals, ensuring a consistent noise sampling between training and inference across the entire time domain.
- Score: 93.86689168212241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We aim to leverage diffusion to address the challenging image matting task.
However, the presence of high computational overhead and the inconsistency of
noise sampling between the training and inference processes pose significant
obstacles to achieving this goal. In this paper, we present DiffMatte, a
solution designed to effectively overcome these challenges. First, DiffMatte
decouples the decoder from the intricately coupled matting network design,
involving only one lightweight decoder in the iterations of the diffusion
process. With such a strategy, DiffMatte mitigates the growth of computational
overhead as the number of samples increases. Second, we employ a self-aligned
training strategy with uniform time intervals, ensuring a consistent noise
sampling between training and inference across the entire time domain. Our
DiffMatte is designed with flexibility in mind and can seamlessly integrate
into various modern matting architectures. Extensive experimental results
demonstrate that DiffMatte not only reaches the state-of-the-art level on the
Composition-1k test set, surpassing the best prior methods by 5% in the SAD
metric and 15% in the MSE metric, but also shows stronger generalization
ability on other benchmarks.
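The sampling procedure the abstract describes (one heavy feature extraction, then a single lightweight decoder iterated over uniformly spaced timesteps) can be sketched in a few lines. The module names, toy shapes, and the simplified update rule below are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn


class StubEncoder(nn.Module):
    """Stand-in for the heavy image+trimap feature extractor (hypothetical)."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(4, feat_dim, 3, padding=1)  # RGB + trimap channels

    def forward(self, image, trimap):
        return self.conv(torch.cat([image, trimap], dim=1))


class LightweightDecoder(nn.Module):
    """Hypothetical lightweight decoder: refines a noisy alpha matte
    conditioned on the frozen image features and the current timestep."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim + 2, 32, 3, padding=1),  # feats + noisy alpha + t map
            nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, feats, noisy_alpha, t):
        t_map = torch.full_like(noisy_alpha, float(t))  # broadcast timestep
        return self.net(torch.cat([feats, noisy_alpha, t_map], dim=1))


@torch.no_grad()
def sample_matte(encoder, decoder, image, trimap, num_steps: int = 10):
    """Diffusion sampling with a single lightweight decoder.

    The heavy encoder runs once; only the small decoder is evaluated at
    each step, so cost grows slowly with the number of iterations.
    Timesteps are uniformly spaced, mirroring the uniform-time-interval
    schedule the abstract describes.
    """
    feats = encoder(image, trimap)           # computed once, reused every step
    alpha = torch.randn_like(image[:, :1])   # start the matte from pure noise
    ts = torch.linspace(1.0, 0.0, num_steps + 1)  # uniform time intervals
    for i in range(num_steps):
        pred = decoder(feats, alpha, ts[i])  # predict the clean matte
        # Simplified DDIM-like interpolation toward the prediction
        # (a placeholder for the paper's actual update rule).
        alpha = ts[i + 1] * alpha + (1.0 - ts[i + 1]) * pred
    return alpha.clamp(0.0, 1.0)


image = torch.rand(1, 3, 64, 64)   # toy RGB input
trimap = torch.rand(1, 1, 64, 64)  # toy trimap
matte = sample_matte(StubEncoder(), LightweightDecoder(), image, trimap)
print(matte.shape)  # torch.Size([1, 1, 64, 64])
```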
Related papers
- Consistency Diffusion Bridge Models [25.213664260896103]
Denoising diffusion bridge models (DDBMs) build processes between fixed data endpoints based on a reference diffusion process.
DDBMs' sampling process typically requires hundreds of network evaluations to achieve decent performance.
We propose two paradigms, consistency bridge distillation and consistency bridge training, both of which are flexible to apply to DDBMs with broad design choices.
arXiv Detail & Related papers (2024-10-30T02:04:23Z)
- Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment [56.609042046176555]
Suboptimal noise-data mapping leads to slow training of diffusion models.
Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion.
Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image (see the sketch after this list).
arXiv Detail & Related papers (2024-06-18T06:20:42Z)
- Deep Data Consistency: a Fast and Robust Diffusion Model-based Solver for Inverse Problems [0.0]
We propose Deep Data Consistency (DDC) to update the data consistency step with a deep learning model when solving inverse problems with diffusion models.
In comparison with state-of-the-art methods on linear and non-linear tasks, DDC demonstrates outstanding performance on both similarity and realness metrics.
arXiv Detail & Related papers (2024-05-17T12:54:43Z)
- Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping [14.435637320909663]
The MoE technique plays a crucial role in expanding the size of DNN model parameters.
Existing methods attempt to mitigate the resulting all-to-all communication overhead by overlapping it with expert computation.
In our study, we extend the scope of this challenge by considering overlap at the broader training graph level.
We implement these techniques in Lancet, a system using compiler-based optimization to automatically enhance MoE model training.
arXiv Detail & Related papers (2024-04-30T10:17:21Z)
- Provably Robust Score-Based Diffusion Posterior Sampling for Plug-and-Play Image Reconstruction [31.503662384666274]
In science and engineering, the goal is often to infer an unknown image from a small number of measurements collected via a known forward model describing a certain imaging modality.
Motivated by their empirical success, score-based diffusion models have emerged as an impressive candidate prior for image reconstruction.
arXiv Detail & Related papers (2024-03-25T15:58:26Z)
- The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling [78.6155095947769]
Skip-Tuning is a simple yet surprisingly effective training-free tuning method on the skip connections (see the sketch after this list).
Our method can achieve a 100% FID improvement for pretrained EDM on ImageNet 64 with only 19 NFEs (FID 1.75).
While Skip-Tuning increases the score-matching losses in the pixel space, the losses in the feature space are reduced.
arXiv Detail & Related papers (2024-02-23T08:05:23Z)
- Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting [133.55037976429088]
We investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE).
A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods.
We propose a simple yet effective way to boost the adversarial robustness of MAE.
arXiv Detail & Related papers (2023-08-20T16:27:17Z)
- Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected stochastic differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
- Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization [68.49738668084693]
Self-supervised pre-training has recently demonstrated success on large-scale multimodal data.
However, cross-modality alignment (CMA) provides only weak and noisy supervision.
CMA might cause conflicts and biases among modalities.
arXiv Detail & Related papers (2022-11-03T18:12:32Z)
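The Immiscible Diffusion entry above describes restricting the diffuse-able area via a single extra line of noise reassignment. Below is a minimal sketch of such an assignment step, assuming a batch-wise minimum-cost matching between images and noise; the exact pairing criterion is an assumption, not the authors' released code:

```python
import torch
from scipy.optimize import linear_sum_assignment


def assign_noise(images: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Batch-wise image-noise assignment in the spirit of Immiscible Diffusion.

    Permutes the sampled noise so each image pairs with a nearby noise
    sample, restricting the region of noise space each image diffuses
    into. A hedged sketch, not the authors' exact implementation.
    """
    # Pairwise L2 distances between flattened images and noise samples.
    dist = torch.cdist(images.flatten(1), noise.flatten(1))
    # Minimum-cost matching over the batch (Hungarian algorithm).
    _, cols = linear_sum_assignment(dist.cpu().numpy())
    return noise[torch.as_tensor(cols, device=noise.device)]


# Usage inside an otherwise standard diffusion training step:
images = torch.randn(8, 3, 32, 32)   # a training mini-batch
noise = torch.randn_like(images)     # i.i.d. Gaussian noise
noise = assign_noise(images, noise)  # the one extra assignment step
# ...then form x_t from (images, noise, t) exactly as usual.
```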
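The Skip-Tuning entry notes a training-free adjustment of skip connections. The sketch below applies a single global coefficient to the skip branch of a made-up toy UNet; the real method targets pretrained diffusion UNets and its coefficient schedule may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyUNet(nn.Module):
    """Toy UNet illustrating where a Skip-Tuning-style coefficient applies."""

    def __init__(self, skip_scale: float = 1.0):
        super().__init__()
        self.skip_scale = skip_scale  # 1.0 = vanilla; < 1.0 damps the skip branch
        self.inc = nn.Conv2d(3, 16, 3, padding=1)
        self.down = nn.Conv2d(16, 16, 3, stride=2, padding=1)
        self.mid = nn.Conv2d(16, 16, 3, padding=1)
        self.up = nn.ConvTranspose2d(16, 16, 4, stride=2, padding=1)
        self.out = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, x):
        skip = F.relu(self.inc(x))                # full-resolution skip features
        h = F.relu(self.mid(self.down(skip)))
        h = F.relu(self.up(h))
        # Training-free knob: rescale the skip branch before concatenation.
        h = torch.cat([h, self.skip_scale * skip], dim=1)
        return self.out(h)


x = torch.randn(1, 3, 32, 32)
vanilla = TinyUNet(skip_scale=1.0)
tuned = TinyUNet(skip_scale=0.8)              # Skip-Tuning-style damping
tuned.load_state_dict(vanilla.state_dict())  # same weights, no retraining
print(vanilla(x).shape, tuned(x).shape)
```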
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.