DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion
- URL: http://arxiv.org/abs/2601.01487v1
- Date: Sun, 04 Jan 2026 11:27:26 GMT
- Title: DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion
- Authors: Ziyue Zhang, Luxi Lin, Xiaolin Hu, Chao Chang, HuaiXi Wang, Yiyi Zhou, Rongrong Ji,
- Abstract summary: Diffusion inversion is a challenging task due to the lack of viable supervision signals.<n>We propose a novel self-supervised diffusion inversion approach, termed Deep Inversion (DeepInv)<n>DeepInv is also equipped with an iterative and multi-scale training regime to train a parameterized inversion solver.
- Score: 65.5172878666262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion inversion is a task of recovering the noise of an image in a diffusion model, which is vital for controllable diffusion image editing. At present, diffusion inversion still remains a challenging task due to the lack of viable supervision signals. Thus, most existing methods resort to approximation-based solutions, which however are often at the cost of performance or efficiency. To remedy these shortcomings, we propose a novel self-supervised diffusion inversion approach in this paper, termed Deep Inversion (DeepInv). Instead of requiring ground-truth noise annotations, we introduce a self-supervised objective as well as a data augmentation strategy to generate high-quality pseudo noises from real images without manual intervention. Based on these two innovative designs, DeepInv is also equipped with an iterative and multi-scale training regime to train a parameterized inversion solver, thereby achieving the fast and accurate image-to-noise mapping. To the best of our knowledge, this is the first attempt of presenting a trainable solver to predict inversion noise step by step. The extensive experiments show that our DeepInv can achieve much better performance and inference speed than the compared methods, e.g., +40.435% SSIM than EasyInv and +9887.5% speed than ReNoise on COCO dataset. Moreover, our careful designs of trainable solvers can also provide insights to the community. Codes and model parameters will be released in https://github.com/potato-kitty/DeepInv.
Related papers
- Lightweight and Accurate Multi-View Stereo with Confidence-Aware Diffusion Model [81.01939699480094]
We propose a novel MVS framework, which introduces diffusion models in MVS.<n>Considering the discriminative characteristic of depth estimation, we design a condition encoder to guide the diffusion process.<n>Based on our novel MVS framework, we propose two novel MVS methods, DiffMVS and CasMVS.
arXiv Detail & Related papers (2025-09-18T17:59:19Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.<n>To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.<n>Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Fast constrained sampling in pre-trained diffusion models [80.99262780028015]
We propose an algorithm that enables fast, high-quality generation under arbitrary constraints.<n>Our approach produces results that rival or surpass the state-of-the-art training-free inference methods.
arXiv Detail & Related papers (2024-10-24T14:52:38Z) - PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage [19.02295657801464]
This work addresses the task of zero-shot monocular depth estimation.
A recent advance in this field has been the idea of utilising Text-to-Image foundation models, such as Stable Diffusion.
We present PrimeDepth, a method that is highly efficient at test time while keeping, or even enhancing, the positive aspects of diffusion-based approaches.
arXiv Detail & Related papers (2024-09-13T19:03:48Z) - Blind Image Restoration via Fast Diffusion Inversion [17.139433082780037]
Blind Image Restoration via fast Diffusion (BIRD) is a blind IR method that jointly optimize for the degradation model parameters and the restored image.
A key idea in our method is not to modify the reverse sampling, i.e., not to alter all the intermediate latents, once an initial noise is sampled.
We experimentally validate BIRD on several image restoration tasks and show that it achieves state of the art performance on all of them.
arXiv Detail & Related papers (2024-05-29T23:38:12Z) - ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z) - Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models [46.729930784279645]
We formulate the problem by finding the roots of an implicit equation and devlop a method to solve it efficiently.<n>Our solution is based on Newton-Raphson (NR), a well-known technique in numerical analysis.<n>We show improved results in image and generation of rare objects.
arXiv Detail & Related papers (2023-12-19T19:19:19Z) - Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z) - Effective Real Image Editing with Accelerated Iterative Diffusion
Inversion [6.335245465042035]
It is still challenging to edit and manipulate natural images with modern generative models.
Existing approaches that have tackled the problem of inversion stability often incur in significant trade-offs in computational efficiency.
We propose an Accelerated Iterative Diffusion Inversion method, dubbed AIDI, that significantly improves reconstruction accuracy with minimal additional overhead in space and time complexity.
arXiv Detail & Related papers (2023-09-10T01:23:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.