On mitigating stability-plasticity dilemma in CLIP-guided image morphing via geodesic distillation loss
- URL: http://arxiv.org/abs/2401.10526v1
- Date: Fri, 19 Jan 2024 07:06:58 GMT
- Title: On mitigating stability-plasticity dilemma in CLIP-guided image morphing via geodesic distillation loss
- Authors: Yeongtak Oh, Saehyung Lee, Uiwon Hwang, Sungroh Yoon
- Abstract summary: Large-scale language-vision pre-training models, such as CLIP, have achieved remarkable text-guided image morphing results.
Existing CLIP-guided image morphing methods encounter difficulties when morphing photorealistic images.
Our method achieves superior morphing results on both images and videos for various benchmarks, including CLIP-inversion.
- Score: 38.31276786740577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale language-vision pre-training models, such as CLIP, have achieved
remarkable text-guided image morphing results by leveraging several
unconditional generative models. However, existing CLIP-guided image morphing
methods encounter difficulties when morphing photorealistic images.
Specifically, existing guidance fails to provide detailed explanations of the
morphing regions within the image, leading to misguidance. In this paper, we
observed that such misguidance could be effectively mitigated by simply using a
proper regularization loss. Our approach comprises two key components: 1) a
geodesic cosine similarity loss that minimizes the discrepancy between
inter-modality features (i.e., image and text) on a projected subspace of the
CLIP space, and 2) a latent regularization loss that minimizes the discrepancy
between intra-modality features (i.e., image and image) on the image manifold.
Used as a drop-in replacement for the naïve directional CLIP loss, our method
achieves superior morphing results on both images and videos across various
benchmarks, including CLIP-inversion.
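To make the two components more concrete, here is a minimal NumPy sketch. It is only an illustration of the general shape of such losses: the paper's actual projection subspace, feature extractors, and loss weighting are not reproduced, and the arccos-based geodesic term and squared-distance latent term below are assumptions, not the authors' implementation.

```python
import numpy as np

def normalize(v, eps=1e-8):
    """Project a feature vector onto the unit sphere."""
    return v / (np.linalg.norm(v) + eps)

def directional_clip_loss(img_src, img_gen, txt_src, txt_tgt):
    # Naive directional CLIP loss: align the image-edit direction with the
    # text-edit direction via 1 - cos(delta_image, delta_text).
    d_img = normalize(img_gen - img_src)
    d_txt = normalize(txt_tgt - txt_src)
    return 1.0 - float(d_img @ d_txt)

def geodesic_cosine_loss(img_src, img_gen, txt_src, txt_tgt):
    # Hypothetical geodesic variant: measure the angle (arc length on the
    # unit sphere) between the two direction vectors instead of 1 - cos.
    d_img = normalize(img_gen - img_src)
    d_txt = normalize(txt_tgt - txt_src)
    cos = np.clip(d_img @ d_txt, -1.0, 1.0)
    return float(np.arccos(cos))  # geodesic distance in radians

def latent_regularization(img_src, img_gen):
    # Intra-modality term: keep the morphed image embedding close to the
    # source image embedding (here a simple squared distance on the sphere).
    return float(np.sum((normalize(img_gen) - normalize(img_src)) ** 2))
```

When the image-edit and text-edit directions agree, both losses vanish; for orthogonal directions the geodesic loss saturates at π/2 while the cosine loss reaches 1.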
Related papers
- It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap [4.437949196235149]
A modality gap has been reported in two-encoder contrastive models like CLIP.
We show that even when accounting for all these factors, the contrastive loss actually creates a gap during training.
We present evidence that attributes this contrastive gap to low uniformity in CLIP space, resulting in embeddings that occupy only a small portion of the latent space.
arXiv Detail & Related papers (2024-05-28T20:28:07Z)
- Bridging CLIP and StyleGAN through Latent Alignment for Image Editing [33.86698044813281]
We bridge CLIP and StyleGAN to mine diverse manipulation directions without inference-time optimization.
With this mapping scheme, we can achieve GAN inversion, text-to-image generation and text-driven image manipulation.
arXiv Detail & Related papers (2022-10-10T09:17:35Z)
- Invertible Rescaling Network and Its Extensions [118.72015270085535]
In this work, we propose a novel invertible framework to model the bidirectional degradation and restoration from a new perspective.
We develop invertible models to generate valid degraded images and transform the distribution of lost contents.
Then restoration is made tractable by applying the inverse transformation on the generated degraded image together with a randomly-drawn latent variable.
arXiv Detail & Related papers (2022-10-09T06:58:58Z)
- Gradient Variance Loss for Structure-Enhanced Image Super-Resolution [16.971608518924597]
We introduce a structure-enhancing loss function, termed Gradient Variance (GV) loss, to generate textures with perceptually pleasant details.
Experimental results show that the GV loss can significantly improve both Structure Similarity (SSIM) and peak signal-to-noise ratio (PSNR) performance of existing image super-resolution (SR) deep learning models.
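The idea of a gradient-variance loss can be sketched in a few lines of NumPy. This is a rough approximation under stated assumptions: simple forward differences stand in for the paper's gradient filters, and the patch size and L2 comparison of patch-wise variance maps are illustrative choices, not the published formulation.

```python
import numpy as np

def gradient_maps(img):
    # Horizontal and vertical gradients via forward differences
    # (a stand-in for Sobel-style filtering).
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return gx, gy

def patch_variance(g, patch=4):
    # Variance of the gradient map inside non-overlapping patch x patch blocks.
    h, w = g.shape
    g = g[: h - h % patch, : w - w % patch]
    blocks = g.reshape(h // patch, patch, w // patch, patch)
    return blocks.var(axis=(1, 3))

def gv_loss(sr, hr, patch=4):
    # Sketch of a Gradient Variance loss: L2 distance between the patch-wise
    # gradient-variance maps of the super-resolved and ground-truth images,
    # penalizing over-smoothed (low-variance) textures.
    loss = 0.0
    for gs, gh in zip(gradient_maps(sr), gradient_maps(hr)):
        vs, vh = patch_variance(gs, patch), patch_variance(gh, patch)
        loss += float(np.mean((vs - vh) ** 2))
    return loss
```

An over-smoothed prediction (e.g. a flat image) has near-zero gradient variance everywhere, so the loss pushes the network toward the sharper texture statistics of the ground truth.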
arXiv Detail & Related papers (2022-02-02T12:31:05Z)
- Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms.
Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z)
- Spatially-Adaptive Image Restoration using Distortion-Guided Networks [51.89245800461537]
We present a learning-based solution for restoring images suffering from spatially-varying degradations.
We propose SPAIR, a network design that harnesses distortion-localization information and dynamically adjusts to difficult regions in the image.
arXiv Detail & Related papers (2021-08-19T11:02:25Z)
- The Spatially-Correlative Loss for Various Image Translation Tasks [69.62228639870114]
Previous methods attempt this by using pixel-level cycle-consistency or feature-level matching losses.
We propose a novel spatially-correlative loss that is simple, efficient, and yet effective for preserving scene structure consistency.
We show distinct improvement over baseline models in all three modes of unpaired I2I translation: single-modal, multi-modal, and even single-image translation.
arXiv Detail & Related papers (2021-04-02T02:13:30Z)
- Contour Loss for Instance Segmentation via k-step Distance Transformation Image [5.02853371403908]
Instance segmentation aims to locate targets in the image and segment each target area at pixel level.
Mask R-CNN is a classic method of instance segmentation, but its predicted masks are unclear and inaccurate near contours.
We propose a novel loss function, called contour loss, which can assure more accurate instance segmentation.
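The "k-step distance transformation" in the title suggests a distance map truncated at k steps from the object contour. The NumPy sketch below is purely illustrative: the dilation-based distance map, the boundary extraction, and the `1 + (k - dist)` weighting of the cross-entropy are assumptions about how such a loss could look, not the paper's actual formulation.

```python
import numpy as np

def dilate4(mask):
    # One step of binary dilation with a 4-connected structuring element.
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def k_step_distance(contour, k=3):
    # Truncated distance map: number of 4-neighbour dilation steps (capped
    # at k) needed to reach each pixel from the contour.
    dist = np.full(contour.shape, float(k))
    reached = contour.astype(bool)
    dist[reached] = 0.0
    for step in range(1, k):
        grown = dilate4(reached)
        dist[grown & ~reached] = float(step)
        reached = grown
    return dist

def contour_loss(pred, target, k=3):
    # Hypothetical weighting: emphasize per-pixel cross-entropy near the
    # target contour (small k-step distance -> large weight).
    t = target.astype(bool)
    eroded = ~dilate4(~t)                 # binary erosion via complement
    contour = t & ~eroded                 # object pixels on the boundary
    weight = 1.0 + (k - k_step_distance(contour, k))  # in [1, 1 + k]
    eps = 1e-7
    bce = -(target * np.log(pred + eps) + (1 - target) * np.log(1 - pred + eps))
    return float(np.mean(weight * bce))
```

Up-weighting pixels close to the contour penalizes exactly the fuzzy boundary predictions that a plain mask loss tolerates.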
arXiv Detail & Related papers (2021-02-22T09:35:35Z)
- Invertible Image Rescaling [118.2653765756915]
We develop an Invertible Rescaling Net (IRN) to produce visually-pleasing low-resolution images.
We capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process.
arXiv Detail & Related papers (2020-05-12T09:55:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.