On mitigating stability-plasticity dilemma in CLIP-guided image morphing via geodesic distillation loss
- URL: http://arxiv.org/abs/2401.10526v1
- Date: Fri, 19 Jan 2024 07:06:58 GMT
- Title: On mitigating stability-plasticity dilemma in CLIP-guided image morphing via geodesic distillation loss
- Authors: Yeongtak Oh, Saehyung Lee, Uiwon Hwang, Sungroh Yoon
- Abstract summary: Large-scale language-vision pre-training models, such as CLIP, have achieved remarkable text-guided image morphing results.
Existing CLIP-guided image morphing methods encounter difficulties when morphing photorealistic images.
Our method achieves superior morphing results on both images and videos for various benchmarks, including CLIP-inversion.
- Score: 38.31276786740577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale language-vision pre-training models, such as CLIP, have achieved
remarkable text-guided image morphing results by leveraging several
unconditional generative models. However, existing CLIP-guided image morphing
methods encounter difficulties when morphing photorealistic images.
Specifically, existing guidance fails to provide detailed explanations of the
morphing regions within the image, leading to misguidance. In this paper, we
observed that such misguidance could be effectively mitigated by simply using a
proper regularization loss. Our approach comprises two key components: 1) a
geodesic cosine similarity loss that minimizes the discrepancy between
inter-modality features (i.e., image and text) on a projected subspace of the
CLIP space, and 2) a latent regularization loss that minimizes the discrepancy
between intra-modality features (i.e., image and image) on the image manifold.
Used as a drop-in replacement for the naïve directional CLIP loss, our method
achieves superior morphing results on both images and videos across various
benchmarks, including CLIP-inversion.
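To make the two components more concrete, here is a minimal NumPy sketch. It is only an illustration of the general shape of such losses: the paper's actual projection subspace, feature extractors, and loss weighting are not reproduced, and the arccos-based geodesic term and squared-distance latent term below are assumptions, not the authors' implementation.

```python
import numpy as np

def normalize(v, eps=1e-8):
    """Project a feature vector onto the unit sphere."""
    return v / (np.linalg.norm(v) + eps)

def directional_clip_loss(img_src, img_gen, txt_src, txt_tgt):
    # Naive directional CLIP loss: align the image-edit direction with the
    # text-edit direction via 1 - cos(delta_image, delta_text).
    d_img = normalize(img_gen - img_src)
    d_txt = normalize(txt_tgt - txt_src)
    return 1.0 - float(d_img @ d_txt)

def geodesic_cosine_loss(img_src, img_gen, txt_src, txt_tgt):
    # Hypothetical geodesic variant: measure the angle (arc length on the
    # unit sphere) between the two direction vectors instead of 1 - cos.
    d_img = normalize(img_gen - img_src)
    d_txt = normalize(txt_tgt - txt_src)
    cos = np.clip(d_img @ d_txt, -1.0, 1.0)
    return float(np.arccos(cos))  # geodesic distance in radians

def latent_regularization(img_src, img_gen):
    # Intra-modality term: keep the morphed image embedding close to the
    # source image embedding (here a simple squared distance on the sphere).
    return float(np.sum((normalize(img_gen) - normalize(img_src)) ** 2))
```

When the image-edit and text-edit directions agree, both losses vanish; for orthogonal directions the geodesic loss saturates at π/2 while the cosine loss reaches 1.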
Related papers
- It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap [4.437949196235149]
A modality gap has been reported in two-encoder contrastive models like CLIP.
We show that even when accounting for all these factors, the contrastive loss actually creates a gap during training.
We present evidence that attributes this contrastive gap to low uniformity in CLIP space, resulting in embeddings that occupy only a small portion of the latent space.
arXiv Detail & Related papers (2024-05-28T20:28:07Z)
- Bridging CLIP and StyleGAN through Latent Alignment for Image Editing [33.86698044813281]
We bridge CLIP and StyleGAN to mine diverse manipulation directions without inference-time optimization.
With this mapping scheme, we can achieve GAN inversion, text-to-image generation and text-driven image manipulation.
arXiv Detail & Related papers (2022-10-10T09:17:35Z)
- Invertible Rescaling Network and Its Extensions [118.72015270085535]
In this work, we propose a novel invertible framework to model the bidirectional degradation and restoration from a new perspective.
We develop invertible models to generate valid degraded images and transform the distribution of lost contents.
Then restoration is made tractable by applying the inverse transformation on the generated degraded image together with a randomly-drawn latent variable.
arXiv Detail & Related papers (2022-10-09T06:58:58Z)
- Gradient Variance Loss for Structure-Enhanced Image Super-Resolution [16.971608518924597]
We introduce a structure-enhancing loss function, termed Gradient Variance (GV) loss, to generate textures with perceptually pleasant details.
Experimental results show that the GV loss can significantly improve both Structure Similarity (SSIM) and peak signal-to-noise ratio (PSNR) performance of existing image super-resolution (SR) deep learning models.
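The idea of a gradient-variance loss can be sketched in a few lines of NumPy. This is a rough approximation under stated assumptions: simple forward differences stand in for the paper's gradient filters, and the patch size and L2 comparison of patch-wise variance maps are illustrative choices, not the published formulation.

```python
import numpy as np

def gradient_maps(img):
    # Horizontal and vertical gradients via forward differences
    # (a stand-in for Sobel-style filtering).
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return gx, gy

def patch_variance(g, patch=4):
    # Variance of the gradient map inside non-overlapping patch x patch blocks.
    h, w = g.shape
    g = g[: h - h % patch, : w - w % patch]
    blocks = g.reshape(h // patch, patch, w // patch, patch)
    return blocks.var(axis=(1, 3))

def gv_loss(sr, hr, patch=4):
    # Sketch of a Gradient Variance loss: L2 distance between the patch-wise
    # gradient-variance maps of the super-resolved and ground-truth images,
    # penalizing over-smoothed (low-variance) textures.
    loss = 0.0
    for gs, gh in zip(gradient_maps(sr), gradient_maps(hr)):
        vs, vh = patch_variance(gs, patch), patch_variance(gh, patch)
        loss += float(np.mean((vs - vh) ** 2))
    return loss
```

An over-smoothed prediction (e.g. a flat image) has near-zero gradient variance everywhere, so the loss pushes the network toward the sharper texture statistics of the ground truth.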
arXiv Detail & Related papers (2022-02-02T12:31:05Z)
- Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms.
Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z)
- Spatially-Adaptive Image Restoration using Distortion-Guided Networks [51.89245800461537]
We present a learning-based solution for restoring images suffering from spatially-varying degradations.
We propose SPAIR, a network design that harnesses distortion-localization information and dynamically adjusts to difficult regions in the image.
arXiv Detail & Related papers (2021-08-19T11:02:25Z)
- The Spatially-Correlative Loss for Various Image Translation Tasks [69.62228639870114]
Previous methods attempt this by using pixel-level cycle-consistency or feature-level matching losses.
We propose a novel spatially-correlative loss that is simple, efficient, and yet effective for preserving scene structure consistency.
We show distinct improvement over baseline models in all three modes of unpaired I2I translation: single-modal, multi-modal, and even single-image translation.
arXiv Detail & Related papers (2021-04-02T02:13:30Z)
- Contour Loss for Instance Segmentation via k-step Distance Transformation Image [5.02853371403908]
Instance segmentation aims to locate targets in the image and segment each target area at pixel level.
Mask R-CNN is a classic method of instance segmentation, but its predicted masks are unclear and inaccurate near contours.
We propose a novel loss function, called contour loss, which can assure more accurate instance segmentation.
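The "k-step distance transformation" in the title suggests a distance map truncated at k steps from the object contour. The NumPy sketch below is purely illustrative: the dilation-based distance map, the boundary extraction, and the `1 + (k - dist)` weighting of the cross-entropy are assumptions about how such a loss could look, not the paper's actual formulation.

```python
import numpy as np

def dilate4(mask):
    # One step of binary dilation with a 4-connected structuring element.
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def k_step_distance(contour, k=3):
    # Truncated distance map: number of 4-neighbour dilation steps (capped
    # at k) needed to reach each pixel from the contour.
    dist = np.full(contour.shape, float(k))
    reached = contour.astype(bool)
    dist[reached] = 0.0
    for step in range(1, k):
        grown = dilate4(reached)
        dist[grown & ~reached] = float(step)
        reached = grown
    return dist

def contour_loss(pred, target, k=3):
    # Hypothetical weighting: emphasize per-pixel cross-entropy near the
    # target contour (small k-step distance -> large weight).
    t = target.astype(bool)
    eroded = ~dilate4(~t)                 # binary erosion via complement
    contour = t & ~eroded                 # object pixels on the boundary
    weight = 1.0 + (k - k_step_distance(contour, k))  # in [1, 1 + k]
    eps = 1e-7
    bce = -(target * np.log(pred + eps) + (1 - target) * np.log(1 - pred + eps))
    return float(np.mean(weight * bce))
```

Up-weighting pixels close to the contour penalizes exactly the fuzzy boundary predictions that a plain mask loss tolerates.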
arXiv Detail & Related papers (2021-02-22T09:35:35Z)
- Invertible Image Rescaling [118.2653765756915]
We develop an Invertible Rescaling Net (IRN) to produce visually-pleasing low-resolution images.
We capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process.
arXiv Detail & Related papers (2020-05-12T09:55:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.