MIP: CLIP-based Image Reconstruction from PEFT Gradients
- URL: http://arxiv.org/abs/2403.07901v1
- Date: Mon, 26 Feb 2024 02:19:01 GMT
- Title: MIP: CLIP-based Image Reconstruction from PEFT Gradients
- Authors: Peiheng Zhou, Ming Hu, Xiaofei Xie, Yihao Huang, Kangjie Chen, Mingsong Chen,
- Abstract summary: We propose Multm-In-Parvo (MIP), a proprietary reconstruction attack method targeting CLIP-based distributed machine learning architectures.
Specifically, MIP can reconstruct CLIP training images according to the gradients of soft prompts or an adapter.
Experimental results show that MIP can effectively reconstruct training images according to the gradients of soft prompts or adapters of CLIP models.
- Score: 25.41543057104711
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive Language-Image Pre-training (CLIP) model, as an effective pre-trained multimodal neural network, has been widely used in distributed machine learning tasks, especially Federated Learning (FL). Typically, CLIP-based FL adopts Parameter-Efficient Fine-Tuning (PEFT) for model training, which only fine-tunes adapter parameters or soft prompts rather than the full parameters. Although PEFT is different from the traditional training mode, in this paper, we theoretically analyze that the gradients of adapters or soft prompts can still be used to perform image reconstruction attacks. Based on our theoretical analysis, we propose Multm-In-Parvo (MIP), a proprietary reconstruction attack method targeting CLIP-based distributed machine learning architecture. Specifically, MIP can reconstruct CLIP training images according to the gradients of soft prompts or an adapter. In addition, MIP includes a label prediction strategy to accelerate convergence and an inverse gradient estimation mechanism to avoid the vanishing gradient problem on the text encoder. Experimental results show that MIP can effectively reconstruct training images according to the gradients of soft prompts or adapters of CLIP models.
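The core idea behind gradient-based reconstruction attacks of this kind can be illustrated with a deliberately tiny, self-contained sketch (all sizes, weight matrices, and the squared-error loss here are illustrative assumptions, not the paper's actual models or method): an attacker who observes only the adapter's gradient optimizes a dummy input until that dummy's adapter gradient matches the leaked one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical sizes): a frozen "image encoder" and a
# small trainable adapter whose gradients are the only thing shared,
# mirroring PEFT-style federated fine-tuning.
W_enc = 0.3 * rng.standard_normal((8, 16))  # frozen encoder weights
W_ad = 0.3 * rng.standard_normal((4, 8))    # adapter weights (trainable)

def adapter_grad(x, y):
    """Gradient of a squared-error loss w.r.t. the adapter weights only."""
    h = W_enc @ x                       # frozen features
    z = W_ad @ h                        # adapter output
    return np.outer(2.0 * (z - y), h)   # dL/dW_ad for L = ||z - y||^2

# Client side: the true gradient is computed on a private "image".
true_x = rng.standard_normal(16)
y = np.array([1.0, 0.0, 0.0, 0.0])
target = adapter_grad(true_x, y)

def match_loss(x):
    """How far a candidate's adapter gradient is from the observed one."""
    return float(((adapter_grad(x, y) - target) ** 2).sum())

# Attacker side: descend on a dummy image until its gradient matches,
# using central finite differences (the input is only 16-dimensional).
dummy = rng.standard_normal(16)
eps = 1e-5
initial = match_loss(dummy)
for _ in range(200):
    grad = np.zeros_like(dummy)
    for i in range(dummy.size):
        d = np.zeros_like(dummy)
        d[i] = eps
        grad[i] = (match_loss(dummy + d) - match_loss(dummy - d)) / (2 * eps)
    # Backtracking step size keeps the matching loss monotonically decreasing.
    step, cur = 0.1, match_loss(dummy)
    while step > 1e-12 and match_loss(dummy - step * grad) >= cur:
        step *= 0.5
    dummy -= step * grad
final = match_loss(dummy)
```

Note that in this toy setup the encoder is rank-deficient (16 inputs, 8 features), so matching the gradient pins down only part of the input; it is exactly this gap that motivates the paper's additional machinery, such as label prediction and inverse gradient estimation on the text encoder.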
Related papers
- CLIP-Map: Structured Matrix Mapping for Parameter-Efficient CLIP Compression [70.45437536012015]
Contrastive Language-Image Pre-training (CLIP) has achieved wide application in computer vision tasks. However, CLIP suffers from high memory and computation costs, which prohibits its use in resource-limited application scenarios. We propose CLIP-Map, a novel mapping-based CLIP compression framework.
arXiv Detail & Related papers (2026-02-05T17:25:16Z)
- HistoSpeckle-Net: Mutual Information-Guided Deep Learning for high-fidelity reconstruction of complex OrganAMNIST images via perturbed Multimode Fibers [0.0]
HistoSpeckle-Net is a deep learning architecture designed to reconstruct structurally rich medical images from MMF speckles. Our experiments on the complex OrganAMNIST dataset demonstrate that HistoSpeckle-Net achieves higher fidelity than baseline models.
arXiv Detail & Related papers (2025-11-25T12:20:50Z)
- In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory [54.92893355284945]
Deep learning-based wireless receivers offer the potential to dynamically adapt to varying channel environments. Current adaptation strategies, including joint training, hypernetwork-based methods, and meta-learning, either demonstrate limited flexibility or necessitate explicit optimization through gradient descent. This paper presents gradient-free adaptation techniques rooted in the emerging paradigm of in-context learning (ICL).
arXiv Detail & Related papers (2025-06-18T06:43:55Z)
- Implicit Inversion turns CLIP into a Decoder [15.428694454730541]
We show that image synthesis is possible using CLIP alone, without any decoder, training, or fine-tuning. Our approach optimizes a frequency-aware implicit neural representation that encourages coarse-to-fine generation by stratifying across network layers. Without altering CLIP's weights, this framework unlocks capabilities such as text-to-image generation, style transfer, and image reconstruction.
arXiv Detail & Related papers (2025-05-29T06:55:26Z)
- DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images [14.448350657613368]
DeeCLIP is a novel framework for detecting AI-generated images.
It incorporates DeeFuser, a fusion module that combines high-level and low-level features.
Trained exclusively on 4-class ProGAN data, DeeCLIP achieves an average accuracy of 89.90%.
arXiv Detail & Related papers (2025-04-28T15:06:28Z)
- ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval [83.01358520910533]
We introduce a new framework that can boost the performance of large-scale pre-trained vision-language models.
The approach, Enhanced Language-Image Pre-training (ELIP), uses the text query, via a simple mapping network, to predict a set of visual prompts.
ELIP can easily be applied to the commonly used CLIP, SigLIP and BLIP-2 networks.
arXiv Detail & Related papers (2025-02-21T18:59:57Z)
- CALLIC: Content Adaptive Learning for Lossless Image Compression [64.47244912937204]
CALLIC sets a new state-of-the-art (SOTA) for learned lossless image compression.
We propose a content-aware autoregressive self-attention mechanism by leveraging convolutional gating operations.
During encoding, we decompose pre-trained layers, including depth-wise convolutions, using low-rank matrices and then adapt the incremental weights to the testing image by Rate-guided Progressive Fine-Tuning (RPFT).
RPFT fine-tunes with gradually increasing patches that are sorted in descending order by estimated entropy, optimizing the learning process and reducing adaptation time.
arXiv Detail & Related papers (2024-12-23T10:41:18Z)
- CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling [21.734200158914476]
Contrastive Language-Image Pre-training (CLIP) has become a cornerstone in multimodal intelligence.
Diversified Multiplet Upcycling (DMU) efficiently fine-tunes a series of CLIP models that capture different feature spaces.
Experiments demonstrate the strong performance of CLIP-MoE across various zero-shot retrieval and zero-shot image classification tasks.
arXiv Detail & Related papers (2024-09-28T09:28:51Z)
- Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding.
Experiment results show that our CS-Mamba achieves state-of-the-art performance and the masked training method can better reconstruct smooth features to improve the visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
- MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric [57.3330687266266]
We find that using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance.
Using the Module-wise Pruning Error (MoPE) metric, we introduce a unified pruning framework applicable to both pre-training and task-specific fine-tuning compression stages.
arXiv Detail & Related papers (2024-03-12T17:24:26Z)
- CLIP Guided Image-perceptive Prompt Learning for Image Enhancement [15.40368082025006]
Contrastive Language-Image Pre-Training (CLIP) Guided Prompt Learning is proposed.
We learn image-perceptive prompts to distinguish between original and target images using the CLIP model.
We introduce a very simple network, incorporating a simple baseline, to predict the weights of three different LUTs as the enhancement network.
arXiv Detail & Related papers (2023-11-07T12:36:20Z)
- Imaging through multimode fibres with physical prior [3.174639607243348]
We propose a physics-assisted, unsupervised, learning-based fibre imaging scheme.
The reconstruction process of the online learning only requires a few speckle patterns and unpaired targets.
Our scheme has the potential to extend the application of multimode fibre imaging.
arXiv Detail & Related papers (2023-11-06T12:46:29Z)
- A Structured Pruning Algorithm for Model-based Deep Learning [8.09765408941809]
We present the structured pruning algorithm for model-based deep learning (SPADE), the first structured pruning algorithm for MBDL networks.
We propose three distinct strategies to fine-tune the pruned MBDL networks to minimize the performance loss.
Our results highlight that MBDL models pruned by SPADE can achieve substantial speed up in testing time while maintaining competitive performance.
arXiv Detail & Related papers (2023-11-03T16:05:51Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method, optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights by a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Multi-layer Clustering-based Residual Sparsifying Transform for Low-dose CT Image Reconstruction [11.011268090482575]
We propose a network-structured sparsifying transform learning approach for X-ray computed tomography (CT) reconstruction.
We apply the MCST model to low-dose CT reconstruction by deploying the learned MCST model into the regularizer in penalized weighted least squares (PWLS) reconstruction.
Our simulation results demonstrate that PWLS-MCST achieves better image reconstruction quality than the conventional FBP method and PWLS with edge-preserving (EP) regularizer.
arXiv Detail & Related papers (2022-03-22T09:38:41Z)
- Perceptually Optimizing Deep Image Compression [53.705543593594285]
Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks.
We propose a different proxy approach to optimize image analysis networks against quantitative perceptual models.
arXiv Detail & Related papers (2020-07-03T14:33:28Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
- BP-DIP: A Backprojection based Deep Image Prior [49.375539602228415]
We combine two image restoration approaches: (i) Deep Image Prior (DIP), which trains a convolutional neural network (CNN) from scratch at test time using the degraded image; and (ii) a backprojection (BP) fidelity term, an alternative to the standard least squares loss usually used in previous DIP works.
We demonstrate the performance of the proposed method, termed BP-DIP, on the deblurring task and show its advantages over the plain DIP, with both higher PSNR values and better inference run-time.
arXiv Detail & Related papers (2020-03-11T17:09:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.