Related papers: Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training

Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training

URL: http://arxiv.org/abs/2306.08173v2
Date: Fri, 1 Mar 2024 04:24:04 GMT
Title: Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training
Authors: Alyssa Huang, Peihan Liu, Ryumei Nakada, Linjun Zhang, Wanrong Zhang
Abstract summary: We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets.
Score: 15.928338716118697
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model that effectively addresses privacy concerns while retaining accuracy. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets encompassing diverse vision-and-language tasks such as image classification and visual question answering. We demonstrate that our approach retains performance on par with the standard non-private CLIP model. Furthermore, we analyze our proposed algorithm under linear representation settings. We derive the convergence rate of our algorithm and show a trade-off between utility and privacy when gradients are clipped per-batch and the loss function does not satisfy smoothness conditions assumed in the literature for the analysis of DP-SGD.

Related papers

How Private is Your Attention? Bridging Privacy with In-Context Learning [4.093582834469802]
In-context learning (ICL)-the ability of transformer-based models to perform new tasks from examples provided at inference time-has emerged as a hallmark of modern language models. We propose a differentially private pretraining algorithm for linear attention heads and present the first theoretical analysis of the privacy-accuracy trade-off for ICL in linear regression.
arXiv Detail & Related papers (2025-04-22T16:05:26Z)
Adaptive Clipping for Privacy-Preserving Few-Shot Learning: Enhancing Generalization with Limited Data [12.614480013684759]
We introduce a novel approach called Meta-Clip to enhance the utility of privacy-preserving few-shot learning methods. By dynamically adjusting clipping thresholds during the training process, our Adaptive Clipping method provides fine-grained control over the disclosure of sensitive information. We demonstrate the effectiveness of our approach in minimizing utility degradation, showcasing a superior privacy-preserving trade-off compared to existing privacy-preserving techniques.
arXiv Detail & Related papers (2025-03-27T05:14:18Z)
Multi-Objective Optimization for Privacy-Utility Balance in Differentially Private Federated Learning [12.278668095136098]
Federated learning (FL) enables collaborative model training across distributed clients without sharing raw data. We propose an adaptive clipping mechanism that dynamically adjusts the clipping norm using a multi-objective optimization framework. Our results show that adaptive clipping consistently outperforms fixed-clipping baselines, achieving improved accuracy under the same privacy constraints.
arXiv Detail & Related papers (2025-03-27T04:57:05Z)
Federated Learning with Differential Privacy: An Utility-Enhanced Approach [12.614480013684759]
Federated learning has emerged as an attractive approach to protect data privacy by eliminating the need for sharing clients' data. Recent studies have shown that federated learning alone does not guarantee privacy, as private data may still be inferred from the uploaded parameters to the central server. We present a modification to these vanilla differentially private algorithms based on a Haar wavelet transformation step and a novel noise injection scheme that significantly lowers the bound of the noise variance.
arXiv Detail & Related papers (2025-03-27T04:48:29Z)
Differentially Private Active Learning: Balancing Effective Data Selection and Privacy [11.716423801223776]
We introduce differentially private active learning (DP-AL) for standard learning settings. We demonstrate that naively integrating DP-SGD training into AL presents substantial challenges in privacy budget allocation and data utilization. Our experiments on vision and natural language processing tasks show that DP-AL can improve performance for specific datasets and model architectures.
arXiv Detail & Related papers (2024-10-01T09:34:06Z)
Semantic Meta-Split Learning: A TinyML Scheme for Few-Shot Wireless Image Classification [50.28867343337997]
This work presents a TinyML-based semantic communication framework for few-shot wireless image classification. We exploit split-learning to limit the computations performed by the end-users while ensuring privacy-preserving. meta-learning overcomes data availability concerns and speeds up training by utilizing similarly trained tasks.
arXiv Detail & Related papers (2024-09-03T05:56:55Z)
Enhancing Few-shot CLIP with Semantic-Aware Fine-Tuning [61.902254546858465]
Methods based on Contrastive Language-Image Pre-training have exhibited promising performance in few-shot adaptation tasks. We propose fine-tuning the parameters of the attention pooling layer during the training process to encourage the model to focus on task-specific semantics.
arXiv Detail & Related papers (2023-11-08T05:18:57Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation [37.55812121348268]
In-context learning (ICL) with large language models (LLMs) on private datasets poses privacy risks. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy guarantees.
arXiv Detail & Related papers (2023-09-21T03:59:00Z)
Independent Distribution Regularization for Private Graph Embedding [55.24441467292359]
Graph embeddings are susceptible to attribute inference attacks, which allow attackers to infer private node attributes from the learned graph embeddings. To address these concerns, privacy-preserving graph embedding methods have emerged. We propose a novel approach called Private Variational Graph AutoEncoders (PVGAE) with the aid of independent distribution penalty as a regularization term.
arXiv Detail & Related papers (2023-08-16T13:32:43Z)
Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP) We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z)
Robust Cross-Modal Representation Learning with Progressive Self-Distillation [7.676408770854477]
The learning objective of vision-language approach of CLIP does not effectively account for the noisy many-to-many correspondences found in web-harvested image captioning datasets. We introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image-text alignments to more efficiently learn robust representations from noisy data.
arXiv Detail & Related papers (2022-04-10T03:28:18Z)
Continual Learning with Differential Privacy [19.186539487598385]
We introduce a notion of continual adjacent databases to bound the sensitivity of any data record participating in the training process of continual learning. We develop a new DP-preserving algorithm for CL with a data sampling strategy to quantify the privacy risk of training data. Our algorithm provides formal guarantees of privacy for data records across tasks in CL.
arXiv Detail & Related papers (2021-10-11T12:39:55Z)
Differentially private federated deep learning for multi-site medical image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer. Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models. However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z)
SPEED: Secure, PrivatE, and Efficient Deep learning [2.283665431721732]
We introduce a deep learning framework able to deal with strong privacy constraints. Based on collaborative learning, differential privacy and homomorphic encryption, the proposed approach advances state-of-the-art.
arXiv Detail & Related papers (2020-06-16T19:31:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.