Related papers: Differential Privacy of Cross-Attention with Provable Guarantee

Differential Privacy of Cross-Attention with Provable Guarantee

URL: http://arxiv.org/abs/2407.14717v2
Date: Tue, 15 Oct 2024 04:19:15 GMT
Title: Differential Privacy of Cross-Attention with Provable Guarantee
Authors: Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou,
Abstract summary: We design a novel differential privacy (DP) data structure to address the privacy security of cross-attention with a theoretical guarantee. Our result is robust to adaptive queries in which users can intentionally attack the cross-attention system.
Score: 18.331374727331077
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cross-attention has become a fundamental module nowadays in many important artificial intelligence applications, e.g., retrieval-augmented generation (RAG), system prompt, guided stable diffusion, and many more. Ensuring cross-attention privacy is crucial and urgently needed because its key and value matrices may contain sensitive information about model providers and their users. In this work, we design a novel differential privacy (DP) data structure to address the privacy security of cross-attention with a theoretical guarantee. In detail, let $n$ be the input token length of system prompt/RAG data, $d$ be the feature dimension, $0 < \alpha \le 1$ be the relative error parameter, $R$ be the maximum value of the query and key matrices, $R_w$ be the maximum value of the value matrix, and $r,s,\epsilon_s$ be parameters of polynomial kernel methods. Then, our data structure requires $\widetilde{O}(ndr^2)$ memory consumption with $\widetilde{O}(nr^2)$ initialization time complexity and $\widetilde{O}(\alpha^{-1} r^2)$ query time complexity for a single token query. In addition, our data structure can guarantee that the process of answering user query satisfies $(\epsilon, \delta)$-DP with $\widetilde{O}(n^{-1} \epsilon^{-1} \alpha^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (\alpha + \epsilon_s)$ relative error between our output and the true answer. Furthermore, our result is robust to adaptive queries in which users can intentionally attack the cross-attention system. To our knowledge, this is the first work to provide DP for cross-attention and is promising to inspire more privacy algorithm design in large generative models (LGMs).

Related papers

Differentially Private Sparse Linear Regression with Heavy-tailed Responses [5.228567425731136]
This paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings.<n>We introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of ( tildeObiggl( s* frac1 2 cdot biggl(fraclog dnbiggr)fraczeta1 + zeta1 + zeta2 + 2zeta
arXiv Detail & Related papers (2025-06-07T16:56:20Z)
HSR-Enhanced Sparse Attention Acceleration [19.776342074253435]
We introduce a novel approach to accelerate attention computation in Large Language Models (LLMs) We leverage the inherent sparsity within attention mechanisms, both in conventional Softmax attention and ReLU attention. Our method only introduces provably negligible error for Softmax attention.
arXiv Detail & Related papers (2024-10-14T05:18:02Z)
LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions [54.54897832889028]
We show that for any $K$, there is a universal set" $U subset [n]$ of size independent of $n$, such that for any $Q$ and any row $i$, the large attention scores $A_i,j$ in row $i$ of $A$ all have $jin U$. We empirically show the benefits of our scheme for vision transformers, showing how to train new models that use our universal set while training as well.
arXiv Detail & Related papers (2024-10-07T19:47:13Z)
Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit. In this setting, agents subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z)
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages [63.366380571397]
We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v(i) inmathbbRd$. We propose a new multi-message protocol that achieves the optimal error using $tildemathcalOleft(min(nvarepsilon2,d)right)$ messages per user.
arXiv Detail & Related papers (2024-04-16T00:56:36Z)
One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space [11.735802740426294]
Attention computation takes both the time complexity of $O(n2)$ and the space complexity of $O(n2)$ simultaneously. We introduce a new algorithm that only reads one pass of data in a streaming fashion. Notably, our algorithm exhibits exceptional memory-efficient performance with super-long tokens.
arXiv Detail & Related papers (2023-11-24T18:35:00Z)
Robust Approximation Algorithms for Non-monotone $k$-Submodular Maximization under a Knapsack Constraint [0.0]
Two deterministic approximation algorithms are presented for the problem of non-monotone $k$-submodular complexity under a knapsack constraint. Our algorithms provide constant approximation ratios within only $O(nk)$ query complexity for the non-monotone objective.
arXiv Detail & Related papers (2023-09-21T12:42:52Z)
Adaptive and Dynamic Multi-Resolution Hashing for Pairwise Summations [19.602149096819776]
We propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation. Our proposed Adam-Hash is also robust to adaptive PSE queries, where an adversary can choose query $q_j in mathbbRd$ depending on the output from previous queries.
arXiv Detail & Related papers (2022-12-21T23:23:24Z)
Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs) Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model.
arXiv Detail & Related papers (2022-10-05T22:52:00Z)
Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly. We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $epsilon$ error with $delta$ failure probability. Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
arXiv Detail & Related papers (2022-09-30T04:15:44Z)
DP-PCA: Statistically Optimal and Differentially Private PCA [44.22319983246645]
DP-PCA is a single-pass algorithm that overcomes both limitations. For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=tilde O(d)$.
arXiv Detail & Related papers (2022-05-27T02:02:17Z)
Computational Complexity of Normalizing Constants for the Product of Determinantal Point Processes [12.640283469603357]
We study the computational complexity of computing the normalizing constant. We show that $sum_Sdet(bf A_S,S)p$ exactly for every (fixed) positive even integer $p$ is UP-hard and Mod$_3$P-hard.
arXiv Detail & Related papers (2021-11-28T14:08:25Z)
Semi-supervised Active Regression [21.51757844385258]
This paper studies the use of partially labelled, potentially biased data for learning tasks. The learner has access to a dataset $X in mathbbRd | X min_beta in mathbbRd | X beta - Y |2 end2 equation while making few additional label queries.
arXiv Detail & Related papers (2021-06-12T03:28:43Z)
On Avoiding the Union Bound When Answering Multiple Differentially Private Queries [49.453751858361265]
We give an algorithm for this task that achieves an expected $ell_infty$ error bound of $O(frac1epsilonsqrtk log frac1delta)$. On the other hand, the algorithm of Dagan and Kur has a remarkable advantage that the $ell_infty$ error bound of $O(frac1epsilonsqrtk log frac1delta)$ holds not only in expectation but always.
arXiv Detail & Related papers (2020-12-16T17:58:45Z)
List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions. Our algorithm runs in time $widetildeO(ndk)$ for all $k = O(sqrtd) cup Omega(d)$, where $n$ is the size of the dataset. A variant of our algorithm has runtime $widetildeO(ndk)$ for all $k$, at the expense of an $O(sqrtlog k)$ factor in the recovery guarantee
arXiv Detail & Related papers (2020-11-19T17:21:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.