UniAud: A Unified Auditing Framework for High Auditing Power and Utility with One Training Run
- URL: http://arxiv.org/abs/2507.04457v1
- Date: Sun, 06 Jul 2025 16:35:48 GMT
- Title: UniAud: A Unified Auditing Framework for High Auditing Power and Utility with One Training Run
- Authors: Ruixuan Liu, Li Xiong
- Abstract summary: We propose a unified framework, UniAud, for data-independent auditing. We then extend this framework as UniAud++ for data-dependent auditing. We show that our framework matches the state-of-the-art auditing results of O(T) auditing with thousands of runs.
- Score: 9.400936999321415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private (DP) optimization has been widely adopted as a standard approach to provide rigorous privacy guarantees for training datasets. DP auditing verifies whether a model trained with DP optimization satisfies its claimed privacy level by estimating empirical privacy lower bounds through hypothesis testing. Recent O(1) frameworks improve auditing efficiency by checking the membership status of multiple audit samples in a single run, rather than checking individual samples across multiple runs. However, we reveal that there is no free lunch for this improved efficiency: data dependency and an implicit conflict between auditing and utility impair the tightness of the auditing results. Addressing these challenges, our key insights include reducing data dependency through uncorrelated data and resolving the auditing-utility conflict by decoupling the criteria for effective auditing and separating objectives for utility and auditing. We first propose a unified framework, UniAud, for data-independent auditing that maximizes auditing power through a novel uncorrelated canary construction and a self-comparison framework. We then extend this framework as UniAud++ for data-dependent auditing, optimizing the auditing and utility trade-off through multi-task learning with separate objectives for auditing and training. Experimental results validate that our black-box O(1) framework matches the state-of-the-art auditing results of O(T) auditing with thousands of runs, demonstrating the best efficiency-auditing trade-off across vision and language tasks. Additionally, our framework provides meaningful auditing with only slight utility degradation compared to standard DP training, showing the optimal utility-auditing trade-off and the benefit of requiring no extra training for auditing.
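The abstract describes auditing as estimating an empirical privacy lower bound through hypothesis testing. As a rough illustration of that general idea only (not the UniAud procedure), the sketch below turns the error counts of a membership test into a lower bound on epsilon using the standard (epsilon, delta)-DP inequalities FPR + e^eps * FNR >= 1 - delta and FNR + e^eps * FPR >= 1 - delta. The function names, the Hoeffding confidence bound, and the default `delta` and `alpha` values are assumptions made for this example, and the bound assumes independent guesses, which holds for classic multi-run auditing but not in general for one-run auditing.

```python
import math


def hoeffding_upper(errors: int, trials: int, alpha: float) -> float:
    """One-sided upper confidence bound on an error rate (Hoeffding's inequality).

    Assumes the trials are independent; this is an illustrative choice,
    not the estimator used in the paper.
    """
    slack = math.sqrt(math.log(1.0 / alpha) / (2.0 * trials))
    return min(1.0, errors / trials + slack)


def empirical_epsilon_lower_bound(
    false_pos: int,
    n_out: int,
    false_neg: int,
    n_in: int,
    delta: float = 1e-5,   # hypothetical default
    alpha: float = 0.05,   # hypothetical default: overall failure probability
) -> float:
    """Lower-bound epsilon from the error counts of a membership test.

    false_pos / n_out: non-members wrongly guessed as members.
    false_neg / n_in:  members wrongly guessed as non-members.
    """
    # Split alpha over the two error rates (union bound).
    fpr = hoeffding_upper(false_pos, n_out, alpha / 2)
    fnr = hoeffding_upper(false_neg, n_in, alpha / 2)

    # Any (eps, delta)-DP mechanism forces both inequalities to hold,
    # so each one yields a candidate lower bound on eps.
    candidates = [0.0]
    if fnr > 0 and 1 - delta - fpr > 0:
        candidates.append(math.log((1 - delta - fpr) / fnr))
    if fpr > 0 and 1 - delta - fnr > 0:
        candidates.append(math.log((1 - delta - fnr) / fpr))
    return max(candidates)


if __name__ == "__main__":
    # A perfect attacker on 1000 members and 1000 non-members.
    print(empirical_epsilon_lower_bound(0, 1000, 0, 1000))
    # An attacker that guesses at random certifies nothing.
    print(empirical_epsilon_lower_bound(500, 1000, 500, 1000))
```

If the resulting lower bound exceeds the epsilon claimed by the training procedure, the claim is refuted at confidence 1 - alpha; a bound below the claim is inconclusive. Tighter intervals such as Clopper-Pearson are usually preferred over Hoeffding in practice.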
Related papers
- How Well Can Differential Privacy Be Audited in One Run? [2.687273760177295]
We characterize the maximum achievable efficacy of one-run auditing and show that the key barrier to its efficacy is interference between the observable effects of different data elements. We present new conceptual approaches to minimize this barrier, towards improving the performance of one-run auditing of real machine learning algorithms.
arXiv Detail & Related papers (2025-03-10T11:32:30Z) - Access Denied: Meaningful Data Access for Quantitative Algorithm Audits [4.182284365432724]
Third-party audits are often hindered by access restrictions, forcing auditors to rely on limited, low-quality data. We conduct audit simulations on two realistic case studies for recidivism and healthcare coverage prediction. We find that data minimization and anonymization practices can strongly increase error rates on individual-level data, leading to unreliable assessments.
arXiv Detail & Related papers (2025-02-01T13:33:45Z) - Privacy Audit as Bits Transmission: (Im)possibilities for Audit by One Run [7.850976675388593]
We introduce a unifying framework for privacy audits based on information-theoretic principles. We demystify the method of privacy audit by one run, identifying the conditions under which single-run audits are feasible or infeasible.
arXiv Detail & Related papers (2025-01-29T16:38:51Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - The Decisive Power of Indecision: Low-Variance Risk-Limiting Audits and Election Contestation via Marginal Mark Recording [51.82772358241505]
Risk-limiting audits (RLAs) are techniques for verifying the outcomes of large elections.
We define new families of audits that improve efficiency and offer advances in statistical power.
New audits are enabled by revisiting the standard notion of a cast-vote record so that it can declare multiple possible mark interpretations.
arXiv Detail & Related papers (2024-02-09T16:23:54Z) - TrustFed: A Reliable Federated Learning Framework with Malicious-Attack Resistance [8.924352407824566]
Federated learning (FL) enables collaborative learning among multiple clients while ensuring individual data privacy.
In this paper, we propose a hierarchical audit-based FL (HiAudit-FL) framework to enhance the reliability and security of the learning process.
Our simulation results demonstrate that HiAudit-FL can effectively identify and handle potential malicious users accurately, with small system overhead.
arXiv Detail & Related papers (2023-12-06T13:56:45Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight, but they only give tight estimates under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization [89.04537372465612]
Socratic pretraining is a question-driven, unsupervised pretraining objective designed to improve controllability in summarization tasks.
Our results show that Socratic pretraining cuts task-specific labeled data requirements in half.
arXiv Detail & Related papers (2022-12-20T17:27:10Z) - Sequential Kernelized Independence Testing [77.237958592189]
We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z) - Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation (TTA) aims to adapt a model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z) - Multi-view Contrastive Self-Supervised Learning of Accounting Data Representations for Downstream Audit Tasks [1.9659095632676094]
International audit standards require the direct assessment of a financial statement's underlying accounting transactions, referred to as journal entries.
Deep learning inspired audit techniques have emerged in the field of auditing vast quantities of journal entry data.
We propose a contrastive self-supervised learning framework designed to learn audit task invariant accounting data representations.
arXiv Detail & Related papers (2021-09-23T08:16:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.