Towards Reliable and Generalizable Differentially Private Machine Learning (Extended Version)
- URL: http://arxiv.org/abs/2508.15141v1
- Date: Thu, 21 Aug 2025 00:27:06 GMT
- Title: Towards Reliable and Generalizable Differentially Private Machine Learning (Extended Version)
- Authors: Wenxuan Bao, Vincent Bindschaedler
- Abstract summary: There is a flurry of recent research papers proposing novel differentially private machine learning (DPML) techniques. These papers claim to achieve new state-of-the-art (SoTA) results and offer empirical results as validation. There is no consensus on which techniques are most effective or if they genuinely meet their stated claims.
- Score: 7.223425966203561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a flurry of recent research papers proposing novel differentially private machine learning (DPML) techniques. These papers claim to achieve new state-of-the-art (SoTA) results and offer empirical results as validation. However, there is no consensus on which techniques are most effective or if they genuinely meet their stated claims. Complicating matters, heterogeneity in codebases, datasets, methodologies, and model architectures makes direct comparisons of different approaches challenging. In this paper, we conduct a reproducibility and replicability (R+R) experiment on 11 different SoTA DPML techniques from the recent research literature. Results of our investigation are varied: while some methods stand up to scrutiny, others falter when tested outside their initial experimental conditions. We also discuss challenges unique to the reproducibility of DPML, including additional randomness due to DP noise, and how to address them. Finally, we derive insights and best practices to obtain scientifically valid and reliable results.
Related papers
- ExpVid: A Benchmark for Experiment Video Understanding & Reasoning [65.17173232816818]
We introduce ExpVid, the first benchmark designed to systematically evaluate MLLMs on scientific experiment videos. We evaluate 19 leading MLLMs on ExpVid and find that while they excel at coarse-grained recognition, they struggle with disambiguating fine details, tracking state changes over time, and linking experimental procedures to scientific outcomes. Our results reveal a notable performance gap between proprietary and open-source models, particularly in high-order reasoning.
arXiv Detail & Related papers (2025-10-13T16:45:28Z)
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning [53.85659415230589]
This paper systematically reviews widely adopted reinforcement learning techniques. We present clear guidelines for selecting RL techniques tailored to specific setups. We also reveal that a minimalist combination of two techniques can unlock the learning capability of critic-free policies.
arXiv Detail & Related papers (2025-08-11T17:39:45Z)
- MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback [136.27567671480156]
We introduce experiment-guided ranking, which prioritizes hypotheses based on feedback from prior tests. We frame experiment-guided ranking as a sequential decision-making problem. Our approach significantly outperforms pre-experiment baselines and strong ablations.
arXiv Detail & Related papers (2025-05-23T13:24:50Z)
- Continual Multimodal Contrastive Learning [70.60542106731813]
Multimodal contrastive learning (MCL) advances in aligning different modalities and generating multimodal representations in a joint space. However, a critical yet often overlooked challenge remains: multimodal data is rarely collected in a single process, and training from scratch is computationally expensive. In this paper, we formulate CMCL through two specialized principles of stability and plasticity. We theoretically derive a novel optimization-based method, which projects updated gradients from dual sides onto subspaces where any gradient is prevented from interfering with the previously learned knowledge.
arXiv Detail & Related papers (2025-03-19T07:57:08Z)
- Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on Prediction-Powered Causal Inferences (PPCI). PPCI estimates the treatment effect in a target experiment with unlabeled factual outcomes, retrievable zero-shot from a pre-trained model. We validate our method on synthetic and real-world scientific data, offering solutions to instances not solvable by vanilla Empirical Risk Minimization.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- A Debate-Driven Experiment on LLM Hallucinations and Accuracy [7.821303946741665]
This study investigates the phenomenon of hallucination in large language models (LLMs).
Multiple instances of GPT-4o-Mini models engage in a debate-like interaction prompted with questions from the TruthfulQA dataset.
One model is deliberately instructed to generate plausible but false answers while the other models are asked to respond truthfully.
arXiv Detail & Related papers (2024-10-25T11:41:27Z)
- SoK: Privacy-Preserving Data Synthesis [72.92263073534899]
This paper focuses on privacy-preserving data synthesis (PPDS) by providing a comprehensive overview, analysis, and discussion of the field.
We put forth a master recipe that unifies two prominent strands of research in PPDS: statistical methods and deep learning (DL)-based methods.
arXiv Detail & Related papers (2023-07-05T08:29:31Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- A quantitative study of NLP approaches to question difficulty estimation [0.30458514384586394]
This work quantitatively analyzes several approaches proposed in previous research and compares their performance on datasets from different educational domains.
We find that Transformer-based models are the best performing across different educational domains, with DistilBERT performing almost as well as BERT.
As for the other models, the hybrid ones often outperform those based on a single type of feature; the ones based on linguistic features perform well on reading comprehension questions, while frequency-based features (TF-IDF) and word embeddings (word2vec) perform better in domain knowledge assessment.
arXiv Detail & Related papers (2023-05-17T14:26:00Z)
- A Fair Experimental Comparison of Neural Network Architectures for Latent Representations of Multi-Omics for Drug Response Prediction [7.690774882108066]
We train and optimize multi-omics integration methods under equal conditions.
We devised a novel method, Omics Stacking, that combines the advantages of intermediate and late integration.
Experiments were conducted on a public drug response data set with multiple omics data.
arXiv Detail & Related papers (2022-08-31T12:46:08Z)
- A Multiple kernel testing procedure for non-proportional hazards in factorial designs [4.358626952482687]
We propose a Multiple kernel testing procedure to infer survival data when several factors are of interest simultaneously.
Our method is able to deal with complex data and can be seen as an alternative to the omnipresent Cox model when assumptions such as proportionality cannot be justified.
arXiv Detail & Related papers (2022-06-15T01:53:49Z)
- A reproducible experimental survey on biomedical sentence similarity: a string-based method sets the state of the art [0.0]
This report introduces the largest and first reproducible experimental survey on biomedical sentence similarity.
Our aim is to elucidate the state of the art and to resolve some of the issues that prevent the evaluation of most current methods.
Our experiments confirm that the pre-processing stages, and the choice of the NER tool, have a significant impact on the performance of the sentence similarity methods.
arXiv Detail & Related papers (2022-05-18T06:20:42Z)
- An Investigation of Replay-based Approaches for Continual Learning [79.0660895390689]
Continual learning (CL) is a major challenge of machine learning (ML) and describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
arXiv Detail & Related papers (2021-08-15T15:05:02Z)