Towards Differential Relational Privacy and its use in Question
Answering
- URL: http://arxiv.org/abs/2203.16701v1
- Date: Wed, 30 Mar 2022 22:59:24 GMT
- Title: Towards Differential Relational Privacy and its use in Question
Answering
- Authors: Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang,
Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan,
Stefano Soatto
- Abstract summary: Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained question answering model.
We quantify this phenomenon and provide a possible definition of Differential Relational Privacy (DrP).
We illustrate these concepts in experiments with large-scale models for Question Answering.
- Score: 109.4452196071872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Memorization of the relation between entities in a dataset can lead to
privacy issues when using a trained model for question answering. We introduce
Relational Memorization (RM) to understand, quantify and control this
phenomenon. While bounding general memorization can have detrimental effects on
the performance of a trained model, bounding RM does not prevent effective
learning. The difference is most pronounced when the data distribution is
long-tailed, with many queries having only a few training examples: impeding
general memorization prevents effective learning, while impeding only
relational memorization still allows learning general properties of the
underlying concepts. We formalize the notion of Relational Privacy (RP) and,
inspired by Differential Privacy (DP), we provide a possible definition of
Differential Relational Privacy (DrP). These notions can be used to describe
and compute bounds on the amount of RM in a trained model. We illustrate
Relational Privacy concepts in experiments with large-scale models for Question
Answering.
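For context, the record-level guarantee that DrP is modeled on can be written out explicitly; the second display below is only one plausible way to phrase a relational analogue, since the paper's exact definition of DrP is not reproduced in this abstract.

```latex
% Standard \epsilon-differential privacy: for a training algorithm A,
% any two datasets D, D' differing in a single record, and any set S
% of possible output models,
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(D') \in S].

% A plausible relational analogue (an assumption, not the paper's
% verbatim definition): require the same inequality for datasets
% D \sim_{\mathrm{rel}} D' that differ only in the relation between one
% pair of entities, while both entities remain present in each dataset:
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(D') \in S]
\qquad \text{for all } D \sim_{\mathrm{rel}} D'.
```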
Related papers
- Rethinking LLM Memorization through the Lens of Adversarial Compression [93.13830893086681]
Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage.
One major question is whether these models "memorize" all their training data or whether they integrate many data sources in a way more akin to how a human would learn and synthesize information.
We propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs.
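A minimal sketch of the compression idea behind ACR, assuming a string counts as memorized when a prompt shorter than the string reliably elicits it; `candidate_prompts` and the `elicits` callable are hypothetical stand-ins for the paper's actual adversarial prompt search.

```python
def adversarial_compression_ratio(target_tokens, candidate_prompts, elicits):
    """Illustrative ACR: length of the target divided by the length of the
    shortest prompt that makes the model reproduce it. elicits(prompt) is a
    hypothetical callable returning True if the model emits target_tokens."""
    shortest = min(
        (len(p) for p in candidate_prompts if elicits(p)),
        default=None,
    )
    if shortest is None:
        return 0.0  # nothing elicits the target; treat as not memorized
    return len(target_tokens) / shortest

# A ratio above 1 means the target is elicitable by a prompt shorter than
# itself, which the paper reads as evidence of memorization.
```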
arXiv Detail & Related papers (2024-04-23T15:49:37Z)
Unveiling Privacy, Memorization, and Input Curvature Links [11.290935303784208]
Memorization is closely related to several concepts such as generalization, noisy learning, and privacy.
Recent research has shown evidence linking input loss curvature (measured by the trace of the loss Hessian w.r.t. inputs) and memorization.
We extend our analysis to establish theoretical links between differential privacy, memorization, and input loss curvature.
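A minimal PyTorch sketch of how input loss curvature can be estimated; Hutchinson's trace estimator is a standard choice for approximating the trace of the input Hessian, though not necessarily the exact procedure used in the paper, and `model`/`loss_fn` are placeholders.

```python
import torch

def input_curvature(model, loss_fn, x, y, n_probes=8):
    """Hutchinson estimate of tr(d^2 loss / dx^2), the input-loss curvature
    that the paper links to memorization. Sketch only: model and loss_fn
    stand in for whatever network and loss are being probed."""
    x = x.detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    trace_est = 0.0
    for _ in range(n_probes):
        v = torch.randn_like(x)  # random probe; Rademacher also works
        # Hessian-vector product via double backprop
        hvp = torch.autograd.grad(grad, x, grad_outputs=v, retain_graph=True)[0]
        trace_est += (v * hvp).sum().item()  # v^T H v
    return trace_est / n_probes
```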
arXiv Detail & Related papers (2024-02-28T22:02:10Z)
SoK: Memorisation in machine learning [5.563171090433323]
Quantifying the impact of individual data samples on machine learning models is an open research problem.
In this work we unify a broad range of previous definitions and perspectives on memorisation in ML.
We discuss their interplay with model generalisation and the implications of these phenomena for data privacy.
arXiv Detail & Related papers (2023-11-06T12:59:18Z)
Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary [65.268245109828]
Non-human-like behaviour of contemporary pre-trained language models (PLMs) is a major factor undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent behaviour issue by improving PLM awareness.
arXiv Detail & Related papers (2023-10-24T06:15:15Z)
Exploring Memorization in Fine-tuned Language Models [53.52403444655213]
We conduct the first comprehensive analysis to explore language models' memorization during fine-tuning across tasks.
Our studies with open-source and our own fine-tuned LMs across various tasks indicate that the degree of memorization differs strongly across fine-tuning tasks.
We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.
arXiv Detail & Related papers (2023-10-10T15:41:26Z)
Bounding Information Leakage in Machine Learning [26.64770573405079]
This paper investigates fundamental bounds on information leakage.
We identify and bound the success rate of the worst-case membership inference attack.
We derive bounds on the mutual information between the sensitive attributes and model parameters.
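To make the bounded quantity concrete, here is the classic loss-threshold membership inference baseline (in the style of Yeom et al.); the paper bounds the worst case over all attacks, of which this simple attack is only an illustration, not the paper's construction.

```python
import numpy as np

def loss_threshold_mia(train_losses, test_losses, threshold):
    """Baseline membership inference attack: predict 'member' when the
    per-example loss falls below a threshold. Information-leakage bounds
    cap how well even the worst-case such attack can perform."""
    train_hits = np.mean(np.asarray(train_losses) < threshold)  # true positive rate
    test_hits = np.mean(np.asarray(test_losses) < threshold)    # false positive rate
    return train_hits - test_hits  # membership advantage
```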
arXiv Detail & Related papers (2021-05-09T08:49:14Z)
Learning with Instance Bundles for Reading Comprehension [61.823444215188296]
We introduce new supervision techniques that compare question-answer scores across multiple related instances.
Specifically, we normalize these scores across various neighborhoods of closely contrasting questions and/or answers.
We empirically demonstrate the effectiveness of training with instance bundles on two datasets.
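One plausible reading of normalizing scores across a neighborhood is a softmax over the bundle of contrasting instances, sketched below; the paper's exact neighborhood construction and normalization may differ.

```python
import torch
import torch.nn.functional as F

def bundle_nll(scores, gold_index):
    """Normalize raw question-answer scores over a bundle of closely
    contrasting instances and take the NLL of the correct pairing.
    scores is a 1-D tensor with one entry per (question, answer) instance
    in the bundle; gold_index marks the correct pairing."""
    log_probs = F.log_softmax(scores, dim=0)  # instances compete within the bundle
    return -log_probs[gold_index]
```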
arXiv Detail & Related papers (2021-04-18T06:17:54Z)
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [100.75479161884935]
We propose a novel training paradigm called Remembering for the Right Reasons (RRR).
RRR stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions.
We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting.
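A minimal sketch of the RRR idea; using plain input-gradient saliency as the stored explanation is an assumption here, standing in for whichever explanation method the method is paired with.

```python
import torch

def rrr_penalty(model, loss_fn, x, y, stored_saliency):
    """Sketch of RRR: recompute an input-gradient saliency map for a
    replayed example and penalize its distance from the saliency stored
    when the example was first learned, so the model keeps making its
    prediction for 'the right reasons'."""
    x = x.detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    # create_graph=True keeps the penalty differentiable for training
    (saliency,) = torch.autograd.grad(loss, x, create_graph=True)
    return (saliency - stored_saliency).pow(2).mean()
```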
arXiv Detail & Related papers (2020-10-04T10:05:27Z)
Understanding Unintended Memorization in Federated Learning [5.32880378510767]
We show that different components of Federated Learning play an important role in reducing unintended memorization.
We also show that training with a strong user-level differential privacy guarantee results in models that exhibit the least amount of unintended memorization.
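For reference, a minimal sketch of user-level differentially private aggregation in the DP-FedAvg style (clip each user's whole update, average, add Gaussian noise calibrated to the clip norm); the hyperparameters and the exact mechanism in the paper may differ.

```python
import numpy as np

def dp_fedavg_aggregate(user_updates, clip_norm, noise_multiplier, rng=None):
    """Sketch of user-level DP aggregation: clipping bounds any single
    user's influence on the average, and Gaussian noise masks it."""
    rng = rng or np.random.default_rng()
    clipped = []
    for u in user_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    # noise std for the average of n clipped user updates
    sigma = noise_multiplier * clip_norm / len(user_updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)
```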
arXiv Detail & Related papers (2020-06-12T22:10:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.