Data Minimization at Inference Time
- URL: http://arxiv.org/abs/2305.17593v1
- Date: Sat, 27 May 2023 23:03:41 GMT
- Title: Data Minimization at Inference Time
- Authors: Cuong Tran and Ferdinando Fioretto
- Abstract summary: In domains with high stakes such as law, recruitment, and healthcare, learning models frequently rely on sensitive user data for inference.
This paper asks whether it is necessary to use *all* input features for accurate predictions at inference time.
The paper demonstrates that, in a personalized setting, individuals may only need to disclose a small subset of their features without compromising decision-making accuracy.
- Score: 44.15285550981899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In domains with high stakes such as law, recruitment, and healthcare,
learning models frequently rely on sensitive user data for inference,
necessitating the complete set of features. This not only poses significant
privacy risks for individuals but also demands substantial human effort from
organizations to verify information accuracy. This paper asks whether it is
necessary to use \emph{all} input features for accurate predictions at
inference time. The paper demonstrates that, in a personalized setting,
individuals may only need to disclose a small subset of their features without
compromising decision-making accuracy. The paper also provides an efficient
sequential algorithm to determine the appropriate attributes for each
individual to provide. Evaluations across various learning tasks show that
individuals can potentially report as little as 10\% of their information while
maintaining the same accuracy level as a model that employs the full set of
user information.
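The abstract's "efficient sequential algorithm" is not spelled out here, so the following is only a plausible greedy sketch of the idea: a hypothetical `sequential_disclosure` routine reveals one feature at a time (whichever most sharpens the model's prediction), imputes still-undisclosed features with training means, and stops once a confidence threshold is reached. The dataset, model, imputation scheme, and 0.95 threshold are illustrative assumptions, not the paper's choices.

```python
# Hedged greedy sketch of sequential feature disclosure (illustrative only;
# not the paper's algorithm). Undisclosed features are imputed with training
# means, and disclosure stops once model confidence crosses a threshold.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)
feature_means = X.mean(axis=0)  # stand-in values for undisclosed features

def sequential_disclosure(x_true, model, means, confidence=0.95):
    """Greedily reveal one individual's features until the model is confident."""
    x_partial = means.copy()           # start with nothing revealed
    hidden = set(range(len(x_true)))
    revealed = []
    while hidden:
        if model.predict_proba(x_partial.reshape(1, -1)).max() >= confidence:
            break                      # prediction already stable: stop early
        # reveal the hidden feature whose true value most boosts confidence
        best_j, best_p = None, -1.0
        for j in hidden:
            trial = x_partial.copy()
            trial[j] = x_true[j]
            p_j = model.predict_proba(trial.reshape(1, -1)).max()
            if p_j > best_p:
                best_j, best_p = j, p_j
        x_partial[best_j] = x_true[best_j]
        hidden.remove(best_j)
        revealed.append(best_j)
    return revealed, model.predict(x_partial.reshape(1, -1))[0]

revealed, pred = sequential_disclosure(X[0], model, feature_means)
print(f"disclosed {len(revealed)}/{X.shape[1]} features -> prediction {pred}")
```

The abstract's 10% figure corresponds to stopping early in exactly this way: once the prediction is stable, the remaining features never need to be disclosed.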
Related papers
- Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering [8.20929362102942]
Author profiling is the task of inferring characteristics about individuals by analyzing content they share.
We propose a new method for author profiling which first distinguishes relevant from irrelevant content and then performs the actual profiling only on the relevant data.
We evaluate our method for Big Five personality trait prediction on two Twitter corpora.
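As a rough illustration of the filter-then-profile pipeline described above: the paper learns its relevance filter with reinforcement learning, whereas a plain relevance classifier stands in for that policy below, and the example posts, relevance labels, and `trait_model` callback are all hypothetical.

```python
# Minimal sketch of the two-stage idea: filter out irrelevant posts first,
# then profile the user only on what remains. The real method trains the
# filter with reinforcement learning; a logistic-regression stand-in is
# used here purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Stage 1: relevance filter (hypothetical posts and labels; 1 = relevant).
posts = ["i love meeting new people", "rt: traffic update",
         "i worry about everything", "tonight's weather report"]
relevant = [1, 0, 1, 0]
vec = TfidfVectorizer().fit(posts)
filter_clf = LogisticRegression().fit(vec.transform(posts), relevant)

# Stage 2: profile only on the posts the filter keeps.
def profile(user_posts, trait_model):
    """trait_model: any callable mapping kept posts to Big Five scores."""
    kept = [p for p in user_posts if filter_clf.predict(vec.transform([p]))[0] == 1]
    return trait_model(kept)
```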
arXiv Detail & Related papers (2024-09-06T08:43:10Z) - Personalized Privacy Auditing and Optimization at Test Time [44.15285550981899]
This paper asks whether it is necessary to require *all* input features for a model to return accurate predictions at test time.
Under a personalized setting, each individual may need to release only a small subset of these features without impacting the final decisions.
Evaluation over several learning tasks shows that individuals may be able to report as little as 10% of their information to ensure the same level of accuracy.
arXiv Detail & Related papers (2023-01-31T20:16:59Z) - Can Foundation Models Help Us Achieve Perfect Secrecy? [11.073539163281524]
A key promise of machine learning is the ability to assist users with personal tasks.
A gold standard privacy-preserving system will satisfy perfect secrecy.
However, privacy and quality appear to be in tension in existing systems for personal tasks.
arXiv Detail & Related papers (2022-05-27T02:32:26Z) - SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles [50.90773979394264]
This paper studies a model that protects the privacy of individuals' sensitive information while still learning non-discriminatory predictors.
A key characteristic of the proposed model is that it enables off-the-shelf, non-private fair models to be adapted into a privacy-preserving and fair model.
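For context, SF-PATE extends the PATE recipe of privately aggregating an ensemble of teachers; the sketch below shows only that noisy-argmax backbone, with the fairness-aware "off-the-shelf" models assumed to be the teachers. The vote counts, epsilon, and noise scale are illustrative.

```python
# PATE-style private aggregation sketch: each teacher votes for a class,
# Laplace noise is added to the vote histogram, and the noisy plurality
# becomes the label handed to the student model. Changing one training
# record moves at most two counts by 1, hence the 2/epsilon noise scale.
import numpy as np

def noisy_aggregate(teacher_preds, n_classes, epsilon, rng):
    counts = np.bincount(teacher_preds, minlength=n_classes).astype(float)
    counts += rng.laplace(scale=2.0 / epsilon, size=n_classes)  # privacy noise
    return int(np.argmax(counts))

rng = np.random.default_rng(0)
votes = np.array([0, 0, 1, 0, 1, 0, 0])       # 7 teachers, 2 classes
print(noisy_aggregate(votes, n_classes=2, epsilon=1.0, rng=rng))
```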
arXiv Detail & Related papers (2022-04-11T14:42:54Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and the scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z) - Towards a Data Privacy-Predictive Performance Trade-off [2.580765958706854]
We evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks.
In contrast to some previous literature, we confirm that the higher the level of privacy, the greater the impact on predictive performance.
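An illustrative (not the paper's) protocol for exhibiting this trade-off: perturb the training features with Laplace noise at several privacy levels and watch test accuracy fall as the privacy budget epsilon shrinks. The dataset, noise model, and classifier are all assumptions.

```python
# Toy trade-off experiment: smaller epsilon means more noise (stronger
# privacy) and, as the entry above reports, lower predictive performance.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

for epsilon in [10.0, 1.0, 0.1]:              # from weak to strong privacy
    noisy = Xtr + rng.laplace(scale=1.0 / epsilon, size=Xtr.shape)
    acc = LogisticRegression(max_iter=5000).fit(noisy, ytr).score(Xte, yte)
    print(f"epsilon={epsilon}: test accuracy = {acc:.3f}")
```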
arXiv Detail & Related papers (2022-01-13T21:48:51Z) - Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach [54.32266555843765]
This paper studies a model that protects the privacy of individuals' sensitive information while still learning non-discriminatory predictors.
The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints.
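A minimal sketch of the Lagrangian-dual idea, omitting the differential-privacy noise the paper also injects: a fairness constraint (a demographic-parity gap here, chosen for illustration) enters the training objective through a multiplier `lam` that is raised by dual ascent whenever the constraint is violated. The synthetic data, constraint choice, step sizes, and tolerance are all assumptions.

```python
# Logistic regression with a fairness constraint handled via Lagrangian
# duality: primal gradient descent on the penalized loss, dual ascent on
# the multiplier. (The paper additionally privatizes training; that part
# is omitted from this sketch.)
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
g = rng.integers(0, 2, size=n)                      # protected group attribute
y = (X[:, 0] + 0.5 * g + rng.normal(scale=0.5, size=n) > 0).astype(float)

alpha = 0.01                                        # allowed parity gap
w, lam = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))                # sigmoid predictions
    gap = p[g == 1].mean() - p[g == 0].mean()       # demographic-parity gap
    grad_loss = X.T @ (p - y) / n                   # logistic-loss gradient
    s = p * (1.0 - p)
    grad_gap = (X[g == 1] * s[g == 1, None]).mean(0) \
             - (X[g == 0] * s[g == 0, None]).mean(0)
    w -= 0.5 * (grad_loss + lam * np.sign(gap) * grad_gap)  # primal step
    lam = max(0.0, lam + 0.1 * (abs(gap) - alpha))          # dual ascent step

print(f"final parity gap: {abs(gap):.4f}, multiplier: {lam:.2f}")
```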
arXiv Detail & Related papers (2020-09-26T10:50:33Z) - Differential Privacy of Hierarchical Census Data: An Optimization Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
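The paper's mechanism is optimization-based; the sketch below shows only the generic recipe such mechanisms refine: add Laplace noise to the counts at each level of the hierarchy, then post-process for consistency so the noisy child counts sum to their noisy parent. The even-split adjustment and the single-level privacy budget here are simplifying assumptions.

```python
# Hierarchical DP counts, toy version: noise both levels, then restore the
# invariant children.sum() == parent with a least-squares-style adjustment
# that spreads the mismatch evenly across children.
import numpy as np

rng = np.random.default_rng(0)
children = np.array([120.0, 340.0, 90.0])      # true counts (e.g. counties)
parent = children.sum()                        # aggregate count (e.g. state)

epsilon = 1.0
noisy_children = children + rng.laplace(scale=1.0 / epsilon, size=children.size)
noisy_parent = parent + rng.laplace(scale=1.0 / epsilon)

gap = noisy_parent - noisy_children.sum()      # consistency violation
consistent = noisy_children + gap / children.size
print(consistent, consistent.sum(), noisy_parent)
```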
arXiv Detail & Related papers (2020-06-28T18:19:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.