Attesting Distributional Properties of Training Data for Machine Learning
- URL: http://arxiv.org/abs/2308.09552v4
- Date: Tue, 9 Apr 2024 11:41:25 GMT
- Title: Attesting Distributional Properties of Training Data for Machine Learning
- Authors: Vasisht Duddu, Anudeep Das, Nora Khayata, Hossein Yalame, Thomas Schneider, N. Asokan
- Abstract summary: Several jurisdictions are preparing machine learning regulatory frameworks.
Draft regulations indicate that model trainers are required to show that training datasets have specific distributional properties.
We propose the notion of property attestation, which allows a prover to demonstrate relevant distributional properties of training data to a verifier without revealing the data.
- Score: 15.2927830843089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of machine learning (ML) has been accompanied by increased concerns about its trustworthiness. Several jurisdictions are preparing ML regulatory frameworks. One such concern is ensuring that model training data has desirable distributional properties for certain sensitive attributes. For example, draft regulations indicate that model trainers are required to show that training datasets have specific distributional properties, such as reflecting diversity of the population. We propose the notion of property attestation allowing a prover (e.g., model trainer) to demonstrate relevant distributional properties of training data to a verifier (e.g., a customer) without revealing the data. We present an effective hybrid property attestation combining property inference with cryptographic mechanisms.
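A minimal, illustrative sketch of the attestation idea follows. It is not the paper's protocol (which combines property inference with cryptographic mechanisms): here a prover commits to each training record with a salted hash, claims a distributional property, and the verifier spot-checks a random subset of openings, which, unlike the paper's approach, reveals the audited records. The dataset, the sensitive attribute, and all names are hypothetical.

```python
import hashlib
import os
import random

def commit(record: bytes) -> tuple[bytes, bytes]:
    """Salted SHA-256 commitment: hiding until the salt is revealed."""
    salt = os.urandom(16)
    return hashlib.sha256(salt + record).digest(), salt

# --- Prover (e.g., model trainer): commit to a hypothetical dataset ---
random.seed(0)
sensitive = [random.randint(0, 1) for _ in range(1000)]  # 1 = minority group
records = [str(v).encode() for v in sensitive]
commitments, salts = zip(*(commit(r) for r in records))
claimed_fraction = sum(sensitive) / len(sensitive)  # the attested property

# --- Verifier (e.g., customer): audit a random subset of openings ---
audit = random.sample(range(len(records)), k=50)
for i in audit:
    # The prover reveals (salt, record) only for audited indices.
    assert hashlib.sha256(salts[i] + records[i]).digest() == commitments[i]
sample_fraction = sum(sensitive[i] for i in audit) / len(audit)
print(f"claimed={claimed_fraction:.3f}, audited sample={sample_fraction:.3f}")
```

The audited sample only gives statistical confidence in the claim; the paper's hybrid attestation is designed to avoid revealing even the audited records.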
Related papers
- Laminator: Verifiable ML Property Cards using Hardware-assisted Attestations [10.278905067763686]
A malicious model provider can include false information in ML property cards, raising a need for verifiable ML property cards.
We show how to realize them using property attestations: technical mechanisms by which a prover (e.g., a model provider) can attest different ML properties during training and inference to a verifier (e.g., an auditor).
arXiv Detail & Related papers (2024-06-25T13:36:53Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- On the Connection between Pre-training Data Diversity and Fine-tuning Robustness [66.30369048726145]
We find that the primary factor influencing downstream effective robustness is data quantity.
We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
arXiv Detail & Related papers (2023-07-24T05:36:19Z)
- Provable Robustness for Streaming Models with a Sliding Window [51.85182389861261]
In deep learning applications such as online content recommendation and stock market analysis, models use historical data to make predictions.
We derive robustness certificates for models that use a fixed-size sliding window over the input stream.
Our guarantees hold for the average model performance across the entire stream and are independent of stream size, making them suitable for large data streams.
arXiv Detail & Related papers (2023-03-28T21:02:35Z)
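As a sketch of the streaming setting certified in the sliding-window paper above, the snippet below maintains a fixed-size window over an input stream with a stand-in predictor; the window size, model, and data are placeholders, and the robustness certificates themselves are not reproduced.

```python
from collections import deque
from statistics import fmean

def model(window: list[float]) -> float:
    # Stand-in predictor; the certified model would go here.
    return fmean(window)

WINDOW = 5
buf: deque[float] = deque(maxlen=WINDOW)  # fixed-size sliding window

stream = [1.0, 2.0, 3.0, 10.0, 2.0, 1.5, 0.5, 4.0]
predictions = []
for x in stream:
    buf.append(x)            # deque evicts the oldest element automatically
    if len(buf) == WINDOW:   # predict once the window is full
        predictions.append(model(list(buf)))

print(predictions)
```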
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Dikaios: Privacy Auditing of Algorithmic Fairness via Attribute Inference Attacks [0.5801044612920815]
We propose Dikaios, a privacy auditing tool for fairness algorithms, intended for model builders.
We show that our attribute inference attacks with adaptive prediction threshold significantly outperform prior attacks.
arXiv Detail & Related papers (2022-02-04T17:19:59Z)
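The "adaptive prediction threshold" mentioned in the Dikaios entry above can be sketched as follows: instead of a fixed 0.5 cutoff on attack scores, the attacker chooses the cutoff that maximizes accuracy on records whose sensitive attribute it already knows. The scores below are synthetic placeholders, not the paper's attack.

```python
import random

random.seed(0)
# Hypothetical attack scores: higher means "sensitive attribute = 1".
labels = [random.randint(0, 1) for _ in range(500)]
known = [(random.gauss(0.6 if a else 0.4, 0.15), a) for a in labels]

def accuracy(threshold: float, scored) -> float:
    return sum((s >= threshold) == bool(a) for s, a in scored) / len(scored)

# Adaptive threshold: pick the cutoff that maximizes accuracy on the
# attacker's known records, rather than using a fixed 0.5 cutoff.
best = max((i / 100 for i in range(1, 100)), key=lambda t: accuracy(t, known))
print(f"fixed 0.5 cutoff: {accuracy(0.5, known):.3f}")
print(f"adaptive cutoff {best:.2f}: {accuracy(best, known):.3f}")
```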
- Toward Formal Data Set Verification for Building Effective Machine Learning Models [2.707154152696381]
We present a formal approach for verifying a set of arbitrarily stated properties over a data set.
The proposed approach relies on transforming the data set into a first-order logic formula.
A prototype tool, which uses the Z3 solver, has been developed.
arXiv Detail & Related papers (2021-08-25T13:22:24Z)
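To illustrate the formal-verification entry above, here is a minimal sketch using the z3-solver Python bindings: a stated property over a toy data column is verified by refutation, i.e., Z3 searches for a violating record, and `unsat` means the property holds. The property and encoding are illustrative assumptions, much simpler than the paper's first-order transformation.

```python
# pip install z3-solver
from z3 import And, IntVal, Not, Or, Solver, unsat

ages = [23, 35, 41, 29, 52]  # toy data set column

s = Solver()
# Stated property: every age lies in [18, 65]. We assert that SOME
# record violates it; unsat therefore proves the property holds.
violation = Or(*[Not(And(IntVal(a) >= 18, IntVal(a) <= 65)) for a in ages])
s.add(violation)
print("property holds" if s.check() == unsat else "counterexample exists")
```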
- Formalizing Distribution Inference Risks [11.650381752104298]
Property inference attacks reveal statistical properties of a training set, yet are difficult to distinguish from the primary purpose of statistical machine learning, which is to capture statistical properties of a distribution.
We propose a formal and generic definition of property inference attacks.
arXiv Detail & Related papers (2021-06-07T15:10:06Z)
- Proof-of-Learning: Definitions and Practice [15.585184189361486]
Training machine learning (ML) models typically involves expensive iterative optimization.
There is currently no mechanism for the entity that trained the model to prove that the resulting model parameters were indeed produced by this optimization procedure.
This paper introduces the concept of proof-of-learning in ML.
arXiv Detail & Related papers (2021-03-09T18:59:54Z)
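A toy sketch of the proof-of-learning idea from the entry above: during training the prover logs each step's batch indices and resulting weight, and the verifier replays randomly sampled steps against the log. Real proof-of-learning must handle floating-point nondeterminism, checkpoint spacing, and spoofing attacks, all ignored here; the model and data are hypothetical.

```python
import random

def sgd_step(w: float, batch, lr: float = 0.1) -> float:
    # One SGD step for 1-D least squares: gradient of mean (w*x - y)^2.
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad

random.seed(1)
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(64))]

# --- Prover: train, logging (batch indices, resulting weight) per step ---
w, log = 0.0, [([], 0.0)]  # step 0 records the initial weight
for _ in range(20):
    idx = random.sample(range(len(data)), k=8)
    w = sgd_step(w, [data[i] for i in idx])
    log.append((idx, w))

# --- Verifier: replay randomly sampled steps from the claimed trace ---
for step in random.sample(range(1, len(log)), k=5):
    prev_w = log[step - 1][1]
    idx, claimed_w = log[step]
    assert abs(sgd_step(prev_w, [data[i] for i in idx]) - claimed_w) < 1e-9
print("sampled steps replayed successfully")
```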
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensures the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.