Adaptive Data Analysis with Correlated Observations
- URL: http://arxiv.org/abs/2201.08704v1
- Date: Fri, 21 Jan 2022 14:00:30 GMT
- Title: Adaptive Data Analysis with Correlated Observations
- Authors: Aryeh Kontorovich, Menachem Sadigurschi, Uri Stemmer
- Abstract summary: We show that, in some cases, differential privacy guarantees even when there are dependencies within the sample.
We show that the connection between transcript-compression and adaptive data analysis can be extended to the non-iid setting.
- Score: 21.969356766737622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vast majority of the work on adaptive data analysis focuses on the case
where the samples in the dataset are independent. Several approaches and tools
have been successfully applied in this context, such as differential privacy,
max-information, compression arguments, and more. The situation is far less
well-understood without the independence assumption.
We embark on a systematic study of the possibilities of adaptive data
analysis with correlated observations. First, we show that, in some cases,
differential privacy guarantees generalization even when there are dependencies
within the sample, which we quantify using a notion we call Gibbs-dependence.
We complement this result with a tight negative example. Second, we show that
the connection between transcript-compression and adaptive data analysis can be
extended to the non-iid setting.
Related papers
- Stochastic Gradient Descent with Adaptive Data [4.119418481809095]
gradient descent (SGD) is a powerful optimization technique that is particularly useful in online learning scenarios.
Applying SGD to policy optimization problems in operations research involves a distinct challenge: the policy changes the environment and thereby affects the data used to update the policy.
The influence of previous decisions on the data generated introduces bias in the gradient estimate, which presents a potential source of instability for online learning not present in the iid case.
We show that the convergence speed of SGD with adaptive data is largely similar to the classical iid setting, as long as the mixing time of the policy-induced dynamics is factored in.
arXiv Detail & Related papers (2024-10-02T02:58:32Z) - DAGnosis: Localized Identification of Data Inconsistencies using
Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's features probability distribution and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased
and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - Reinforced Approximate Exploratory Data Analysis [7.974685452145769]
We are first to consider the impact of sampling in interactive data exploration settings as they introduce approximation errors.
We propose a Deep Reinforcement Learning (DRL) based framework which can optimize the sample selection in order to keep the analysis and insight generation flow intact.
arXiv Detail & Related papers (2022-12-12T20:20:22Z) - Utility Assessment of Synthetic Data Generation Methods [0.0]
We investigate whether different methods of generating fully synthetic data vary in their utility a priori.
We find some methods to perform better than others across the board.
We do get promising findings for classification tasks when using synthetic data for training machine learning models.
arXiv Detail & Related papers (2022-11-23T11:09:52Z) - Detecting hidden confounding in observational data using multiple
environments [0.81585306387285]
We present a theory for testable conditional independencies that are only absent when there is hidden confounding.
In most cases, the proposed procedure correctly predicts the presence of hidden confounding.
arXiv Detail & Related papers (2022-05-27T12:20:09Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Combining Observational and Randomized Data for Estimating Heterogeneous
Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z) - On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z) - On conditional versus marginal bias in multi-armed bandits [105.07190334523304]
The bias of the sample means of the arms in multi-armed bandits is an important issue in adaptive data analysis.
We characterize the sign of the conditional bias of monotone functions of the rewards, including the sample mean.
Our results hold for arbitrary conditioning events and leverage natural monotonicity properties of the data collection policy.
arXiv Detail & Related papers (2020-02-19T20:16:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.