Bayesian Quantile Matching Estimation
- URL: http://arxiv.org/abs/2008.06423v2
- Date: Mon, 11 Apr 2022 12:31:10 GMT
- Title: Bayesian Quantile Matching Estimation
- Authors: Rajbir-Singh Nirwan, Nils Bertschinger
- Abstract summary: Research and scientific understanding, e.g. for medical diagnostics or policy advice, often relies on data access.
We propose a Bayesian method for learning from quantile information.
- Score: 4.56877715768796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to increased awareness of data protection and corresponding laws, many
data, especially involving sensitive personal information, are not publicly
accessible. Accordingly, many data collecting agencies only release aggregated
data, e.g. providing the mean and selected quantiles of population
distributions. Yet, research and scientific understanding, e.g. for medical
diagnostics or policy advice, often relies on data access. To overcome this
tension, we propose a Bayesian method for learning from quantile information.
Being based on order statistics of finite samples, our method adequately and
correctly reflects the uncertainty of empirical quantiles. After outlining the
theory, we apply our method to simulated as well as real world examples. In
addition, we provide a Python-based package that implements the proposed model.
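
To make the order-statistics idea in the abstract concrete, the following is a minimal sketch and not the authors' package. It assumes the released quantiles can be treated as order statistics with known ranks out of a known sample size `n`, and that the population is Normal; the function names (`normal_logpdf`, `normal_cdf`, `order_stat_loglik`) and the example numbers are hypothetical. It uses the standard joint density of selected order statistics as a likelihood, which could then be combined with a prior.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def order_stat_loglik(ranks, values, n, mu, sigma):
    """Log joint density of selected order statistics.

    ranks:  strictly increasing ranks k_1 < ... < k_M (1-based, at most n)
    values: observed order statistics x_1 <= ... <= x_M, one per rank
    n:      size of the sample the order statistics come from (assumed known)
    """
    ks = [0] + list(ranks) + [n + 1]
    cdfs = [0.0] + [normal_cdf(x, mu, sigma) for x in values] + [1.0]

    # n! times the density at each observed order statistic.
    loglik = math.lgamma(n + 1)
    for x in values:
        loglik += normal_logpdf(x, mu, sigma)

    # Unobserved points that must fall between consecutive order statistics.
    for m in range(len(ks) - 1):
        gap = ks[m + 1] - ks[m] - 1
        if gap < 0:
            raise ValueError("ranks must be strictly increasing and within 1..n")
        if gap > 0:
            mass = cdfs[m + 1] - cdfs[m]
            if mass <= 0.0:
                return float("-inf")
            loglik += gap * math.log(mass) - math.lgamma(gap + 1)
    return loglik


# Hypothetical example: quartiles reported for a sample of size 99.
ranks = [25, 50, 75]
values = [-0.67, 0.0, 0.67]
grid = [i / 10 for i in range(-20, 21)]
# Grid search over the location with the scale fixed at 1 (flat prior on the grid).
best_mu = max(grid, key=lambda mu: order_stat_loglik(ranks, values, 99, mu, 1.0))
```

Because the likelihood depends on `n`, the same reported quantiles constrain the parameters more tightly when they come from a larger sample, which is how a model of this kind can reflect the uncertainty of empirical quantiles.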
Related papers
- Privacy Preserving Data Imputation via Multi-party Computation for Medical Applications [1.7999333451993955]
This study addresses privacy-preserving imputation methods for sensitive data using secure multi-party computation.
We specifically target the medical and healthcare domains considering the significance of protection of the patient data.
Experiments on the diabetes dataset validated the correctness of our privacy-preserving imputation methods, yielding the largest error around $3 \times 10^{-3}$.
arXiv Detail & Related papers (2024-05-29T08:36:42Z)
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- Synthetic Census Data Generation via Multidimensional Multiset Sum [7.900694093691988]
We provide tools to generate synthetic microdata solely from published Census statistics.
We show that our methods work well in practice, and we offer theoretical arguments to explain our performance.
arXiv Detail & Related papers (2024-04-15T19:06:37Z)
- Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points.
It cannot be assumed that all users sample from the same underlying distribution.
We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
- Beyond Normal: On the Evaluation of Mutual Information Estimators [52.85079110699378]
We show how to construct a diverse family of distributions with known ground-truth mutual information.
We provide guidelines for practitioners on how to select an appropriate estimator adapted to the difficulty of the problem considered.
arXiv Detail & Related papers (2023-06-19T17:26:34Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms [35.29582673348303]
Sampling biases in training data are a major source of algorithmic biases in machine learning systems.
We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources.
arXiv Detail & Related papers (2022-04-13T23:14:05Z)
- Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples [8.975667614727652]
We propose Qimera, a method that uses superposed latent embeddings to generate synthetic boundary supporting samples.
The experimental results show that Qimera achieves state-of-the-art performances for various settings on data-free quantization.
arXiv Detail & Related papers (2021-11-04T04:52:50Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
- Distributed Multivariate Regression Modeling For Selecting Biomarkers Under Data Protection Constraints [0.0]
We propose a multivariable regression approach for identifying biomarkers by automatic variable selection based on aggregated data in iterative calls.
The approach can be used to jointly analyze data distributed across several locations.
In a simulation, the information loss introduced by local standardization is seen to be minimal.
arXiv Detail & Related papers (2018-03-01T15:04:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.