Support Estimation with Sampling Artifacts and Errors
- URL: http://arxiv.org/abs/2006.07999v1
- Date: Sun, 14 Jun 2020 19:57:17 GMT
- Title: Support Estimation with Sampling Artifacts and Errors
- Authors: Eli Chien, Olgica Milenkovic, Angelia Nedich
- Abstract summary: We introduce the first known approach to support estimation in the presence of sampling artifacts and errors.
The proposed estimator is based on regularized weighted Chebyshev approximations.
We observed significant improvements of our integrated methods compared to those obtained through adequate modifications of state-of-the-art noiseless support estimation methods.
- Score: 31.62490114774054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of estimating the support of a distribution is of great
importance in many areas of machine learning, computer science, physics and
biology. Most of the existing work in this domain has focused on settings that
assume perfectly accurate sampling approaches, which is seldom true in
practical data science. Here we introduce the first known approach to support
estimation in the presence of sampling artifacts and errors where each sample
is assumed to arise from a Poisson repeat channel which simultaneously captures
repetitions and deletions of samples. The proposed estimator is based on
regularized weighted Chebyshev approximations, with weights governed by
evaluations of so-called Touchard (Bell) polynomials. The supports in the
presence of sampling artifacts are calculated using discretized semi-infinite
programming methods. The estimation approach is tested on synthetic and textual
data, as well as on GISAID data collected to address a new problem in
computational biology: mutational support estimation in genes of the SARS-CoV-2
virus. In the latter setting, the Poisson channel captures the fact that many
individuals are tested multiple times for the presence of viral RNA, thereby
leading to repeated samples, while other individuals' results are not recorded
due to test errors. For all experiments performed, we observed significant
improvements of our integrated methods compared to those obtained through
adequate modifications of state-of-the-art noiseless support estimation
methods.
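The two ingredients named in the abstract, the Poisson repeat channel and the Touchard polynomials, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the rate parameter `eta`, and the toy alphabet of 50 symbols are assumptions made for illustration, and the estimator itself (the regularized weighted Chebyshev approximation) is not reproduced. The sketch assumes that each clean sample is replaced by an independent Poisson(`eta`) number of copies, so a draw of zero deletes the sample. The Touchard polynomial is evaluated from its standard definition T_n(x) = sum_k S(n, k) x^k, where S(n, k) are Stirling numbers of the second kind; T_n(1) is the n-th Bell number.

```python
import math
import random


def poisson_draw(eta, rng):
    """Draw from Poisson(eta) with Knuth's multiplication method (suited to small eta)."""
    threshold = math.exp(-eta)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_repeat_channel(samples, eta, rng):
    """Replace each sample by Poisson(eta) copies; a draw of 0 deletes the sample.

    The per-sample rate eta is an illustrative assumption, not a value from the paper.
    """
    observed = []
    for s in samples:
        observed.extend([s] * poisson_draw(eta, rng))
    return observed


def touchard(n, x):
    """Evaluate T_n(x) = sum_k S(n, k) * x**k via the Stirling recurrence
    S(m, k) = k * S(m - 1, k) + S(m - 1, k - 1)."""
    row = [1]  # S(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            prev_k = row[k] if k < len(row) else 0
            new[k] = k * prev_k + row[k - 1]
        row = new
    return sum(s * x**k for k, s in enumerate(row))


# Toy experiment: 200 clean draws from a uniform distribution over 50 symbols,
# then the same draws after passing through the channel.
rng = random.Random(0)
clean = [rng.randrange(50) for _ in range(200)]
noisy = poisson_repeat_channel(clean, eta=0.8, rng=rng)
distinct_clean = len(set(clean))
distinct_noisy = len(set(noisy))
```

Comparing `distinct_clean` with `distinct_noisy` shows how deletions can hide symbols that were actually sampled, while repetitions inflate the counts of the symbols that remain; both effects distort the count statistics that noiseless support estimators rely on.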
Related papers
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models [52.98590762456236]
Training-free test-time dynamic adapter (TDA) is a promising approach to address this issue.
We propose a simple yet effective method for DistributiOnal Test-time Adaptation (Dota).
Dota continually estimates the distributions of test samples, allowing the model to continually adapt to the deployment environment.
arXiv Detail & Related papers (2024-09-28T15:03:28Z) - A sparse PAC-Bayesian approach for high-dimensional quantile prediction [0.0]
This paper presents a novel probabilistic machine learning approach for high-dimensional quantile prediction.
It uses a pseudo-Bayesian framework with a scaled Student-t prior and Langevin Monte Carlo for efficient computation.
Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.
arXiv Detail & Related papers (2024-09-03T08:01:01Z) - Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment [59.75420353684495]
Machine learning applications on signals such as computer vision or biomedical data often face challenges due to the variability that exists across hardware devices or session recordings.
In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities.
We show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings.
arXiv Detail & Related papers (2024-07-19T13:33:38Z) - Multi-CATE: Multi-Accurate Conditional Average Treatment Effect Estimation Robust to Unknown Covariate Shifts [12.289361708127876]
We use methodology for learning multi-accurate predictors to post-process CATE T-learners.
We show how this approach can combine (large) confounded observational and (smaller) randomized datasets.
arXiv Detail & Related papers (2024-05-28T14:12:25Z) - Reducing the cost of posterior sampling in linear inverse problems via task-dependent score learning [5.340736751238338]
We show that the evaluation of the forward mapping can be entirely bypassed during posterior sample generation.
We prove that this observation generalizes to the framework of infinite-dimensional diffusion models introduced recently.
arXiv Detail & Related papers (2024-05-24T15:33:27Z) - Estimating Unknown Population Sizes Using the Hypergeometric Distribution [1.03590082373586]
We tackle the challenge of estimating discrete distributions when both the total population size and the sizes of its constituent categories are unknown.
We develop our approach to account for a data generating process where the ground-truth is a mixture of distributions conditional on a continuous latent variable.
Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data.
arXiv Detail & Related papers (2024-02-22T01:53:56Z) - Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z) - Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z) - Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.