Do We Really Sample Right In Model-Based Diagnosis?
- URL: http://arxiv.org/abs/2009.12178v2
- Date: Thu, 4 Aug 2022 14:43:18 GMT
- Title: Do We Really Sample Right In Model-Based Diagnosis?
- Authors: Patrick Rodler and Fatima Elichanova
- Abstract summary: We study the representativeness of the produced samples in terms of their estimations about fault explanations.
We investigate the impact of sample size, the optimal trade-off between sampling efficiency and effectivity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Statistical samples, in order to be representative, have to be drawn from a
population in a random and unbiased way. Nevertheless, it is common practice in
the field of model-based diagnosis to make estimations from (biased) best-first
samples. One example is the computation of a few most probable possible fault
explanations for a defective system and the use of these to assess which aspect
of the system, if measured, would bring the highest information gain.
In this work, we scrutinize whether these statistically not well-founded
conventions, that both diagnosis researchers and practitioners have adhered to
for decades, are indeed reasonable. To this end, we empirically analyze various
sampling methods that generate fault explanations. We study the
representativeness of the produced samples in terms of their estimations about
fault explanations and how well they guide diagnostic decisions, and we
investigate the impact of sample size, the optimal trade-off between sampling
efficiency and effectivity, and how approximate sampling techniques compare to
exact ones.
Related papers
- Detecting Adversarial Data by Probing Multiple Perturbations Using
Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
arXiv Detail & Related papers (2023-05-25T13:14:58Z) - Evaluating Causal Inference Methods [0.4588028371034407]
We introduce a deep generative model-based framework, Credence, to validate causal inference methods.
Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods.
arXiv Detail & Related papers (2022-02-09T00:21:22Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - The Pitfalls of Sample Selection: A Case Study on Lung Nodule
Classification [13.376247652484274]
In lung nodule classification, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and assess the impact of individual contributions.
We find that each employs a different data selection process, leading to largely varying total number of samples and ratios between benign and malignant cases.
We show that specific choices can have severe impact on the data distribution where it may be possible to achieve superior performance on one sample distribution but not on another.
arXiv Detail & Related papers (2021-08-11T18:07:07Z) - Increasing the efficiency of randomized trial estimates via linear
adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z) - Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - A Causal Direction Test for Heterogeneous Populations [10.653162005300608]
Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications.
We show that when the homogeneity assumption is violated, causal models developed based on such assumption can fail to identify the correct causal direction.
We propose an adjustment to a commonly used causal direction test statistic by using a $k$-means type clustering algorithm.
arXiv Detail & Related papers (2020-06-08T18:59:14Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - Overly Optimistic Prediction Results on Imbalanced Data: a Case Study of
Flaws and Benefits when Applying Over-sampling [13.463035357173045]
We focus on one specific type of methodological flaw: applying over-sampling before partitioning the data into mutually exclusive training and testing sets.
We show how this causes the results to be biased using two artificial datasets and reproduce results of studies in which this flaw was identified.
arXiv Detail & Related papers (2020-01-15T12:53:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.