StatWhy: Formal Verification Tool for Statistical Hypothesis Testing Programs
- URL: http://arxiv.org/abs/2405.17492v2
- Date: Fri, 01 Nov 2024 16:16:45 GMT
- Title: StatWhy: Formal Verification Tool for Statistical Hypothesis Testing Programs
- Authors: Yusuke Kawamoto, Kentaro Kobayashi, Kohei Suenaga
- Abstract summary: We propose a new method for formally specifying and automatically verifying the correctness of statistical programs.
Programmers are required to annotate the source code of statistical programs with the requirements for these methods.
Our software tool StatWhy automatically checks whether programmers have properly specified the requirements for the statistical methods.
- Score: 0.9886108751871757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Statistical methods have been widely misused and misinterpreted in various scientific fields, raising significant concerns about the integrity of scientific research. To mitigate this problem, we propose a new method for formally specifying and automatically verifying the correctness of statistical programs. In this method, programmers are required to annotate the source code of the statistical programs with the requirements for these methods. Through this annotation, they are reminded to check the requirements for statistical methods, including those that cannot be formally verified, such as the distribution of the unknown true population. Our software tool StatWhy automatically checks whether programmers have properly specified the requirements for the statistical methods, thereby identifying any missing requirements that need to be addressed. This tool is implemented using the Why3 platform to verify the correctness of OCaml programs that conduct statistical hypothesis testing. We demonstrate how StatWhy can be used to avoid common errors in various popular statistical hypothesis testing programs.
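The abstract's central idea, requiring programmers to annotate a testing program with the statistical requirements of the methods it uses so that missing requirements can be flagged automatically, can be illustrated informally. The sketch below is hypothetical Python, not StatWhy's actual annotation syntax (StatWhy verifies OCaml programs via Why3, and the abstract does not show its syntax); the requirement names and functions are illustrative only.

```python
# Hypothetical sketch of requirement-annotated hypothesis testing.
# StatWhy itself verifies OCaml programs via the Why3 platform; the
# annotation style here is illustrative, not StatWhy's actual syntax.

REQUIREMENTS = {
    "two_sample_t_test": {
        "independent_observations",  # cannot be checked from the data alone
        "normal_population",         # unknown true population distribution
        "equal_variances",           # Student's (not Welch's) t-test
    },
}

def check_requirements(test_name, annotations):
    """Return the requirements the programmer has not explicitly
    acknowledged for the given statistical test."""
    missing = REQUIREMENTS[test_name] - set(annotations)
    return sorted(missing)

# A program annotated with only some of the requirements:
annotations = ["independent_observations", "normal_population"]
print(check_requirements("two_sample_t_test", annotations))
# -> ['equal_variances']
```

As in the abstract, the point of such a check is not to prove the unverifiable assumptions (e.g., the population distribution) but to force the programmer to acknowledge them explicitly, so that any unaddressed requirement is surfaced before the test result is reported.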
Related papers
- SILENT: A New Lens on Statistics in Software Timing Side Channels [10.872605368135343]
Recent attacks have challenged our understanding of what it means for code to execute in constant time on modern CPUs.
We introduce a new algorithm for the analysis of timing measurements with strong, formal statistical guarantees.
We demonstrate the necessity, effectiveness, and benefits of our approach on both synthetic benchmarks and real-world applications.
arXiv Detail & Related papers (2025-04-28T14:22:23Z)
- Distribution-Free Calibration of Statistical Confidence Sets [2.283561089098417]
We introduce two novel methods, TRUST and TRUST++, for calibrating confidence sets to achieve distribution-free conditional coverage.
We demonstrate that our methods outperform existing approaches, particularly in small-sample regimes.
arXiv Detail & Related papers (2024-11-28T20:45:59Z)
- Leveraging Machine Learning for Official Statistics: A Statistical Manifesto [0.0]
It is important for official statistics production to apply machine learning with statistical rigor.
The Total Machine Learning Error (TMLE) is presented as a framework analogous to the Total Survey Error Model used in survey methodology.
arXiv Detail & Related papers (2024-09-06T15:57:25Z)
- Treatment of Statistical Estimation Problems in Randomized Smoothing for Adversarial Robustness [0.0]
We review the statistical estimation problems for randomized smoothing to find out if the computational burden is necessary.
We present estimation procedures employing confidence sequences enjoying the same statistical guarantees as the standard methods.
We provide a randomized version of Clopper-Pearson confidence intervals resulting in strictly stronger certificates.
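For context on the interval this paper strengthens: the standard (non-randomized) Clopper-Pearson interval for a binomial proportion can be computed exactly from binomial tail probabilities. The stdlib-only Python sketch below shows the standard construction via bisection; the randomized variant the authors propose is not reproduced here.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05, tol=1e-9):
    """Exact (non-randomized) Clopper-Pearson interval for a binomial
    proportion, found by bisection on the binomial tail probabilities."""
    if k == 0:
        lo = 0.0
    else:
        # Lower endpoint: solve P(X >= k | p) = alpha/2.
        # The upper tail 1 - cdf(k-1) is increasing in p.
        a, b = 0.0, 1.0
        while b - a > tol:
            m = (a + b) / 2
            if 1 - binom_cdf(k - 1, n, m) < alpha / 2:
                a = m  # p too small
            else:
                b = m
        lo = a
    if k == n:
        hi = 1.0
    else:
        # Upper endpoint: solve P(X <= k | p) = alpha/2.
        # The lower tail cdf(k) is decreasing in p.
        a, b = 0.0, 1.0
        while b - a > tol:
            m = (a + b) / 2
            if binom_cdf(k, n, m) > alpha / 2:
                a = m  # p too small
            else:
                b = m
        hi = a
    return lo, hi

lo, hi = clopper_pearson(10, 10)  # all successes: lo ≈ 0.6915, hi == 1.0
```

For k = n the upper tail reduces to p^n, so the lower endpoint is (alpha/2)^(1/n), which the bisection recovers numerically.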
arXiv Detail & Related papers (2024-06-25T14:00:55Z)
- Evaluating the Effectiveness of Index-Based Treatment Allocation [42.040099398176665]
When resources are scarce, an allocation policy is needed to decide who receives a resource.
This paper introduces methods to evaluate index-based allocation policies using data from a randomized control trial.
arXiv Detail & Related papers (2024-02-19T01:55:55Z)
- User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems [49.75149094527068]
We show that diffusion models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems.
We develop a probabilistic approximation scheme for the conditional score function which converges to the true distribution as the noise level decreases.
We are able to sample conditionally on nonlinear user-defined events at inference time, and the resulting samples match data statistics even when drawn from the tails of the distribution.
arXiv Detail & Related papers (2023-06-13T03:42:03Z)
- Applications of statistical causal inference in software engineering [2.969705152497174]
This paper reviews existing work in software engineering that applies statistical causal inference methods.
Our results show that the application of statistical causal inference methods is relatively recent and that the corresponding research community remains relatively fragmented.
arXiv Detail & Related papers (2022-11-21T14:16:55Z)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z)
- Deep Feature Statistics Mapping for Generalized Screen Content Image Quality Assessment [60.88265569998563]
We make the first attempt to learn the statistics of SCIs, based upon which the quality of SCIs can be effectively determined.
We empirically show that the statistics deviation could be effectively leveraged in quality assessment.
arXiv Detail & Related papers (2022-09-12T15:26:13Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning [90.59143158534849]
The recent emergence of reinforcement learning has created a demand for robust statistical inference methods.
Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations.
The online bootstrap is a flexible and efficient approach for statistical inference in linear approximation algorithms, but its efficacy in settings involving Markov noise has yet to be explored.
arXiv Detail & Related papers (2021-08-08T18:26:35Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Efficient statistical validation with edge cases to evaluate Highly Automated Vehicles [6.198523595657983]
The wide-scale deployment of Autonomous Vehicles seems imminent despite many safety challenges that are yet to be resolved.
Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements.
This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios.
arXiv Detail & Related papers (2020-03-04T04:35:22Z)
- The empirical duality gap of constrained statistical learning [115.23598260228587]
We study constrained statistical learning problems, the unconstrained versions of which are at the core of virtually all modern information processing.
We propose to tackle the constrained statistical problem overcoming its infinite dimensionality, unknown distributions, and constraints by leveraging finite dimensional parameterizations, sample averages, and duality theory.
We demonstrate the effectiveness and usefulness of this constrained formulation in a fair learning application.
arXiv Detail & Related papers (2020-02-12T19:12:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.