Rigorous Assessment of Model Inference Accuracy using Language Cardinality
- URL: http://arxiv.org/abs/2211.16587v3
- Date: Tue, 16 Jan 2024 17:33:56 GMT
- Title: Rigorous Assessment of Model Inference Accuracy using Language Cardinality
- Authors: Donato Clun, Donghwan Shin, Antonio Filieri, Domenico Bianculli
- Abstract summary: We develop a systematic approach that minimizes bias and uncertainty in model accuracy assessment by replacing statistical estimation with deterministic accuracy measures.
We experimentally demonstrate the consistency and applicability of our approach by assessing the accuracy of models inferred by state-of-the-art inference tools.
- Score: 5.584832154027001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models such as finite state automata are widely used to abstract the behavior
of software systems by capturing the sequences of events observable during
their execution. Nevertheless, models rarely exist in practice and, when they
do, get easily outdated; moreover, manually building and maintaining models is
costly and error-prone. As a result, a variety of model inference methods that
automatically construct models from execution traces have been proposed to
address these issues.
However, performing a systematic and reliable accuracy assessment of inferred
models remains an open problem. Even when a reference model is given, most
existing model accuracy assessment methods may return misleading and biased
results. This is mainly because they rely on statistical estimators computed over a
finite number of randomly generated traces, which introduces avoidable uncertainty
into the estimated accuracy and makes the results sensitive to the parameters of the
random trace generation process.
This paper addresses this problem by developing a systematic approach based
on analytic combinatorics that minimizes bias and uncertainty in model accuracy
assessment by replacing statistical estimation with deterministic accuracy
measures. We experimentally demonstrate the consistency and applicability of
our approach by assessing the accuracy of models inferred by state-of-the-art
inference tools against reference models from established specification mining
benchmarks.
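As a reading aid (not part of the paper), the following minimal Python sketch illustrates the language-cardinality idea behind such deterministic measures: given an inferred and a reference finite state automaton, length-bounded precision and recall can be computed exactly by counting accepted traces with dynamic programming over a product automaton, instead of estimating them from randomly sampled traces. The automaton encoding, the function names, and the length-bounded counting are illustrative assumptions of this sketch; the paper itself derives its measures via analytic combinatorics.

```python
# Illustrative sketch, not the paper's algorithm: deterministic, length-bounded
# precision/recall of an inferred DFA against a reference DFA, obtained by
# exact counting instead of random trace sampling.
# A DFA is encoded as (states, alphabet, delta, start, accepting), where delta
# is a total dict mapping (state, symbol) -> state (complete DFA assumed).

from itertools import product


def count_accepted(dfa, max_len):
    """Return counts[n] = number of words of length n (0 <= n <= max_len) accepted."""
    states, alphabet, delta, start, accepting = dfa
    paths = {s: 0 for s in states}   # paths[s] = #words of the current length ending in s
    paths[start] = 1
    counts = [sum(paths[s] for s in accepting)]
    for _ in range(max_len):
        nxt = {s: 0 for s in states}
        for s, c in paths.items():
            if c:
                for a in alphabet:
                    nxt[delta[(s, a)]] += c
        paths = nxt
        counts.append(sum(paths[s] for s in accepting))
    return counts


def intersection(dfa1, dfa2):
    """Product automaton accepting L(dfa1) ∩ L(dfa2); both DFAs share one alphabet."""
    s1, alphabet, d1, q1, f1 = dfa1
    s2, _, d2, q2, f2 = dfa2
    states = list(product(s1, s2))
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for (p, q) in states for a in alphabet}
    accepting = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return states, alphabet, delta, (q1, q2), accepting


def precision_recall(inferred, reference, max_len):
    """Exact precision/recall over all traces of length <= max_len (no sampling)."""
    both = sum(count_accepted(intersection(inferred, reference), max_len))
    inf_total = sum(count_accepted(inferred, max_len))
    ref_total = sum(count_accepted(reference, max_len))
    precision = both / inf_total if inf_total else 1.0
    recall = both / ref_total if ref_total else 1.0
    return precision, recall
```

For larger length bounds or unbounded measures, the per-length counts would typically be obtained from the automaton's transfer matrix or generating function rather than the explicit dynamic program shown here.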
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Uncertainty Quantification of Surrogate Models using Conformal Prediction [7.445864392018774]
We formalise a conformal prediction framework that satisfies coverage guarantees in a model-agnostic manner, requiring near-zero computational costs.
The paper looks at providing statistically valid error bars for deterministic models, as well as crafting coverage guarantees for the error bars of probabilistic models (see the split-conformal sketch after this list).
arXiv Detail & Related papers (2024-08-19T10:46:19Z)
- Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation [50.920911532133154]
The intrinsic ill-posedness and ordinal-sensitive nature of monocular depth estimation (MDE) models pose major challenges to the estimation of uncertainty degree.
We propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions.
By simply introducing additional training regularization terms, our model, with a surprisingly simple formulation and without requiring extra modules or multiple inferences, can provide uncertainty estimates with state-of-the-art reliability.
arXiv Detail & Related papers (2023-07-19T12:11:15Z)
- The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z)
- Calibration tests beyond classification [30.616624345970973]
Most supervised machine learning tasks are subject to irreducible prediction errors.
Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets.
Calibrated models guarantee that the predictions are neither over- nor under-confident.
arXiv Detail & Related papers (2022-10-21T09:49:57Z)
- Variational Gibbs Inference for Statistical Model Estimation from Incomplete Data [7.4250022679087495]
We introduce variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data.
We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as variational autoencoders and normalising flows from incomplete data.
arXiv Detail & Related papers (2021-11-25T17:22:22Z)
- Calibrating Over-Parametrized Simulation Models: A Framework via Eligibility Set [3.862247454265944]
We develop a framework for constructing calibration schemes that satisfy rigorous frequentist statistical guarantees.
We demonstrate our methodology on several numerical examples, including an application to calibration of a limit order book market simulator.
arXiv Detail & Related papers (2021-05-27T00:59:29Z)
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
- Amortized Bayesian model comparison with evidential deep learning [0.12314765641075436]
We propose a novel method for performing Bayesian model comparison using specialized deep learning architectures.
Our method is purely simulation-based and circumvents the step of explicitly fitting all alternative models under consideration to each observed dataset.
We show that our method achieves excellent results in terms of accuracy, calibration, and efficiency across the examples considered in this work.
arXiv Detail & Related papers (2020-04-22T15:15:46Z)
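The conformal prediction entry above names a generic technique without spelling out its mechanics; the following minimal split-conformal sketch (a standard recipe assuming a held-out calibration set and absolute residuals as nonconformity scores, not the cited paper's specific framework) shows how model-agnostic error bars with finite-sample marginal coverage can be obtained at near-zero extra cost.

```python
# Minimal split conformal prediction sketch (standard recipe, not the cited
# paper's exact framework). Wraps any pretrained point predictor f to produce
# prediction intervals with ~(1 - alpha) finite-sample marginal coverage.

import numpy as np


def conformal_intervals(f, X_calib, y_calib, X_test, alpha=0.1):
    """Return (lower, upper) arrays of prediction intervals for X_test.

    f        : fitted point predictor, f(X) -> 1-D array of predictions
    X_calib  : calibration inputs held out from training f
    y_calib  : calibration targets
    alpha    : miscoverage level (0.1 -> ~90% coverage)
    """
    # Nonconformity scores: absolute residuals on the calibration split.
    scores = np.abs(np.asarray(y_calib) - f(X_calib))
    n = scores.size
    # Finite-sample-corrected empirical quantile of the scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    preds = f(X_test)
    return preds - q, preds + q
```

The only work beyond evaluating the pretrained predictor is one quantile over the calibration residuals, which is what makes this family of methods model-agnostic and computationally cheap.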
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.