Computational Efficient Approximations of the Concordance Probability in
a Big Data Setting
- URL: http://arxiv.org/abs/2105.10392v1
- Date: Fri, 21 May 2021 15:09:53 GMT
- Title: Computational Efficient Approximations of the Concordance Probability in
a Big Data Setting
- Authors: Robin Van Oirbeek and Jolien Ponnet and Tim Verdonck
- Abstract summary: We propose two estimation methods that calculate the concordance probability in a fast and accurate way.
Experiments on two real-life data sets confirm the conclusions of the artificial simulations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance measurement is an essential task once a statistical model is
created. The Area Under the Receiver Operating Characteristic curve (AUC) is
the most popular measure for evaluating the quality of a binary classifier. In
this case, AUC is equal to the concordance probability, a frequently used
measure to evaluate the discriminatory power of the model. Contrary to AUC, the
concordance probability can also be extended to the situation with a continuous
response variable. Due to the staggering size of data sets nowadays,
determining this discriminatory measure requires a tremendous amount of costly
computations and is hence immensely time-consuming, certainly in the case of a
continuous response variable. Therefore, we propose two estimation methods that
calculate the concordance probability in a fast and accurate way and that can
be applied to both the discrete and continuous setting. Extensive simulation
studies show the excellent performance and fast computing times of both
estimators. Finally, experiments on two real-life data sets confirm the
conclusions of the artificial simulations.
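For a binary response, the concordance probability coincides with the AUC: the probability that a randomly chosen observation with the higher response also receives the higher model score. The sketch below shows the exact O(n^2) pairwise computation and a naive random-subsampling approximation; the function names and the subsampling scheme are illustrative only and are not the two estimators proposed in the paper.

```python
import numpy as np

def concordance_exact(y, score):
    """Exact concordance probability: among all pairs with different
    responses, the fraction whose scores are ordered the same way
    (ties counted as 1/2). O(n^2) pairwise comparisons, which is
    what becomes infeasible for big data."""
    n = len(y)
    concordant = ties = comparable = 0
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue  # pairs with equal responses are not comparable
            comparable += 1
            d = (score[i] - score[j]) * (y[i] - y[j])
            if d > 0:
                concordant += 1
            elif d == 0:
                ties += 1
    return (concordant + 0.5 * ties) / comparable

def concordance_subsample(y, score, m=2000, seed=0):
    """Illustrative fast approximation: evaluate the same statistic
    on a random subsample of size m, reducing the cost from O(n^2)
    to O(m^2). This is a generic baseline, not the paper's method."""
    rng = np.random.default_rng(seed)
    y, score = np.asarray(y), np.asarray(score)
    idx = rng.choice(len(y), size=min(m, len(y)), replace=False)
    return concordance_exact(y[idx], score[idx])
```

The same pairwise definition applies unchanged to a continuous response, which is why the concordance probability, unlike the ROC-based AUC, extends to that setting.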
Related papers
- A Sample Efficient Conditional Independence Test in the Presence of Discretization [54.047334792855345]
Applying Conditional Independence (CI) tests directly to discretized data can lead to incorrect conclusions. Recent advancements have sought to infer the correct CI relationship between the latent variables by binarizing observed data. Motivated by this, this paper introduces a sample-efficient CI test that does not rely on the binarization process.
arXiv Detail & Related papers (2025-06-10T12:41:26Z) - Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - Testing Conditional Mean Independence Using Generative Neural Networks [8.323172773256449]
We introduce a novel population CMI measure and a bootstrap model-based testing procedure.
Deep generative neural networks are used to estimate the conditional mean functions involved in the population measure.
arXiv Detail & Related papers (2025-01-28T23:35:24Z) - Mitigating LLM Hallucinations via Conformal Abstention [70.83870602967625]
We develop a principled procedure for determining when a large language model should abstain from responding in a general domain.
We leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate).
Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets.
arXiv Detail & Related papers (2024-04-04T11:32:03Z) - User-defined Event Sampling and Uncertainty Quantification in Diffusion
Models for Physical Dynamical Systems [49.75149094527068]
We show that diffusion models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems.
We develop a probabilistic approximation scheme for the conditional score function which converges to the true distribution as the noise level decreases.
Our method can sample conditionally on nonlinear user-defined events at inference time, and matches data statistics even when sampling from the tails of the distribution.
arXiv Detail & Related papers (2023-06-13T03:42:03Z) - Semi-Parametric Inference for Doubly Stochastic Spatial Point Processes: An Approximate Penalized Poisson Likelihood Approach [3.085995273374333]
Doubly-stochastic point processes model the occurrence of events over a spatial domain as an inhomogeneous process conditioned on the realization of a random intensity function.
Existing implementations of doubly-stochastic spatial models are computationally demanding, often have limited theoretical guarantee, and/or rely on restrictive assumptions.
arXiv Detail & Related papers (2023-06-11T19:48:39Z) - Testing for Overfitting [0.0]
We discuss the overfitting problem and explain why standard concentration results do not hold for evaluation with training data.
We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z) - Spectral Representation Learning for Conditional Moment Models [33.34244475589745]
We propose a procedure that automatically learns representations with controlled measures of ill-posedness.
Our method approximates a linear representation defined by the spectral decomposition of a conditional expectation operator.
We show this representation can be efficiently estimated from data, and establish L2 consistency for the resulting estimator.
arXiv Detail & Related papers (2022-10-29T07:48:29Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results in both the finite-sample regime and the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - Scalable Uncertainty Quantification for Deep Operator Networks using
Randomized Priors [14.169588600819546]
We present a simple and effective approach for posterior uncertainty quantification in deep operator networks (DeepONets).
We adopt a frequentist approach based on randomized prior ensembles, and put forth an efficient vectorized implementation for fast parallel inference on accelerated hardware.
arXiv Detail & Related papers (2022-03-06T20:48:16Z) - Approximate Bayesian Computation with Path Signatures [0.5156484100374059]
We introduce the use of path signatures as a natural candidate feature set for constructing distances between time series data.
Our experiments show that such an approach can generate more accurate approximate Bayesian posteriors than existing techniques for time series models.
arXiv Detail & Related papers (2021-06-23T17:25:43Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a
Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.