Controllable RANSAC-based Anomaly Detection via Hypothesis Testing
- URL: http://arxiv.org/abs/2410.15133v1
- Date: Sat, 19 Oct 2024 15:15:41 GMT
- Title: Controllable RANSAC-based Anomaly Detection via Hypothesis Testing
- Authors: Le Hong Phong, Ho Ngoc Luat, Vo Nguyen Le Duy
- Abstract summary: We propose a novel statistical method, CTRL-RANSAC (controllable RANSAC), for testing the anomaly detection results obtained by RANSAC.
The key strength of the proposed method lies in its ability to control the probability of misidentifying anomalies below a pre-specified level.
Experiments conducted on synthetic and real-world datasets robustly support our theoretical results.
- Score: 7.10052009802944
- Abstract: Detecting the presence of anomalies in regression models is a crucial task in machine learning, as anomalies can significantly impact the accuracy and reliability of predictions. Random Sample Consensus (RANSAC) is one of the most popular robust regression methods for addressing this challenge. However, this method lacks the capability to guarantee the reliability of the anomaly detection (AD) results. In this paper, we propose a novel statistical method for testing the AD results obtained by RANSAC, named CTRL-RANSAC (controllable RANSAC). The key strength of the proposed method lies in its ability to control the probability of misidentifying anomalies below a pre-specified level $\alpha$ (e.g., $\alpha = 0.05$). By examining the selection strategy of RANSAC and leveraging the Selective Inference (SI) framework, we prove that achieving controllable RANSAC is indeed feasible. Furthermore, we introduce a more strategic and computationally efficient approach to enhance the true detection rate and overall performance of the CTRL-RANSAC. Experiments conducted on synthetic and real-world datasets robustly support our theoretical results, showcasing the superior performance of the proposed method.
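As context for the setting the abstract describes, a minimal consensus-based RANSAC anomaly detector for 1-D linear regression can be sketched as below. This is an illustrative toy (function and variable names are hypothetical), showing only the plain RANSAC detection step; the paper's contribution, CTRL-RANSAC, is a selective-inference test layered on top of such a detection that keeps the probability of misidentifying anomalies below a pre-specified level $\alpha$.

```python
import numpy as np

def ransac_flag_anomalies(X, y, n_iter=500, thresh=2.0, seed=None):
    """Minimal RANSAC for 1-D linear regression: repeatedly fit a line through
    a random pair of points and keep the model with the largest inlier set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(y), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(y), size=2, replace=False)
        if X[i] == X[j]:
            continue  # degenerate sample, cannot define a line
        slope = (y[j] - y[i]) / (X[j] - X[i])
        intercept = y[i] - slope * X[i]
        inliers = np.abs(y - (slope * X + intercept)) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return ~best_inliers  # anomalies = points outside the consensus set

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, 100)
y = 2.0 * X + 1.0 + rng.normal(scale=0.3, size=100)
y[:5] += 20.0  # inject five obvious anomalies
flagged = ransac_flag_anomalies(X, y, seed=1)
print(int(flagged[:5].sum()))  # count of injected anomalies that were flagged
```

Note that this raw detection step offers no statistical guarantee: points are flagged whenever they fall outside the consensus band, which is exactly the reliability gap the proposed hypothesis test addresses.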
Related papers
- Boosting Certificate Robustness for Time Series Classification with Efficient Self-Ensemble [10.63844868166531]
Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on robustness radius under $\ell_p$-ball attacks.
We propose a self-ensemble method to enhance the lower bound of the probability confidence of predicted labels by reducing the variance of classification margins.
This approach also addresses the computational overhead of Deep Ensemble (DE) while remaining competitive and, in some cases, outperforming it in terms of robustness.
arXiv Detail & Related papers (2024-09-04T15:22:08Z)
- Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples.
Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
arXiv Detail & Related papers (2024-06-25T08:29:32Z)
- A Weighted Prognostic Covariate Adjustment Method for Efficient and Powerful Treatment Effect Inferences in Randomized Controlled Trials [0.28087862620958753]
A crucial task for a randomized controlled trial (RCT) is to specify a statistical method that can yield an efficient estimator and powerful test for the treatment effect.
Training a generative AI algorithm on historical control data enables one to construct a digital twin generator (DTG) for RCT participants.
DTG generates a probability distribution for RCT participants' potential control outcome.
arXiv Detail & Related papers (2023-09-25T16:14:13Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
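The ATC recipe summarized above can be sketched as follows: pick a confidence threshold on labeled source data so that the fraction of source points above it matches the source accuracy, then predict target accuracy as the fraction of unlabeled target points above that same threshold. This is a hypothetical illustration on synthetic confidences, not the authors' implementation:

```python
import numpy as np

def atc_predict_accuracy(src_conf, src_correct, tgt_conf):
    # Choose threshold t so that mean(src_conf >= t) equals source accuracy,
    # then report the fraction of target confidences clearing t.
    t = np.quantile(src_conf, 1.0 - src_correct.mean())
    return (tgt_conf >= t).mean()

rng = np.random.default_rng(0)
src_conf = rng.uniform(0, 1, 1000)
src_correct = rng.uniform(0, 1, 1000) < src_conf  # higher confidence, more often correct
tgt_conf = rng.uniform(0, 0.8, 1000)              # shifted, less confident target domain
print(round(atc_predict_accuracy(src_conf, src_correct, tgt_conf), 3))
```

On this toy shift the predicted target accuracy drops below the source accuracy, reflecting the lower target confidences.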
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance.
We provide a novel cross-validation-like methodology to address this challenge.
We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
arXiv Detail & Related papers (2021-12-14T17:53:01Z)
- CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits [0.0]
We propose a new strategy for fixed-confidence best-arm identification over Gaussian variables with bounded means and unit variance.
This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: we also prove non-asymptotic bounds that hold with high probability.
arXiv Detail & Related papers (2021-05-27T07:42:49Z)
- Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
arXiv Detail & Related papers (2021-05-17T08:36:18Z)
- Efficient falsification approach for autonomous vehicle validation using a parameter optimisation technique based on reinforcement learning [6.198523595657983]
The widescale deployment of Autonomous Vehicles (AV) appears to be imminent despite many safety challenges that are yet to be resolved.
Uncertainties in the behaviour of the traffic participants and the dynamic environment provoke reactions in advanced autonomous systems.
This paper presents an efficient falsification method to evaluate the System Under Test.
arXiv Detail & Related papers (2020-11-16T02:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.