Related papers: Bagged Regularized $k$-Distances for Anomaly Detection

Bagged Regularized $k$-Distances for Anomaly Detection

URL: http://arxiv.org/abs/2312.01046v3
Date: Fri, 01 Aug 2025 16:12:08 GMT
Title: Bagged Regularized $k$-Distances for Anomaly Detection
Authors: Yuchao Cai, Hanfang Yang, Yuheng Ma, Hanyuan Hang,
Abstract summary: We propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD)<n>Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE)<n>Our method achieves superior performance on real-world datasets with the introduced bagging technique compared to other approaches.
Score: 9.062164411594175
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled examples. Though distance-based methods are top-performing for unsupervised anomaly detection, they suffer heavily from the sensitivity to the choice of the number of the nearest neighbors. In this paper, we propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD), converting the unsupervised anomaly detection problem into a convex optimization problem. Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE). This approach enables us to successfully address the sensitivity challenge of the hyperparameter choice in distance-based algorithms. Moreover, when dealing with large-scale datasets, the efficiency issues can be addressed by the incorporated bagging technique in our BRDAD algorithm. On the theoretical side, we establish fast convergence rates of the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity. On the practical side, we conduct numerical experiments to illustrate the insensitivity of the parameter selection of our algorithm compared with other state-of-the-art distance-based methods. Furthermore, our method achieves superior performance on real-world datasets with the introduced bagging technique compared to other approaches.

Related papers

Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability.<n>We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z)
Tighten The Lasso: A Convex Hull Volume-based Anomaly Detection Method [0.6144680854063939]
We propose a novel anomaly detection algorithm based on the convex hull property of a dataset. Our algorithm computes the CH's volume as an increasing number of data points are removed from the dataset. We show that with a computationally cheap and simple check, one can detect datasets that are well-suited for the proposed algorithm.
arXiv Detail & Related papers (2025-02-25T19:39:20Z)
Quantitative Error Bounds for Scaling Limits of Stochastic Iterative Algorithms [10.022615790746466]
We derive non-asymptotic functional approximation error bounds between the algorithm sample paths and the Ornstein-Uhlenbeck approximation.<n>We use our main result to construct error bounds in terms of two common metrics: the L'evy-Prokhorov and bounded Wasserstein distances.
arXiv Detail & Related papers (2025-01-21T15:29:11Z)
Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose a trajectory-based method TV score, which uses trajectory volatility for OOD detection in mathematical reasoning. Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios. Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms. In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem. We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z)
A Variance-Reduced Stochastic Gradient Tracking Algorithm for Decentralized Optimization with Orthogonality Constraints [7.028225540638832]
We propose a novel algorithm for decentralized optimization with orthogonality constraints. VRSGT is the first algorithm for decentralized optimization with orthogonality constraints that reduces both sampling and communication complexities simultaneously. In the numerical experiments, VRGTS has a promising performance in a real-world autonomous sample.
arXiv Detail & Related papers (2022-08-29T14:46:44Z)
Accelerated and instance-optimal policy evaluation with linear function approximation [17.995515643150657]
Existing algorithms fail to match at least one of these lower bounds. We develop an accelerated, variance-reduced fast temporal difference algorithm that simultaneously matches both lower bounds and attains a strong notion of instance-optimality.
arXiv Detail & Related papers (2021-12-24T17:21:04Z)
Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
Deep Unfolding of Iteratively Reweighted ADMM for Wireless RF Sensing [22.467957268653077]
We address the detection of material defects inside a layered material structure using compressive sensing based multiple-output (MIMO) wireless radar. In many scenarios, the number of defects challenging the layered structure can be modeled as a low-rank structure.
arXiv Detail & Related papers (2021-06-07T15:00:33Z)
Variational Pedestrian Detection [33.52588723666144]
We develop a unique perspective of pedestrian detection as a variational inference problem. We formulate a novel and efficient algorithm for pedestrian detection by modeling the dense proposals as a latent variable. Our method can also be flexibly applied to two-stage detectors, achieving notable performance enhancement.
arXiv Detail & Related papers (2021-04-26T08:06:41Z)
Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape [0.11470070927586014]
We show that Isolation Forests, One-Class Support Vector Machines and Self-Organizing Maps are more effective than their counterparts for intrusion detection. We detail how attacks with unstable, distributed or non-repeatable behavior as Fuzzing, Worms and Botnets are more difficult to detect.
arXiv Detail & Related papers (2020-12-21T14:06:58Z)
Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a hard thresholding algorithm for AUC in distributiond classification. We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning. Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms. We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
Anomaly Detection Under Controlled Sensing Using Actor-Critic Reinforcement Learning [31.841289319809807]
We consider the problem of detecting anomalies among a given set of processes using noisy binary sensor measurements. The noiseless sensor measurement corresponding to a normal process is 0, and the measurement is 1 if the process is anomalous. We propose a sequential sensor selection policy that dynamically determines which processes to observe at each time and when to terminate the detection algorithm.
arXiv Detail & Related papers (2020-05-26T22:53:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.