Bagged Regularized $k$-Distances for Anomaly Detection
- URL: http://arxiv.org/abs/2312.01046v2
- Date: Tue, 13 Feb 2024 13:26:00 GMT
- Title: Bagged Regularized $k$-Distances for Anomaly Detection
- Authors: Yuchao Cai and Yuheng Ma and Hanfang Yang and Hanyuan Hang
- Abstract summary: We propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD)
Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE)
On the theoretical side, we establish fast convergence rates of the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity.
- Score: 9.899763598214122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the paradigm of unsupervised anomaly detection, which involves
the identification of anomalies within a dataset in the absence of labeled
examples. Though distance-based methods are top-performing for unsupervised
anomaly detection, they suffer heavily from the sensitivity to the choice of
the number of the nearest neighbors. In this paper, we propose a new
distance-based algorithm called bagged regularized $k$-distances for anomaly
detection (BRDAD) converting the unsupervised anomaly detection problem into a
convex optimization problem. Our BRDAD algorithm selects the weights by
minimizing the surrogate risk, i.e., the finite sample bound of the empirical
risk of the bagged weighted $k$-distances for density estimation (BWDDE). This
approach enables us to successfully address the sensitivity challenge of the
hyperparameter choice in distance-based algorithms. Moreover, when dealing with
large-scale datasets, the efficiency issues can be addressed by the
incorporated bagging technique in our BRDAD algorithm. On the theoretical side,
we establish fast convergence rates of the AUC regret of our algorithm and
demonstrate that the bagging technique significantly reduces the computational
complexity. On the practical side, we conduct numerical experiments on anomaly
detection benchmarks to illustrate the insensitivity of parameter selection of
our algorithm compared with other state-of-the-art distance-based methods.
Moreover, promising improvements are brought by applying the bagging technique
in our algorithm on real-world datasets.
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem.
We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Deep Unfolding of Iteratively Reweighted ADMM for Wireless RF Sensing [22.467957268653077]
We address the detection of material defects inside a layered material structure using compressive sensing based multiple-output (MIMO) wireless radar.
In many scenarios, the number of defects challenging the layered structure can be modeled as a low-rank structure.
arXiv Detail & Related papers (2021-06-07T15:00:33Z) - Variational Pedestrian Detection [33.52588723666144]
We develop a unique perspective of pedestrian detection as a variational inference problem.
We formulate a novel and efficient algorithm for pedestrian detection by modeling the dense proposals as a latent variable.
Our method can also be flexibly applied to two-stage detectors, achieving notable performance enhancement.
arXiv Detail & Related papers (2021-04-26T08:06:41Z) - Unsupervised Anomaly Detectors to Detect Intrusions in the Current
Threat Landscape [0.11470070927586014]
We show that Isolation Forests, One-Class Support Vector Machines and Self-Organizing Maps are more effective than their counterparts for intrusion detection.
We detail how attacks with unstable, distributed or non-repeatable behavior as Fuzzing, Worms and Botnets are more difficult to detect.
arXiv Detail & Related papers (2020-12-21T14:06:58Z) - Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a hard thresholding algorithm for AUC in distributiond classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z) - Anomaly Detection Under Controlled Sensing Using Actor-Critic
Reinforcement Learning [31.841289319809807]
We consider the problem of detecting anomalies among a given set of processes using noisy binary sensor measurements.
The noiseless sensor measurement corresponding to a normal process is 0, and the measurement is 1 if the process is anomalous.
We propose a sequential sensor selection policy that dynamically determines which processes to observe at each time and when to terminate the detection algorithm.
arXiv Detail & Related papers (2020-05-26T22:53:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.