Related papers: A flexible outlier detector based on a topology given by graph communities

URL: http://arxiv.org/abs/2002.07791v1
Date: Tue, 18 Feb 2020 18:40:31 GMT
Title: A flexible outlier detector based on a topology given by graph communities
Authors: O. Ramos Terrades, A. Berenguel, D. Gil
Abstract summary: Anomaly detection is essential for optimal performance of machine learning methods and statistical predictive models. Topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. Overall, our approach outperforms both local and global strategies in multi-view and single-view settings.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Outlier, or anomaly, detection is essential for optimal performance of machine learning methods and statistical predictive models. It is not just a technical step in a data cleaning process but a key topic in many fields, such as fraudulent document detection, medical applications and assisted diagnosis systems, and security threat detection. In contrast to population-based methods, neighborhood-based local approaches are simple, flexible methods that have the potential to perform well in unbalanced problems with small sample sizes. However, a main concern with local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. This way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world data sets show that, overall, our approach outperforms both local and global strategies in multi-view and single-view settings.
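The idea in the abstract (neighborhoods taken from a mutual nearest neighbor graph, then a label heterogeneity score per sample) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the community detection algorithm, the edge weights, or the exact heterogeneity measure. Here connected components stand in for graph communities, the weight 1 / (1 + distance) is an assumed choice, and the function names, the values of k, and the rule that an isolated sample scores 1.0 are all hypothetical.

```python
import math
from collections import defaultdict


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def mutual_knn_graph(points, k):
    """Weighted graph with edge i-j only if each is among the other's k nearest."""
    nn = knn_sets(points, k)
    graph = defaultdict(dict)
    for i in range(len(points)):
        for j in nn[i]:
            if i in nn[j]:
                # Assumed weighting: closer pairs get larger weights.
                graph[i][j] = 1.0 / (1.0 + math.dist(points[i], points[j]))
    return graph


def communities(graph, n):
    """Stand-in for community detection: connected components of the graph."""
    membership = [-1] * n
    current = 0
    for start in range(n):
        if membership[start] != -1:
            continue
        membership[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph.get(u, {}):
                if membership[v] == -1:
                    membership[v] = current
                    stack.append(v)
        current += 1
    return membership


def heterogeneity_scores(points, labels, ks=(2, 3)):
    """Average, over several k, the fraction of community peers with a different label."""
    n = len(points)
    scores = [0.0] * n
    for k in ks:
        comm = communities(mutual_knn_graph(points, k), n)
        for i in range(n):
            peers = [j for j in range(n) if j != i and comm[j] == comm[i]]
            if not peers:
                scores[i] += 1.0  # Assumption: an isolated sample is maximally suspicious.
            else:
                scores[i] += sum(labels[j] != labels[i] for j in peers) / len(peers)
    return [s / len(ks) for s in scores]


# Toy data: two tight clusters and one far-away sample (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]
scores = heterogeneity_scores(points, labels)
```

Using several values of k gives each sample more than one neighborhood, loosely mirroring the "set of multiple neighborhoods" the abstract mentions. A real implementation would likely replace `communities` with a proper community detection method on the weighted graph.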

Related papers

PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
Motivated by increasing privacy concerns, we propose a parameter-efficient federated anomaly detection framework named PeFAD. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each without violating neighborhood relationships. The second strategy leverages a novel re-ranking technique, which has a lower upper bound on time complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments. A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees. For strongly convex problems, it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
Learning-based Localizability Estimation for Robust LiDAR Localization [13.298113481670038]
LiDAR-based localization and mapping is one of the core components in many modern robotic systems. This work proposes a neural network-based estimation approach for detecting (non-)localizability during robot operation.
arXiv Detail & Related papers (2022-03-11T01:12:00Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
An Information-Theoretic Approach to Persistent Environment Monitoring Through Low Rank Model Based Planning and Prediction [19.95989053853125]
We introduce a method for selecting a limited number of observation points in a large region. We combine a low rank model of a target attribute with an information-maximizing path planner to predict the state of the attribute throughout a region. We evaluate our method in simulation on two real-world environment datasets.
arXiv Detail & Related papers (2020-09-02T16:19:55Z)
Locally induced Gaussian processes for large-scale simulation experiments [0.0]
We show how placement of inducing points and their multitude can be thwarted by pathologies. Our proposed methodology hybridizes global inducing point and data subset-based local GP approximation. We show that local inducing points extend their global and data-subset component parts on the accuracy--computational efficiency frontier.
arXiv Detail & Related papers (2020-08-28T21:37:46Z)
Spatial Classification With Limited Observations Based On Physics-Aware Structural Constraint [18.070762916388272]
Spatial classification with limited feature observations has been a challenging problem in machine learning. This paper extends our recent approach by allowing feature values of samples in each class to follow a multi-modal distribution. We propose learning algorithms for the extended model with multi-modal distribution.
arXiv Detail & Related papers (2020-08-25T20:07:28Z)
Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences. We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline. Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
Mean shift cluster recognition method implementation in the nested sampling algorithm [0.0]
Nested sampling is an efficient algorithm for the calculation of the Bayesian evidence and posterior parameter probability distributions. Here we present a new solution based on the mean shift cluster recognition method implemented in a random walk search algorithm.
arXiv Detail & Related papers (2020-01-31T15:04:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.