A Weighted Mutual k-Nearest Neighbour for Classification Mining
- URL: http://arxiv.org/abs/2005.08640v1
- Date: Thu, 14 May 2020 18:11:30 GMT
- Title: A Weighted Mutual k-Nearest Neighbour for Classification Mining
- Authors: Joydip Dhar, Ashaya Shukla, Mukul Kumar, Prashant Gupta
- Abstract summary: kNN is a very effective instance-based learning method, and it is easy to implement.
In this paper, we propose a new learning algorithm that performs anomaly detection and removes pseudo neighbours from the dataset.
- Score: 4.538870924201896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: kNN is a very effective instance-based learning method, and it is easy to
implement. Due to the heterogeneous nature of data, noise from many possible
sources is widespread, especially in large-scale databases. To eliminate noise
and the effect of pseudo neighbours, this paper proposes a new learning
algorithm that performs anomaly detection and removes pseudo neighbours from
the dataset, yielding comparatively better results. The algorithm also
minimizes the influence of distant neighbours. A certainty measure is
introduced for the experimental results. The advantage of combining mutual
neighbours with distance-weighted voting is that the dataset is refined after
anomaly removal, while the weighting gives greater consideration to closer
neighbours. Finally, the performance of the proposed algorithm is evaluated.
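The two ingredients the abstract names, mutual neighbours and distance-weighted voting, can be sketched as follows. This is a minimal illustration of the general idea, not the paper's exact algorithm; the anomaly-removal step and the certainty measure are omitted, and the mutuality test used here is an assumption.

```python
import numpy as np

def weighted_mutual_knn_predict(X_train, y_train, x_query, k=5):
    """Classify x_query using only mutual k-nearest neighbours,
    with inverse-distance weighted voting. A sketch of the general
    idea; not the paper's full method."""
    # Distances from the query to every training point
    d_query = np.linalg.norm(X_train - x_query, axis=1)
    knn_of_query = np.argsort(d_query)[:k]

    # Keep only mutual neighbours: training points that would also
    # count the query among their own k nearest neighbours.
    mutual = []
    for i in knn_of_query:
        d_i = np.linalg.norm(X_train - X_train[i], axis=1)
        d_i[i] = np.inf  # exclude the point itself
        # the query is a k-NN of point i if fewer than k training
        # points lie closer to i than the query does
        if np.sum(d_i < d_query[i]) < k:
            mutual.append(i)

    if not mutual:  # fall back to plain kNN if no mutual neighbours
        mutual = list(knn_of_query)

    # Inverse-distance weighted vote so closer neighbours count more
    eps = 1e-12
    votes = {}
    for i in mutual:
        w = 1.0 / (d_query[i] + eps)
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)
```

Points without mutual support are exactly the "pseudo neighbours" the paper aims to discount: they are close to the query but the query is not close to them relative to their own neighbourhood.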
Related papers
- Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities.
DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv Detail & Related papers (2024-08-10T09:49:55Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Evaluation of the impact of the indiscernibility relation on the fuzzy-rough nearest neighbours algorithm [1.4213973379473654]
Fuzzy-rough nearest neighbours (FRNN) is a classification algorithm based on the classical k-nearest neighbours algorithm.
In this paper, we investigate the impact of the indiscernibility relation on the performance of FRNN classification.
arXiv Detail & Related papers (2022-11-25T14:17:56Z)
- Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose a novel algorithm called SAROS that takes into account both kinds of feedback for learning over the sequence of interactions.
The proposed idea of taking neighbour lines into account shows statistically significant results compared with the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Partial Identification with Noisy Covariates: A Robust Optimization Approach [94.10051154390237]
Causal inference from observational datasets often relies on measuring and adjusting for covariates.
We show that this robust optimization approach can extend a wide range of causal adjustment methods to perform partial identification.
Across synthetic and real datasets, we find that this approach provides ATE bounds with a higher coverage probability than existing methods.
arXiv Detail & Related papers (2022-02-22T04:24:26Z)
- Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced Data [15.030895782548576]
The imbalanced classification problem is one of the most important and challenging problems in data mining and machine learning.
The Tomek-Link sampling algorithm can effectively reduce the class overlap on data, remove the majority instances that are difficult to distinguish, and improve the algorithm classification accuracy.
However, the Tomek-Links under-sampling algorithm only considers the boundary instances that are the nearest neighbors to each other globally and ignores the potential local overlapping instances.
This paper proposes a multi-granularity relabeled under-sampling algorithm (MGRU) that fully considers the local information of the data set.
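The global Tomek-Link criterion that MGRU builds on can be sketched briefly: a Tomek link is a pair of points from different classes that are each other's nearest neighbour, and under-sampling drops the majority-class member of each link. This is a minimal sketch of the classic criterion only; the multi-granularity relabeling itself is not shown, and the function name is illustrative.

```python
import numpy as np

def tomek_link_undersample(X, y, majority_label):
    """Remove majority-class points that form Tomek links: pairs of
    mutually nearest neighbours with different class labels."""
    n = len(X)
    # Pairwise Euclidean distances; the diagonal is masked out so a
    # point is never its own nearest neighbour.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = D.argmin(axis=1)  # nearest neighbour of each point

    drop = set()
    for a in range(n):
        b = nn[a]
        # mutual nearest neighbours with different labels -> Tomek link
        if nn[b] == a and y[a] != y[b]:
            if y[a] == majority_label:
                drop.add(a)
            if y[b] == majority_label:
                drop.add(b)

    keep = [i for i in range(n) if i not in drop]
    return X[keep], y[keep]
```

As the blurb above notes, this global test only catches boundary pairs that are nearest neighbours of each other across the whole data set, which is the limitation MGRU addresses with local, multi-granularity information.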
arXiv Detail & Related papers (2022-01-11T14:07:55Z)
- Leveraging Reinforcement Learning for evaluating Robustness of KNN Search Algorithms [0.0]
The problem of finding the K nearest neighbors of a given query point in a dataset has been studied for many years.
In this paper, we survey several novel K-Nearest Neighbor Search approaches that tackle the problem from a computational perspective.
To evaluate the robustness of a KNNS approach against adversarial points, we propose a generic Reinforcement Learning based framework.
arXiv Detail & Related papers (2021-02-10T16:10:58Z)
- Non-Local Spatial Propagation Network for Depth Completion [82.60915972250706]
We propose a robust and efficient end-to-end non-local spatial propagation network for depth completion.
The proposed network takes RGB and sparse depth images as inputs and estimates the non-local neighbors of each pixel and their affinities.
We show that the proposed algorithm is superior to conventional algorithms in terms of depth completion accuracy and robustness to the mixed-depth problem.
arXiv Detail & Related papers (2020-07-20T12:26:51Z)
- Provable Noisy Sparse Subspace Clustering using Greedy Neighbor Selection: A Coherence-Based Perspective [18.888312436971187]
We derive coherence-based sufficient conditions guaranteeing correct neighbor identification using MP/OMP.
A striking finding is that, when the ground truth subspaces are well-separated from each other and noise is not large, MP-based iterations, while enjoying lower algorithmic complexity, yield smaller perturbation of residuals.
arXiv Detail & Related papers (2020-02-02T14:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.