A Weighted Mutual k-Nearest Neighbour for Classification Mining
- URL: http://arxiv.org/abs/2005.08640v1
- Date: Thu, 14 May 2020 18:11:30 GMT
- Title: A Weighted Mutual k-Nearest Neighbour for Classification Mining
- Authors: Joydip Dhar, Ashaya Shukla, Mukul Kumar, Prashant Gupta
- Abstract summary: kNN is a very effective instance-based learning method, and it is easy to implement.
In this paper, we propose a new learning algorithm that performs anomaly detection and removes pseudo neighbours from the dataset.
- Score: 4.538870924201896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: kNN is a very effective instance-based learning method, and it is easy to
implement. Due to the heterogeneous nature of data, noise from many possible
sources is widespread, especially in large-scale databases. To eliminate noise
and the effect of pseudo neighbours, this paper proposes a new learning
algorithm that performs anomaly detection and removes pseudo neighbours from
the dataset, yielding comparatively better results. The algorithm also
minimizes the influence of distant neighbours. A certainty measure is
introduced for the experimental results. The advantage of combining mutual
neighbours with distance-weighted voting is that the dataset is refined after
anomaly removal, while the weighting gives greater consideration to closer
neighbours. Finally, the performance of the proposed algorithm is evaluated.
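The two ingredients the abstract names, mutual neighbours and distance-weighted voting, can be sketched as follows. This is a minimal illustration of the general idea, not the paper's exact algorithm; the anomaly-removal step and the certainty measure are omitted, and the mutuality test used here is an assumption.

```python
import numpy as np

def weighted_mutual_knn_predict(X_train, y_train, x_query, k=5):
    """Classify x_query using only mutual k-nearest neighbours,
    with inverse-distance weighted voting. A sketch of the general
    idea; not the paper's full method."""
    # Distances from the query to every training point
    d_query = np.linalg.norm(X_train - x_query, axis=1)
    knn_of_query = np.argsort(d_query)[:k]

    # Keep only mutual neighbours: training points that would also
    # count the query among their own k nearest neighbours.
    mutual = []
    for i in knn_of_query:
        d_i = np.linalg.norm(X_train - X_train[i], axis=1)
        d_i[i] = np.inf  # exclude the point itself
        # the query is a k-NN of point i if fewer than k training
        # points lie closer to i than the query does
        if np.sum(d_i < d_query[i]) < k:
            mutual.append(i)

    if not mutual:  # fall back to plain kNN if no mutual neighbours
        mutual = list(knn_of_query)

    # Inverse-distance weighted vote so closer neighbours count more
    eps = 1e-12
    votes = {}
    for i in mutual:
        w = 1.0 / (d_query[i] + eps)
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)
```

Points without mutual support are exactly the "pseudo neighbours" the paper aims to discount: they are close to the query but the query is not close to them relative to their own neighbourhood.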
Related papers
- Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities.
DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv Detail & Related papers (2024-08-10T09:49:55Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Evaluation of the impact of the indiscernibility relation on the fuzzy-rough nearest neighbours algorithm [1.4213973379473654]
Fuzzy-rough nearest neighbours (FRNN) is a classification algorithm based on the classical k-nearest neighbours algorithm.
In this paper, we investigate the impact of the indiscernibility relation on the performance of FRNN classification.
arXiv Detail & Related papers (2022-11-25T14:17:56Z)
- Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose a novel algorithm called SAROS that takes into account both kinds of feedback for learning over the sequence of interactions.
The proposed idea of taking neighbour lines into account shows statistically significant results compared with the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Partial Identification with Noisy Covariates: A Robust Optimization Approach [94.10051154390237]
Causal inference from observational datasets often relies on measuring and adjusting for covariates.
We show that this robust optimization approach can extend a wide range of causal adjustment methods to perform partial identification.
Across synthetic and real datasets, we find that this approach provides ATE bounds with a higher coverage probability than existing methods.
arXiv Detail & Related papers (2022-02-22T04:24:26Z)
- Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced Data [15.030895782548576]
The imbalanced classification problem is one of the most important and challenging problems in data mining and machine learning.
The Tomek-Link sampling algorithm can effectively reduce the class overlap on data, remove the majority instances that are difficult to distinguish, and improve the algorithm classification accuracy.
However, the Tomek-Links under-sampling algorithm only considers the boundary instances that are the nearest neighbors to each other globally and ignores the potential local overlapping instances.
This paper proposes a multi-granularity relabeled under-sampling algorithm (MGRU) that fully considers the local information of the data set.
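The global Tomek-Link criterion that MGRU builds on can be sketched briefly: a Tomek link is a pair of points from different classes that are each other's nearest neighbour, and under-sampling drops the majority-class member of each link. This is a minimal sketch of the classic criterion only; the multi-granularity relabeling itself is not shown, and the function name is illustrative.

```python
import numpy as np

def tomek_link_undersample(X, y, majority_label):
    """Remove majority-class points that form Tomek links: pairs of
    mutually nearest neighbours with different class labels."""
    n = len(X)
    # Pairwise Euclidean distances; the diagonal is masked out so a
    # point is never its own nearest neighbour.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = D.argmin(axis=1)  # nearest neighbour of each point

    drop = set()
    for a in range(n):
        b = nn[a]
        # mutual nearest neighbours with different labels -> Tomek link
        if nn[b] == a and y[a] != y[b]:
            if y[a] == majority_label:
                drop.add(a)
            if y[b] == majority_label:
                drop.add(b)

    keep = [i for i in range(n) if i not in drop]
    return X[keep], y[keep]
```

As the blurb above notes, this global test only catches boundary pairs that are nearest neighbours of each other across the whole data set, which is the limitation MGRU addresses with local, multi-granularity information.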
arXiv Detail & Related papers (2022-01-11T14:07:55Z)
- Leveraging Reinforcement Learning for evaluating Robustness of KNN Search Algorithms [0.0]
The problem of finding the K nearest neighbors of a given query point in a dataset has been studied for many years.
In this paper, we survey several novel K-Nearest Neighbor Search approaches that tackle the problem from a computational perspective.
To evaluate the robustness of a KNNS approach against adversarial points, we propose a generic Reinforcement Learning based framework.
arXiv Detail & Related papers (2021-02-10T16:10:58Z)
- Non-Local Spatial Propagation Network for Depth Completion [82.60915972250706]
We propose a robust and efficient end-to-end non-local spatial propagation network for depth completion.
The proposed network takes RGB and sparse depth images as inputs and estimates the non-local neighbors of each pixel and their affinities.
We show that the proposed algorithm is superior to conventional algorithms in terms of depth completion accuracy and robustness to the mixed-depth problem.
arXiv Detail & Related papers (2020-07-20T12:26:51Z)
- Provable Noisy Sparse Subspace Clustering using Greedy Neighbor Selection: A Coherence-Based Perspective [18.888312436971187]
We derive coherence-based sufficient conditions guaranteeing correct neighbor identification using MP/OMP.
A striking finding is that, when the ground truth subspaces are well-separated from each other and noise is not large, MP-based iterations, while enjoying lower algorithmic complexity, yield smaller perturbation of residuals.
arXiv Detail & Related papers (2020-02-02T14:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.