Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors
- URL: http://arxiv.org/abs/2312.01991v4
- Date: Thu, 10 Jul 2025 12:18:34 GMT
- Title: Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors
- Authors: Mohammad Ali Vahedifar, Azim Akhtarshenas, Mohammad Mohammadi Rafatpanah, Maryam Sabbaghian
- Abstract summary: Information-Modified KNN (IM-KNN) is a novel approach that leverages Mutual Information ($I$) and Shapley values to assign weighted values to neighbors. On average, IM-KNN improves the accuracy, precision, and recall of traditional KNN by 16.80%, 17.08%, and 16.98%, respectively, across 12 benchmark datasets.
- Score: 4.1498463236541605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The K-Nearest Neighbors (KNN) algorithm is widely used for classification and regression; however, it suffers from limitations, including the equal treatment of all samples. We propose Information-Modified KNN (IM-KNN), a novel approach that leverages Mutual Information ($I$) and Shapley values to assign weighted values to neighbors, thereby bridging the gap in treating all samples with the same value and weight. On average, IM-KNN improves the accuracy, precision, and recall of traditional KNN by 16.80%, 17.08%, and 16.98%, respectively, across 12 benchmark datasets. Experiments on four large-scale datasets further highlight IM-KNN's robustness to noise, imbalanced data, and skewed distributions.
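The abstract does not spell out how IM-KNN combines mutual information and Shapley values, so the following is only a minimal sketch of the general idea it contrasts with plain KNN: each training sample carries its own value, and the vote among the k nearest neighbors sums those values instead of counting every neighbor equally. The function name `weighted_knn_predict`, the toy data, and the `valued` weights are hypothetical placeholders, not the authors' method or numbers.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, sample_weights, query, k=3):
    """Predict a label by summing per-sample weights over the k nearest neighbors."""
    # Euclidean distance from the query to every training point, paired with its index.
    dists = [(math.dist(x, query), i) for i, x in enumerate(train_X)]
    nearest = sorted(dists)[:k]

    # Each neighbor votes for its label with its own weight.
    votes = defaultdict(float)
    for _, i in nearest:
        votes[train_y[i]] += sample_weights[i]
    return max(votes, key=votes.get)


# Toy data (hypothetical): two "a" points and one "b" point near the query.
train_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]
train_y = ["a", "a", "b", "b"]
query = (0.05, 0.05)

uniform = [1.0, 1.0, 1.0, 1.0]  # plain KNN: every sample counts equally
valued = [0.2, 0.2, 1.0, 1.0]   # assumed per-sample values, for illustration only

pred_uniform = weighted_knn_predict(train_X, train_y, uniform, query, k=3)
pred_valued = weighted_knn_predict(train_X, train_y, valued, query, k=3)
```

With uniform weights the two "a" neighbors outvote the single "b" neighbor; with the assumed values the higher-valued "b" sample wins. How the per-sample values are actually derived is the paper's contribution and is not reproduced here.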
Related papers
- DW-KNN: A Transparent Local Classifier Integrating Distance Consistency and Neighbor Reliability [0.7874708385247353]
DW-KNN is a transparent and robust variant that integrates exponential distance with neighbor validity. It achieves 0.8988 accuracy on average, ranks second among six methods, and is within 0.2% of the best-performing Ensemble KNN.
arXiv Detail & Related papers (2025-11-28T09:26:45Z)
- A Novel Pseudo Nearest Neighbor Classification Method Using Local Harmonic Mean Distance [0.0]
This article introduces a novel KNN-based classification method called LMPHNN.
LMPHNN improves classification performance based on LMPNN rules and HMD.
It achieves an average precision of 97%, surpassing other methods by 14%.
arXiv Detail & Related papers (2024-05-10T04:13:07Z)
- INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation [57.952478914459164]
kNN-MT has provided an effective paradigm to smooth the prediction based on neighbor representations during inference.
We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters.
Experiments on four benchmark datasets show that the method achieves average gains of 1.99 COMET and 1.0 BLEU, outperforming the state-of-the-art kNN-MT system with 0.02x memory space and 1.9x inference speedup.
arXiv Detail & Related papers (2023-06-10T08:39:16Z)
- Towards Robust k-Nearest-Neighbor Machine Translation [72.9252395037097]
k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction in NMT in recent years.
Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model.
Noisy retrieved pairs, however, will dramatically degrade model performance.
We propose a confidence-enhanced kNN-MT model with robust training to alleviate the impact of noise.
arXiv Detail & Related papers (2022-10-17T07:43:39Z)
- Nearest Neighbor Zero-Shot Inference [68.56747574377215]
kNN-Prompt is a technique to use k-nearest neighbor (kNN) retrieval augmentation for zero-shot inference with language models (LMs).
Fuzzy verbalizers leverage the sparse kNN distribution for downstream tasks by automatically associating each classification label with a set of natural language tokens.
Experiments show that kNN-Prompt is effective for domain adaptation with no further training, and that the benefits of retrieval increase with the size of the model used for kNN retrieval.
arXiv Detail & Related papers (2022-05-27T07:00:59Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distance between the test image and top-k neighbors in a training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
- KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier [61.063988689601416]
Pre-trained models are widely used in fine-tuning downstream tasks with linear classifiers optimized by the cross-entropy loss.
These problems can be improved by learning representations that focus on similarities in the same class and contradictions when making predictions.
We introduce K-Nearest Neighbors into pre-trained model fine-tuning tasks in this paper.
arXiv Detail & Related papers (2021-10-06T06:17:05Z)
- Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training data [52.771780951404565]
Shift-Robust GNN (SR-GNN) is designed to account for distributional differences between biased training data and the graph's true inference distribution.
We show that SR-GNN outperforms other GNN baselines in accuracy, eliminating at least 40% of the negative effects introduced by biased training data.
arXiv Detail & Related papers (2021-08-02T18:00:38Z)
- Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm simply retrieves the same number of nearest neighbors for each target token.
We propose Adaptive kNN-MT to dynamically determine the number of neighbors k for each target token.
arXiv Detail & Related papers (2021-05-27T09:27:42Z)
- Evaluating Deep Neural Network Ensembles by Majority Voting cum Meta-Learning scheme [3.351714665243138]
We propose an ensemble of seven independent Deep Neural Networks (DNNs) for a new data instance.
One-seventh of the data is deleted and replenished by bootstrap sampling from the remaining samples.
All the algorithms in this paper have been tested on five benchmark datasets.
arXiv Detail & Related papers (2021-05-09T03:10:56Z)
- KNN Classification with One-step Computation [10.381276986079865]
A one-step computation is proposed to replace the lazy part of KNN classification.
The proposed approach is evaluated experimentally, and the results demonstrate that one-step KNN classification is efficient and promising.
arXiv Detail & Related papers (2020-12-09T13:34:42Z)
- KNN-enhanced Deep Learning Against Noisy Labels [4.765948508271371]
Supervised learning on Deep Neural Networks (DNNs) is data hungry.
In this work, we propose to apply deep KNN for label cleanup.
We iteratively train the neural network and update labels to simultaneously proceed towards higher label recovery rate and better classification performance.
arXiv Detail & Related papers (2020-12-08T05:21:29Z)
- A new hashing based nearest neighbors selection technique for big datasets [14.962398031252063]
This paper proposes a new technique that enables the selection of nearest neighbors directly in the neighborhood of a given observation.
The proposed approach consists of dividing the data space into subcells of a virtual grid built on top of data space.
Our algorithm outperforms the original KNN in time efficiency with a prediction quality as good as that of KNN.
arXiv Detail & Related papers (2020-04-05T19:36:00Z)
- Partial Weight Adaptation for Robust DNN Inference [9.301756947410773]
We present GearNN, an adaptive inference architecture that accommodates heterogeneous inputs.
GearNN improves the accuracy (mIoU) by an average of 18.12% over a DNN trained with the undistorted dataset and 4.84% over stability training from Google.
arXiv Detail & Related papers (2020-03-13T06:25:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.