Optimizing Data Shapley Interaction Calculation from O(2^n) to O(t n^2)
for KNN models
- URL: http://arxiv.org/abs/2304.01224v1
- Date: Sun, 2 Apr 2023 06:15:19 GMT
- Title: Optimizing Data Shapley Interaction Calculation from O(2^n) to O(t n^2)
for KNN models
- Authors: Mohamed Karim Belaid, Dorra El Mekki, Maximilian Rabus, Eyke
Hüllermeier
- Abstract summary: "STI-KNN" is an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in O(t n^2) time.
By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.
- Score: 2.365702128814616
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the rapid growth of data availability and usage, quantifying the added
value of each training data point has become a crucial process in the field of
artificial intelligence. The Shapley values have been recognized as an
effective method for data valuation, enabling efficient training set
summarization, acquisition, and outlier removal. In this paper, we introduce
"STI-KNN", an innovative algorithm that calculates the exact pair-interaction
Shapley values for KNN models in O(t n^2) time, which is a significant
improvement over the O(2^n) time complexity of baseline methods. By using
STI-KNN, we can efficiently and accurately evaluate the value of individual
data points, leading to improved training outcomes and ultimately enhancing the
effectiveness of artificial intelligence applications.
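STI-KNN extends exact Shapley computation from single points to pair interactions; for intuition, the following is a minimal sketch of the classic single-point exact KNN Shapley recursion (due to Jia et al.) that such methods build on, not the pair-interaction STI-KNN algorithm itself. The function name and the Euclidean-distance setup are illustrative assumptions.

```python
import numpy as np

def knn_shapley(X_train, y_train, x_val, y_val, K):
    """Exact Data Shapley values of n training points for one
    validation point under the unweighted KNN utility (Jia et al.).
    Runs in O(n log n) per validation point due to the sort."""
    n = len(X_train)
    # Sort training points by Euclidean distance to the validation point.
    dists = np.linalg.norm(X_train - x_val, axis=1)
    order = np.argsort(dists)
    match = (y_train[order] == y_val).astype(float)

    s = np.zeros(n)  # Shapley values in distance-sorted order
    s[n - 1] = match[n - 1] / n
    # Backward recursion from the farthest point to the nearest:
    # s_i = s_{i+1} + (m_i - m_{i+1}) / max(K, rank_i).
    for i in range(n - 2, -1, -1):
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / max(K, i + 1)

    # Undo the sorting permutation so values align with X_train rows.
    out = np.zeros(n)
    out[order] = s
    return out
```

By the efficiency axiom, the values sum to the full-set utility, i.e. the fraction of the K nearest neighbors whose label matches the validation label; averaging over a validation set gives per-point data values.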
Related papers
- Fast Evaluation of DNN for Past Dataset in Incremental Learning [0.0]
We propose a new method to quickly evaluate the effect on the accuracy for the past dataset.
The gradient of the parameter values for the past dataset is extracted by running the DNN before the training.
The proposed method can estimate the accuracy change by additional training in a constant time.
arXiv Detail & Related papers (2024-05-10T07:55:08Z)
- Efficient Data Shapley for Weighted Nearest Neighbor Algorithms [47.62605581521535]
WKNN-Shapley is an efficient algorithm for computing Data Shapley values for weighted $K$ nearest neighbor models.
We show WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.
arXiv Detail & Related papers (2024-01-20T03:34:18Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
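As background for the unification SimSHAP describes, Shapley estimators are often built on sampled marginal contributions. The sketch below is a standard permutation-sampling baseline, not SimSHAP's method; the function name and the callable-utility interface are assumptions for illustration.

```python
import random

def permutation_shapley(value_fn, n_features, n_samples=200, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution to value_fn over random orderings of the features."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    for _ in range(n_samples):
        perm = list(range(n_features))
        rng.shuffle(perm)
        subset = set()
        prev = value_fn(subset)  # utility of the empty coalition
        for j in perm:
            subset.add(j)
            cur = value_fn(subset)
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return [p / n_samples for p in phi]
```

For an additive utility the marginal contributions are constant, so the estimate is exact regardless of the number of sampled permutations; for general utilities the error shrinks as `n_samples` grows.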
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
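LAVA's actual proxy is a hierarchically defined optimal-transport distance over the joint feature-label space; as a much-simplified illustration of the class-wise idea only, here is an empirical 1-D Wasserstein-1 distance averaged over shared classes, assuming scalar features and equal sample sizes per class (both simplifying assumptions, with hypothetical function names).

```python
def wasserstein_1d(a, b):
    """Empirical Wasserstein-1 distance between two equal-size 1-D
    samples: mean absolute difference of their sorted values."""
    assert len(a) == len(b), "equal sample sizes assumed for simplicity"
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def classwise_wasserstein(train, val):
    """Average the per-class 1-D distances between training and
    validation samples (dicts mapping label -> list of scalars)."""
    labels = set(train) & set(val)
    return sum(wasserstein_1d(train[c], val[c]) for c in labels) / len(labels)
```

A smaller class-wise distance indicates the training distribution better matches the validation distribution per class, which is the intuition behind using it as an algorithm-agnostic proxy for validation performance.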
arXiv Detail & Related papers (2023-04-28T19:05:16Z)
- A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms" [18.65808473565554]
Jia et al. showed that for K-Nearest Neighbors (KNN) models, the algorithm of Data Shapley is surprisingly simple and efficient.
We propose a more natural and interpretable utility function that better reflects the performance of KNN models.
Our new approach, dubbed soft-label KNN-SV, achieves the same time complexity as the original method.
arXiv Detail & Related papers (2023-04-09T15:31:53Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNN methods neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with direct methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.