Optimizing Data Shapley Interaction Calculation from O(2^n) to O(t n^2)
for KNN models
- URL: http://arxiv.org/abs/2304.01224v1
- Date: Sun, 2 Apr 2023 06:15:19 GMT
- Title: Optimizing Data Shapley Interaction Calculation from O(2^n) to O(t n^2)
for KNN models
- Authors: Mohamed Karim Belaid, Dorra El Mekki, Maximilian Rabus, Eyke
Hüllermeier
- Abstract summary: "STI-KNN" is an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in O(t n^2) time.
By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.
- Score: 2.365702128814616
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the rapid growth of data availability and usage, quantifying the added
value of each training data point has become a crucial process in the field of
artificial intelligence. The Shapley values have been recognized as an
effective method for data valuation, enabling efficient training set
summarization, acquisition, and outlier removal. In this paper, we introduce
"STI-KNN", an innovative algorithm that calculates the exact pair-interaction
Shapley values for KNN models in O(t n^2) time, which is a significant
improvement over the O(2^n) time complexity of baseline methods. By using
STI-KNN, we can efficiently and accurately evaluate the value of individual
data points, leading to improved training outcomes and ultimately enhancing the
effectiveness of artificial intelligence applications.
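STI-KNN extends exact Shapley computation from single points to pair interactions; for intuition, the following is a minimal sketch of the classic single-point exact KNN Shapley recursion (due to Jia et al.) that such methods build on, not the pair-interaction STI-KNN algorithm itself. The function name and the Euclidean-distance setup are illustrative assumptions.

```python
import numpy as np

def knn_shapley(X_train, y_train, x_val, y_val, K):
    """Exact Data Shapley values of n training points for one
    validation point under the unweighted KNN utility (Jia et al.).
    Runs in O(n log n) per validation point due to the sort."""
    n = len(X_train)
    # Sort training points by Euclidean distance to the validation point.
    dists = np.linalg.norm(X_train - x_val, axis=1)
    order = np.argsort(dists)
    match = (y_train[order] == y_val).astype(float)

    s = np.zeros(n)  # Shapley values in distance-sorted order
    s[n - 1] = match[n - 1] / n
    # Backward recursion from the farthest point to the nearest:
    # s_i = s_{i+1} + (m_i - m_{i+1}) / max(K, rank_i).
    for i in range(n - 2, -1, -1):
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / max(K, i + 1)

    # Undo the sorting permutation so values align with X_train rows.
    out = np.zeros(n)
    out[order] = s
    return out
```

By the efficiency axiom, the values sum to the full-set utility, i.e. the fraction of the K nearest neighbors whose label matches the validation label; averaging over a validation set gives per-point data values.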
Related papers
- Fast Evaluation of DNN for Past Dataset in Incremental Learning [0.0]
We propose a new method to quickly evaluate the effect on the accuracy for the past dataset.
The gradient of the parameter values for the past dataset is extracted by running the DNN before the training.
The proposed method can estimate the accuracy change by additional training in a constant time.
arXiv Detail & Related papers (2024-05-10T07:55:08Z)
- Efficient Data Shapley for Weighted Nearest Neighbor Algorithms [47.62605581521535]
WKNN-Shapley is an efficient algorithm for computing Data Shapley values for weighted $K$ nearest neighbor models.
We show WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.
arXiv Detail & Related papers (2024-01-20T03:34:18Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
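As background for the unification SimSHAP describes, Shapley estimators are often built on sampled marginal contributions. The sketch below is a standard permutation-sampling baseline, not SimSHAP's method; the function name and the callable-utility interface are assumptions for illustration.

```python
import random

def permutation_shapley(value_fn, n_features, n_samples=200, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution to value_fn over random orderings of the features."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    for _ in range(n_samples):
        perm = list(range(n_features))
        rng.shuffle(perm)
        subset = set()
        prev = value_fn(subset)  # utility of the empty coalition
        for j in perm:
            subset.add(j)
            cur = value_fn(subset)
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return [p / n_samples for p in phi]
```

For an additive utility the marginal contributions are constant, so the estimate is exact regardless of the number of sampled permutations; for general utilities the error shrinks as `n_samples` grows.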
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
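LAVA's actual proxy is a hierarchically defined optimal-transport distance over the joint feature-label space; as a much-simplified illustration of the class-wise idea only, here is an empirical 1-D Wasserstein-1 distance averaged over shared classes, assuming scalar features and equal sample sizes per class (both simplifying assumptions, with hypothetical function names).

```python
def wasserstein_1d(a, b):
    """Empirical Wasserstein-1 distance between two equal-size 1-D
    samples: mean absolute difference of their sorted values."""
    assert len(a) == len(b), "equal sample sizes assumed for simplicity"
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def classwise_wasserstein(train, val):
    """Average the per-class 1-D distances between training and
    validation samples (dicts mapping label -> list of scalars)."""
    labels = set(train) & set(val)
    return sum(wasserstein_1d(train[c], val[c]) for c in labels) / len(labels)
```

A smaller class-wise distance indicates the training distribution better matches the validation distribution per class, which is the intuition behind using it as an algorithm-agnostic proxy for validation performance.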
arXiv Detail & Related papers (2023-04-28T19:05:16Z)
- A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms" [18.65808473565554]
Jia et al. showed that for K-Nearest Neighbors (KNN) models, the algorithm of Data Shapley is surprisingly simple and efficient.
We propose a more natural and interpretable utility function that better reflects the performance of KNN models.
Our new approach, dubbed soft-label KNN-SV, achieves the same time complexity as the original method.
arXiv Detail & Related papers (2023-04-09T15:31:53Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNN methods neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with direct methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.