Efficient Data Shapley for Weighted Nearest Neighbor Algorithms
- URL: http://arxiv.org/abs/2401.11103v1
- Date: Sat, 20 Jan 2024 03:34:18 GMT
- Title: Efficient Data Shapley for Weighted Nearest Neighbor Algorithms
- Authors: Jiachen T. Wang, Prateek Mittal, and Ruoxi Jia
- Abstract summary: We develop an efficient algorithm for computing Data Shapley for the weighted $K$-nearest-neighbor algorithm (WKNN-Shapley).
We show WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.
- Score: 47.62605581521535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work aims to address an open problem in data valuation literature
concerning the efficient computation of Data Shapley for weighted $K$ nearest
neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label
KNN with discretized weights as the utility function, we reframe the
computation of WKNN-Shapley into a counting problem and introduce a
quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the
best result from existing literature. We develop a deterministic approximation
algorithm that further improves computational efficiency while maintaining the
key fairness properties of the Shapley value. Through extensive experiments, we
demonstrate WKNN-Shapley's computational efficiency and its superior
performance in discerning data quality compared to its unweighted counterpart.
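For context on the unweighted counterpart the paper compares against: Jia et al. (VLDB 2019) showed that for soft-label KNN (utility = fraction of the $K$ nearest neighbors with the correct label), exact Data Shapley values admit a simple recursion over training points sorted by distance to the test point, costing $O(N \log N)$ per test point. A minimal sketch of that recursion in Python (variable names are illustrative, not from the paper):

```python
import numpy as np

def knn_shapley(X_train, y_train, x_test, y_test, K):
    """Exact Data Shapley for soft-label KNN on one test point,
    via the recursion of Jia et al. (VLDB 2019)."""
    N = len(X_train)
    # Rank training points by distance to the test point (ascending).
    order = np.argsort(np.linalg.norm(X_train - x_test, axis=1))
    s = np.zeros(N)
    # Base case: the farthest point gets 1[y == y_test] / N.
    s[order[-1]] = float(y_train[order[-1]] == y_test) / N
    # Recurse inward from the second-farthest point.
    for j in range(N - 2, -1, -1):
        i = j + 1  # 1-indexed rank of point order[j]
        match_i = float(y_train[order[j]] == y_test)
        match_next = float(y_train[order[j + 1]] == y_test)
        s[order[j]] = s[order[j + 1]] + (match_i - match_next) / K * min(K, i) / i
    return s
```

By efficiency of the Shapley value, the entries sum to the full-data utility; when every training label matches the test label, all points receive the identical value $1/N$. The weighted, hard-label setting addressed by WKNN-Shapley does not reduce to this recursion, which is what motivates the paper's counting reformulation.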
Related papers
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
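As a point of reference for the sampling-based estimators this line of work unifies: the classic permutation-sampling baseline estimates each player's Shapley value as its average marginal contribution over random orderings. A generic sketch of that baseline (not the SimSHAP algorithm itself):

```python
import random

def permutation_shapley(players, utility, n_samples=2000, seed=0):
    """Monte Carlo Shapley estimate: average each player's marginal
    contribution over randomly sampled permutations."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        perm = players[:]
        rng.shuffle(perm)
        coalition = set()
        prev_u = utility(coalition)
        for p in perm:
            coalition.add(p)
            u = utility(coalition)
            phi[p] += u - prev_u
            prev_u = u
    return {p: v / n_samples for p, v in phi.items()}
```

For an additive utility (e.g. each player contributes a fixed weight), every permutation yields the same marginal contribution, so this estimator is exact; for general utilities its variance shrinks as the number of sampled permutations grows.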
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- High-dimensional Contextual Bandit Problem without Sparsity [8.782204980889077]
We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance.
We derive the optimal rate of the ETC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation.
We introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance.
arXiv Detail & Related papers (2023-06-19T15:29:32Z)
- A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms" [18.65808473565554]
Jia et al. showed that for K-Nearest Neighbors (KNN) models, Data Shapley can be computed in a surprisingly simple and efficient way.
We propose a more natural and interpretable utility function that better reflects the performance of KNN models.
Our new approach, dubbed soft-label KNNSV, achieves the same time complexity as the original method.
arXiv Detail & Related papers (2023-04-09T15:31:53Z)
- Optimizing Data Shapley Interaction Calculation from O(2^n) to O(t n^2) for KNN models [2.365702128814616]
"STI-KNN" is an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in $O(t n^2)$ time.
By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.
arXiv Detail & Related papers (2023-04-02T06:15:19Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are better suited to active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Accelerating Shapley Explanation via Contributive Cooperator Selection [42.11059072201565]
We propose a novel method SHEAR to significantly accelerate the Shapley explanation for DNN models.
The selection of the feature coalitions follows our proposed Shapley chain rule to minimize the absolute error from the ground-truth Shapley values.
SHEAR consistently outperforms state-of-the-art baseline methods across different evaluation metrics.
arXiv Detail & Related papers (2022-06-17T03:24:45Z)
- Differentially Private Shapley Values for Data Evaluation [3.616258473002814]
Shapley values are computationally expensive and involve the entire dataset.
We propose a new stratified approximation method called the Layered Shapley Algorithm.
We prove that this method operates on small ($O(\mathrm{polylog}(n))$) random samples of data and small ($O(\log n)$) coalitions, achieving the results with guaranteed probabilistic accuracy.
arXiv Detail & Related papers (2022-06-01T14:14:24Z)
- Channel Assignment in Uplink Wireless Communication using Machine Learning Approach [54.012791474906514]
This letter investigates a channel assignment problem in uplink wireless communication systems.
Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints.
Due to the high computational complexity of this problem, machine learning approaches are employed to obtain computationally efficient solutions.
arXiv Detail & Related papers (2020-01-12T15:54:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.