Kernel Banzhaf: A Fast and Robust Estimator for Banzhaf Values
- URL: http://arxiv.org/abs/2410.08336v2
- Date: Tue, 18 Feb 2025 04:11:52 GMT
- Authors: Yurong Liu, R. Teal Witter, Flip Korn, Tarfah Alrashed, Dimitris Paparas, Christopher Musco, Juliana Freire
- Abstract summary: We introduce Kernel Banzhaf, the first regression-based estimator for Banzhaf values.
We find that Kernel Banzhaf significantly outperforms existing Monte Carlo methods in terms of accuracy, sample efficiency, robustness to noise, and feature ranking recovery.
- Score: 17.97990216632801
- Abstract: Banzhaf values provide a popular, interpretable alternative to the widely-used Shapley values for quantifying the importance of features in machine learning models. Like Shapley values, computing Banzhaf values exactly requires time exponential in the number of features, necessitating the use of efficient estimators. Existing estimators, however, are limited to Monte Carlo sampling methods. In this work, we introduce Kernel Banzhaf, the first regression-based estimator for Banzhaf values. Our approach leverages a novel regression formulation, whose exact solution corresponds to the exact Banzhaf values. Inspired by the success of Kernel SHAP for Shapley values, Kernel Banzhaf efficiently solves a sampled instance of this regression problem. Through empirical evaluations across eight datasets, we find that Kernel Banzhaf significantly outperforms existing Monte Carlo methods in terms of accuracy, sample efficiency, robustness to noise, and feature ranking recovery. Finally, we complement our experimental evaluation with strong theoretical guarantees on Kernel Banzhaf's performance.
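To make the regression view in the abstract concrete, here is a minimal sketch in Python with NumPy. It relies on a standard fact rather than on the paper's own formulation: the Banzhaf value of feature i is the mean value of subsets containing i minus the mean value of subsets without i, and an unweighted least-squares fit of the set function on 0/1 membership indicators over all subsets recovers the same numbers as its slope coefficients. The names `all_masks`, `exact_banzhaf`, `regression_banzhaf`, and `toy_value` are hypothetical, and the uniform sampling used here is an assumption; the paper's regression formulation and sampling scheme may differ.

```python
import itertools
import numpy as np


def all_masks(n):
    """Return every subset of n features as a row of 0/1 indicators."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)


def exact_banzhaf(value_fn, n):
    """Exact Banzhaf values by brute force over all 2^n subsets (small n only)."""
    masks = all_masks(n)
    vals = np.array([value_fn(m) for m in masks])
    # Banzhaf value of i: mean value with i present minus mean value with i absent.
    return np.array([
        vals[masks[:, i] == 1].mean() - vals[masks[:, i] == 0].mean()
        for i in range(n)
    ])


def regression_banzhaf(value_fn, masks):
    """Least-squares fit of value ~ intercept + sum_i beta_i * z_i on the given subsets."""
    vals = np.array([value_fn(m) for m in masks])
    design = np.hstack([np.ones((len(masks), 1)), masks])
    coef, *_ = np.linalg.lstsq(design, vals, rcond=None)
    return coef[1:]  # drop the intercept, keep one coefficient per feature


def toy_value(z):
    """Hypothetical 3-feature set function with one interaction term."""
    return 2.0 * z[0] + 1.0 * z[1] - 1.0 * z[2] + 3.0 * z[0] * z[1]


# Assumption: subsets are drawn uniformly at random, each feature included with probability 1/2.
rng = np.random.default_rng(0)
sampled = rng.integers(0, 2, size=(200, 3)).astype(float)

exact = exact_banzhaf(toy_value, 3)
full_fit = regression_banzhaf(toy_value, all_masks(3))   # regression on every subset
sample_fit = regression_banzhaf(toy_value, sampled)      # regression on sampled subsets
```

On the full set of subsets, `full_fit` coincides with `exact`; with sampled subsets, `sample_fit` is only an estimate whose error depends on the number of samples and on how non-additive the set function is.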
Related papers
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z)
- Functional Ensemble Distillation [18.34081591772928]
We investigate how to best distill an ensemble's predictions using an efficient model.
We find that learning the distilled model via a simple augmentation scheme in the form of mixup augmentation significantly boosts the performance.
arXiv Detail & Related papers (2022-06-05T14:07:17Z)
- Data Banzhaf: A Robust Data Valuation Framework for Machine Learning [18.65808473565554]
This paper studies the robustness of data valuation to noisy model performance scores.
We introduce the concept of safety margin, which measures the robustness of a data value notion.
We show that the Banzhaf value achieves the largest safety margin among all semivalues.
arXiv Detail & Related papers (2022-05-30T23:44:09Z)
- Importance Weighting Approach in Kernel Bayes' Rule [43.221685127485735]
We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected posterior features.
All quantities involved in the Bayesian update are learned from observed data, making the method entirely model-free.
Our approach is based on importance weighting, which results in superior numerical stability to the existing approach to KBR.
arXiv Detail & Related papers (2022-02-05T03:06:59Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)