Related papers: Model Reconstruction Using Counterfactual Explanations: Mitigating the Decision Boundary Shift

URL: http://arxiv.org/abs/2405.05369v1
Date: Wed, 8 May 2024 18:52:47 GMT
Title: Model Reconstruction Using Counterfactual Explanations: Mitigating the Decision Boundary Shift
Authors: Pasan Dissanayake, Sanghamitra Dutta
Abstract summary: We propose a novel strategy for model extraction that we call Counterfactual Clamping Attack (CCA). We derive novel mathematical relationships between the error in model approximation and the number of queries using polytope theory. Experimental results demonstrate that our strategy provides improved fidelity between the target and surrogate model predictions on several real-world datasets.
Score: 9.771997770574947
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Counterfactual explanations find ways of achieving a favorable model outcome with minimum input perturbation. However, counterfactual explanations can also be exploited to steal the model by strategically training a surrogate model to give similar predictions as the original (target) model. In this work, we investigate model extraction by specifically leveraging the fact that the counterfactual explanations also lie quite close to the decision boundary. We propose a novel strategy for model extraction that we call Counterfactual Clamping Attack (CCA) which trains a surrogate model using a unique loss function that treats counterfactuals differently than ordinary instances. Our approach also alleviates the related problem of decision boundary shift that arises in existing model extraction attacks which treat counterfactuals as ordinary instances. We also derive novel mathematical relationships between the error in model approximation and the number of queries using polytope theory. Experimental results demonstrate that our strategy provides improved fidelity between the target and surrogate model predictions on several real world datasets.
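The abstract does not spell out the CCA loss, so the following is only a hypothetical sketch of the idea, assuming a binary logistic-regression surrogate, a toy linear target (`target_predict`), and a toy counterfactual generator that moves each rejected query to just past the target's boundary; every name here is ours, not the authors'. It contrasts a baseline that treats counterfactuals as ordinary favorable instances (label 1) with a "clamped" loss that gives counterfactuals the soft label 0.5 and penalizes them only while the surrogate still scores them below 0.5. This is one plausible reading of "treats counterfactuals differently than ordinary instances", not the paper's exact formulation.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function (no overflow from np.exp).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def loss_and_grad(w, b, X, y, is_cf, clamp):
    """Mean loss and gradients for a logistic-regression surrogate.

    clamp=False: counterfactuals are ordinary points with label 1 (baseline).
    clamp=True:  counterfactuals get the soft label 0.5 and are penalized only
                 while the surrogate still scores them below 0.5 (assumed rule).
    """
    p = sigmoid(X @ w + b)
    if clamp:
        target = np.where(is_cf, 0.5, y)
        active = ~is_cf | (p < 0.5)  # all ordinary points + misplaced CFs
    else:
        target = y.astype(float)
        active = np.ones_like(is_cf)  # every point contributes
    eps = 1e-12
    bce = -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))
    loss = np.mean(active * bce)
    g = active * (p - target)  # d(BCE)/d(logit) for a sigmoid output
    return loss, X.T @ g / len(y), np.mean(g)


def train_surrogate(X, y, is_cf, clamp, steps=2000, lr=0.5):
    # Plain full-batch gradient descent on the chosen loss.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        _, grad_w, grad_b = loss_and_grad(w, b, X, y, is_cf, clamp)
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b


def target_predict(X):
    # Hypothetical target model: favorable (1) iff the first feature is > 0.
    return (X[:, 0] > 0).astype(int)


rng = np.random.default_rng(0)
X_q = rng.normal(size=(200, 2))
y_q = target_predict(X_q)

# Hypothetical counterfactual generator: push each rejected query along the
# first feature to just past the boundary (its nearest favorable point).
X_cf = X_q[y_q == 0].copy()
X_cf[:, 0] = 0.01

X = np.vstack([X_q, X_cf])
y = np.concatenate([y_q, np.ones(len(X_cf), dtype=int)])
is_cf = np.concatenate([np.zeros(len(X_q), dtype=bool), np.ones(len(X_cf), dtype=bool)])
X_test = rng.normal(size=(5000, 2))


def fidelity(w, b):
    # Fraction of test points on which surrogate and target agree.
    return np.mean((sigmoid(X_test @ w + b) > 0.5).astype(int) == target_predict(X_test))


w_base, b_base = train_surrogate(X, y, is_cf, clamp=False)
w_clamp, b_clamp = train_surrogate(X, y, is_cf, clamp=True)
for name, (w, b) in {"baseline": (w_base, b_base), "clamped": (w_clamp, b_clamp)}.items():
    # Where the surrogate boundary crosses the x[1] = 0 axis (target: 0.0).
    print(f"{name}: boundary at x0={-b / w[0]:+.3f}, fidelity={fidelity(w, b):.3f}")
```

Under these toy assumptions one would expect the baseline boundary to be pulled onto the unfavorable side (negative `x0`), because every counterfactual sitting just past the boundary keeps being pushed toward probability 1, whereas the clamped loss stops pulling on a counterfactual once it is classified as favorable.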

Related papers

Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws [52.10468229008941]
This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting. We provide theoretical insights into why this approach improves generalization and data efficiency compared to training without a reference model. Building on these insights, we introduce a novel method for Contrastive Language-Image Pretraining with a reference model, termed DRRho-CLIP.
arXiv Detail & Related papers (2025-05-10T16:55:03Z)
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort [31.992947353231564]
Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts. We propose a novel framework designed to exploit pre-trained models while being immune to these biases, thereby reducing vulnerability to spurious correlations. We evaluate the proposed method on multiple datasets, and the results demonstrate its effectiveness in reducing model reliance on spurious correlations while preserving its interpretability.
arXiv Detail & Related papers (2024-07-12T03:07:28Z)
Towards Characterizing Domain Counterfactuals For Invertible Latent Causal Models [15.817239008727789]
In this work, we analyze a specific type of causal query called domain counterfactuals, which hypothesizes what a sample would have looked like if it had been generated in a different domain. We show that recovering the latent Structural Causal Model (SCM) is unnecessary for estimating domain counterfactuals. We also develop a theoretically grounded practical algorithm that simplifies the modeling process to generative model estimation.
arXiv Detail & Related papers (2023-06-20T04:19:06Z)
Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks. The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data. Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
Probabilistic Traffic Forecasting with Dynamic Regression [15.31488551912888]
This paper proposes a dynamic regression (DR) framework that enhances existing deep temporal models by additionally learning the error process in traffic forecasting. The framework relaxes the assumption of time independence by modeling the error series of the base model using a matrix-structured autoregressive (AR) model. The newly designed loss function is based on the likelihood of a non-isotropic error term, enabling the model to generate probabilistic forecasts while preserving the original outputs of the base model.
arXiv Detail & Related papers (2023-01-17T01:12:44Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL). Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio. We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction. We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss. Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method. Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
Control as Hybrid Inference [62.997667081978825]
We present an implementation of control as hybrid inference (CHI), which naturally mediates the balance between iterative and amortised inference. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
Bidirectional Model-based Policy Optimization [30.732572976324516]
Model-based reinforcement learning approaches leverage a forward dynamics model to support planning and decision making. In this paper, we propose to additionally construct a backward dynamics model to reduce reliance on the accuracy of forward-model predictions. We develop a novel method, called Bidirectional Model-based Policy Optimization (BMPO), to utilize both the forward model and backward model to generate short branched rollouts for policy optimization.
arXiv Detail & Related papers (2020-07-04T03:34:09Z)
Model Repair: Robust Recovery of Over-Parameterized Statistical Models [24.319310729283636]
A new type of robust estimation problem is introduced where the goal is to recover a statistical model that has been corrupted after it has been estimated from data. Methods are proposed for "repairing" the model using only the design and not the response values used to fit the model in a supervised learning setting.
arXiv Detail & Related papers (2020-05-20T08:41:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.