Controlled Model Debiasing through Minimal and Interpretable Updates
- URL: http://arxiv.org/abs/2502.21284v1
- Date: Fri, 28 Feb 2025 18:03:55 GMT
- Title: Controlled Model Debiasing through Minimal and Interpretable Updates
- Authors: Federico Di Gennaro, Thibault Laugel, Vincent Grari, Marcin Detyniecki
- Abstract summary: We introduce the notion of controlled model debiasing, a novel supervised learning task relying on two desiderata. We introduce a novel algorithm for algorithmic fairness, COMMOD, that is model-agnostic and does not require the sensitive attribute at test time. Our approach combines a concept-based architecture and adversarial learning, and we demonstrate through empirical results that it achieves performance comparable to state-of-the-art debiasing methods.
- Score: 6.089774484591287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional approaches to learning fair machine learning models often require rebuilding models from scratch, generally without accounting for potentially existing previous models. In a context where models need to be retrained frequently, this can lead to inconsistent model updates, as well as redundant and costly validation testing. To address this limitation, we introduce the notion of controlled model debiasing, a novel supervised learning task relying on two desiderata: the differences between the new fair model and the existing one should be (i) interpretable and (ii) minimal. After providing theoretical guarantees for this new problem, we introduce a novel algorithm for algorithmic fairness, COMMOD, that is model-agnostic and does not require the sensitive attribute at test time. In addition, our algorithm is explicitly designed to enforce minimal and interpretable changes between biased and debiased predictions, a property that, while highly desirable in high-stakes applications, is rarely prioritized as an explicit objective in the fairness literature. Our approach combines a concept-based architecture and adversarial learning, and we demonstrate through empirical results that it achieves performance comparable to state-of-the-art debiasing methods while making minimal and interpretable prediction changes.
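The three ingredients named in the abstract (task accuracy, adversarial removal of the sensitive signal, and minimality of the edit to the existing model) can be combined into a single objective. The PyTorch sketch below illustrates that general recipe under stated assumptions; it is not the authors' COMMOD implementation, and the module sizes, loss weights, and additive-logit edit are all illustrative choices.

```python
# Hedged sketch of adversarial debiasing with a minimal-update penalty.
# NOT the authors' COMMOD code: EditNet, the loss weights, and the
# additive-logit edit are illustrative assumptions.
import torch
import torch.nn as nn

class EditNet(nn.Module):
    """Learns a small additive correction to a frozen biased model's logit."""
    def __init__(self, n_features: int):
        super().__init__()
        self.delta = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                   nn.Linear(16, 1))

    def forward(self, x, biased_logit):
        return biased_logit + self.delta(x)  # debiased logit = old + edit

def edit_loss(edit_net, adversary, x, y, s, biased_logit,
              lam_adv=1.0, lam_min=0.1):
    """Three-part objective: accuracy, fairness, minimality of the change.

    y and s are float {0,1} tensors (label and sensitive attribute).
    """
    bce = nn.BCEWithLogitsLoss()
    new_logit = edit_net(x, biased_logit)
    task = bce(new_logit.squeeze(-1), y)                # stay accurate
    adv = bce(adversary(new_logit).squeeze(-1), s)      # adversary guesses s
    minimal = (new_logit - biased_logit).abs().mean()   # keep the edit small
    return task - lam_adv * adv + lam_min * minimal     # reward fooling adv
```

In an alternating scheme, the adversary is updated separately to minimize `adv`, while the edit network minimizes the combined loss above; the sensitive attribute `s` is needed only during training, matching the abstract's test-time requirement.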
Related papers
- LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging [10.33844295243509]
We propose LoRE-Merging, a unified framework for model merging based on low-rank estimation of task vectors that does not require access to the base model.
Our approach is motivated by the observation that task vectors from fine-tuned models frequently exhibit a limited number of dominant singular values, making low-rank estimations less prone to interference.
arXiv Detail & Related papers (2025-02-15T10:18:46Z)
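The low-rank structure this summary describes is easy to illustrate with a truncated SVD. The toy sketch below assumes access to a base weight matrix, which LoRE-Merging itself deliberately avoids; the rank and simple averaging are illustrative choices.

```python
# Toy illustration of low-rank task vectors via truncated SVD. Unlike
# LoRE-Merging, this version assumes the base weights are available.
import numpy as np

def low_rank_task_vector(w_finetuned, w_base, rank=8):
    """Keep only the top-`rank` singular directions of the task vector."""
    u, s, vt = np.linalg.svd(w_finetuned - w_base, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

def merge(w_base, finetuned_weights, rank=8):
    """Average the low-rank task vectors and add them back to the base."""
    deltas = [low_rank_task_vector(w, w_base, rank) for w in finetuned_weights]
    return w_base + np.mean(deltas, axis=0)
```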
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
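Churn is commonly defined as the fraction of examples on which two models disagree; a minimal helper makes the quantity concrete (this is the generic definition, not the paper's Rashomon-set analysis):

```python
# Predictive churn: share of inputs whose predicted label changes when a
# model is replaced by a near-equivalent one.
import numpy as np

def churn(preds_old: np.ndarray, preds_new: np.ndarray) -> float:
    return float(np.mean(preds_old != preds_new))

# churn(np.array([1, 0, 1, 1]), np.array([1, 1, 1, 0]))  # -> 0.5
```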
- Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning [1.4293924404819704]
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
arXiv Detail & Related papers (2023-11-17T00:35:38Z)
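One way to read $k$-NN information-theoretically is to score a query by its surprisal, $-\log p(x)$, under a distance-based density proxy. The sketch below is a generic illustration of that reading, not the paper's algorithm; the density estimate drops the unit-ball volume constant.

```python
# Generic surprisal score from a k-NN density proxy (illustrative only).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_surprisal(train_x: np.ndarray, query_x: np.ndarray, k: int = 5):
    """Higher surprisal = lower k-NN density = more anomalous."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_x)
    dist, _ = nn.kneighbors(query_x)
    d = train_x.shape[1]
    # k-NN density estimate ~ k / (n * r_k^d), up to the unit-ball constant.
    density = k / (len(train_x) * np.maximum(dist[:, -1], 1e-12) ** d)
    return -np.log(density)  # surprisal in nats
```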
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error of overparameterized models that achieve effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
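For reference, the generic McAllester-style PAC-Bayes bound has the following shape; the paper derives a specialised version for the interpolating (zero training error) regime, which this generic form does not capture.

```latex
% Generic PAC-Bayes bound: for any prior \pi and confidence \delta, with
% probability at least 1-\delta over an i.i.d. sample of size n,
% simultaneously for all posteriors \rho,
\mathbb{E}_{\theta\sim\rho}[L(\theta)] \;\le\;
\mathbb{E}_{\theta\sim\rho}[\hat L_n(\theta)]
+ \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi) + \ln(2\sqrt{n}/\delta)}{2n}}
```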
- fairml: A Statistician's Take on Fair Machine Learning Modelling [0.0]
We describe the fairml package which implements our previous work (Scutari, Panero, and Proissl 2022) and related models in the literature.
fairml is designed around classical statistical models and penalised regression results.
The constraint used to enforce fairness is decoupled from model estimation, making it possible to mix-and-match the desired model family and fairness definition for each application.
arXiv Detail & Related papers (2023-05-03T09:59:53Z)
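To make the mix-and-match idea concrete, here is one generic flavour of fairness-as-penalised-regression: a ridge loss plus a penalty on the covariance between fitted values and the sensitive attribute. fairml is an R package and its estimators differ; everything below is an illustrative Python sketch.

```python
# Illustrative fair ridge regression: penalise dependence between fitted
# values and the (centred) sensitive attribute. Not fairml's estimator.
import numpy as np
from scipy.optimize import minimize

def fair_ridge(X, y, s, lam=1.0, gamma=10.0):
    s_c = s - s.mean()
    def loss(beta):
        resid = y - X @ beta
        cov = (s_c @ (X @ beta)) / len(y)   # dependence on s
        return resid @ resid + lam * beta @ beta + gamma * cov**2
    return minimize(loss, np.zeros(X.shape[1])).x
```

Swapping the ridge loss for another model family, or the covariance term for another fairness definition, is exactly the kind of decoupling the package description emphasises.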
- Deep Grey-Box Modeling With Adaptive Data-Driven Models Toward Trustworthy Estimation of Theory-Driven Models [88.63781315038824]
We present a framework that enables us to analyze a regularizer's behavior empirically with a slight change in the neural net's architecture and the training objective.
arXiv Detail & Related papers (2022-10-24T10:42:26Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
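The flavour of such bounds can be seen in the classic simulation lemma, which is not the paper's own result but shows how a model-shift term enters a performance guarantee:

```latex
% Simulation-lemma-style bound: the value gap of a policy \pi under the true
% MDP M versus a learned model \hat M grows with the model shift
% \epsilon_m = \max_{s,a} D_{\mathrm{TV}}\!\big(\hat P(\cdot\mid s,a),\, P(\cdot\mid s,a)\big):
\left| J^{\pi}_{\hat M} - J^{\pi}_{M} \right|
\;\le\; \frac{2\gamma R_{\max}\,\epsilon_m}{(1-\gamma)^2}
```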
- Rethinking and Recomputing the Value of Machine Learning Models [16.06614967567121]
We argue that the prevailing approach to training and evaluating machine learning models often fails to consider their real-world application.
Traditional metrics like accuracy and F-score fail to capture the beneficial value of models in such hybrid settings.
We introduce a simple yet theoretically sound "value" metric that incorporates task-specific costs for correct predictions, errors, and rejections.
arXiv Detail & Related papers (2022-09-30T01:02:31Z)
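A minimal instance of such a value metric is below; the unit gains and costs are application-specific assumptions, not numbers from the paper.

```python
# Cost-sensitive "value" of a model that may also reject (abstain).
def model_value(n_correct, n_error, n_reject,
                gain_correct=1.0, cost_error=5.0, cost_reject=0.5):
    return (gain_correct * n_correct
            - cost_error * n_error
            - cost_reject * n_reject)

# A model that abstains on hard cases can out-value a more "accurate" one:
# model_value(90, 0, 10) == 85.0  >  model_value(95, 5, 0) == 70.0
```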
- Fairness Reprogramming [42.65700878967251]
We propose a new generic fairness learning paradigm, called FairReprogram, which incorporates the model reprogramming technique.
Specifically, FairReprogram considers the case where the model cannot be changed and instead appends to the input a set of perturbations, called the fairness trigger.
We show both theoretically and empirically that the fairness trigger can effectively obscure demographic biases in the output predictions of fixed ML models.
arXiv Detail & Related papers (2022-09-21T09:37:00Z)
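A hedged sketch of the trigger idea: the classifier stays frozen and only an input-space perturbation is optimised, so that a discriminator can no longer recover the sensitive attribute from the outputs. Concatenating the trigger to tabular features, and the module shapes, are illustrative assumptions (FairReprogram itself targets vision and NLP inputs).

```python
# Fairness-trigger sketch: freeze the model, learn only `trigger`.
import torch
import torch.nn as nn

def trigger_loss(model, discriminator, trigger, x, y, s, lam=1.0):
    """`trigger` is an nn.Parameter of shape (1, d_trigger); y, s are float {0,1}."""
    bce = nn.BCEWithLogitsLoss()
    inputs = torch.cat([x, trigger.expand(x.size(0), -1)], dim=1)
    logits = model(inputs)                              # model stays frozen
    task = bce(logits.squeeze(-1), y)                   # keep utility
    adv = bce(discriminator(logits).squeeze(-1), s)     # attribute leakage
    return task - lam * adv  # minimise task loss, maximise adversary's loss
```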
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We present a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
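A common recipe in this line of work is a product of experts with a frozen weak learner; the sketch below shows that generic recipe under the assumption that the weak model's biased predictions are available during training.

```python
# Product-of-experts debiasing sketch: combine log-probabilities of the main
# model and a frozen weak (biased) learner during training only.
import torch.nn.functional as F

def poe_loss(main_logits, weak_logits, labels):
    combined = F.log_softmax(main_logits, dim=-1) \
             + F.log_softmax(weak_logits.detach(), dim=-1)
    return F.cross_entropy(combined, labels)
# Gradients reach only the main model, which is pushed to explain what the
# biased weak learner cannot; at test time the main model is used alone.
```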
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
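The infinite-horizon object such a model targets is the discounted state occupancy, which obeys a TD-style fixed point mixing one-step dynamics with bootstrapped model samples (written here in a standard form; the notation is ours):

```latex
\mu_\gamma(s_e \mid s, a)
= (1-\gamma)\, p(s_e \mid s, a)
+ \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a),\; a' \sim \pi(\cdot \mid s')}
    \big[ \mu_\gamma(s_e \mid s', a') \big]
```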
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- SE3M: A Model for Software Effort Estimation Using Pre-trained Embedding Models [0.8287206589886881]
This paper evaluates the effectiveness of pre-trained embedding models.
Generic pre-trained models for both approaches went through a fine-tuning process.
Results were very promising, showing that pre-trained models can be used to estimate software effort from requirements texts alone.
arXiv Detail & Related papers (2020-06-30T14:15:38Z)
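The general recipe this abstract describes (embed requirement texts, then regress effort) can be sketched in a few lines; the encoder checkpoint and the ridge regressor are illustrative choices, not necessarily SE3M's exact setup.

```python
# Effort estimation from requirement texts: embed, then regress. The model
# name below is an assumed checkpoint, not necessarily the one SE3M used.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

def fit_effort_model(requirements: list[str], effort_hours: list[float]):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    X = encoder.encode(requirements)        # texts -> fixed-size vectors
    return encoder, Ridge(alpha=1.0).fit(X, effort_hours)
```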