Related papers: Pushing the Boundaries of Interpretability: Incremental Enhancements to the Explainable Boosting Machine

Pushing the Boundaries of Interpretability: Incremental Enhancements to the Explainable Boosting Machine

URL: http://arxiv.org/abs/2512.00528v1
Date: Sat, 29 Nov 2025 15:46:13 GMT
Title: Pushing the Boundaries of Interpretability: Incremental Enhancements to the Explainable Boosting Machine
Authors: Isara Liyanage, Uthayasanker Thayasivam,
Abstract summary: This paper aims at improving the Explainable Boosting Machine (EBM), a state-of-the-art glassbox model that delivers both high accuracy and complete transparency.<n>The work is positioned as a critical step toward developing machine learning systems that are robust, equitable, and transparent.
Score: 1.2461503242570642
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The widespread adoption of complex machine learning models in high-stakes domains has brought the "black-box" problem to the forefront of responsible AI research. This paper aims at addressing this issue by improving the Explainable Boosting Machine (EBM), a state-of-the-art glassbox model that delivers both high accuracy and complete transparency. The paper outlines three distinct enhancement methodologies: targeted hyperparameter optimization with Bayesian methods, the implementation of a custom multi-objective function for fairness for hyperparameter optimization, and a novel self-supervised pre-training pipeline for cold-start scenarios. All three methodologies are evaluated across standard benchmark datasets, including the Adult Income, Credit Card Fraud Detection, and UCI Heart Disease datasets. The analysis indicates that while the tuning process yielded marginal improvements in the primary ROC AUC metric, it led to a subtle but important shift in the model's decision-making behavior, demonstrating the value of a multi-faceted evaluation beyond a single performance score. This work is positioned as a critical step toward developing machine learning systems that are not only accurate but also robust, equitable, and transparent, meeting the growing demands of regulatory and ethical compliance.

Related papers

Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine [0.0]
The study proposes an enhanced workflow based on the Explainable Boosting Machine (EBM)<n>The optimized EBM achieves an effective balance between accuracy and interpretability, enabling precise detection of fraudulent transactions.<n> Experimental evaluation on benchmark credit card data yields an ROC-AUC of 0.983, surpassing prior EBM baselines.
arXiv Detail & Related papers (2026-02-06T18:56:17Z)
o-MEGA: Optimized Methods for Explanation Generation and Analysis [1.4113638641649384]
textbftexttto-mega is a tool designed to automatically identify the most effective explainable AI methods.<n>We demonstrate improved transparency in automated fact-checking systems.
arXiv Detail & Related papers (2025-09-30T21:08:36Z)
Explicit modelling of subject dependency in BCI decoding [12.17288254938554]
Brain-Computer Interfaces (BCIs) suffer from high inter-subject variability and limited labeled data.<n>We present an end-to-end approach that explicitly models the subject dependency using lightweight convolutional neural networks (CNNs) conditioned on the subject's identity.
arXiv Detail & Related papers (2025-09-27T10:51:42Z)
Rethinking Evaluation of Infrared Small Target Detection [105.59753496831739]
This paper introduces a hybrid-level metric incorporating pixel- and target-level performance, proposing a systematic error analysis method, and emphasizing the importance of cross-dataset evaluation.<n>An open-source toolkit has be released to facilitate standardized benchmarking.
arXiv Detail & Related papers (2025-09-21T02:45:07Z)
RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support.<n>We introduce the first benchmark for HOI detection, evaluating model resilience under diverse challenges.<n>Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z)
Explainable Artificial Intelligence Credit Risk Assessment using Machine Learning [0.0]
This paper presents an AI-driven system for Credit Risk Assessment using three state-of-the-art ensemble machine learning models combined with Explainable AI (XAI) techniques.<n>The system leverages XGBoost, LightGBM, and Random Forest algorithms for predictive analysis of loan default risks.<n>LightGBM emerges as the most business-optimal model with the highest accuracy and best trade off between approval and default rates.
arXiv Detail & Related papers (2025-06-24T07:20:05Z)
SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling [58.05959902776133]
We introduce Single-Pass.<n>with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation.<n>We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP)<n>On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only $sim$16% of training samples compared to human-labeled and other synthetically trained baselines.
arXiv Detail & Related papers (2025-06-18T14:37:59Z)
On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI (systems that autonomously plan and act) are becoming widespread, yet their task success rate on complex tasks remains low.<n>Inference-time alignment relies on three components: sampling, evaluation, and feedback.<n>We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility.<n>This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance.<n>We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights. We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.