Related papers: XtracTree: a Simple and Effective Method for Regulator Validation of Bagging Methods Used in Retail Banking

URL: http://arxiv.org/abs/2004.02326v3
Date: Tue, 17 Aug 2021 14:31:33 GMT
Title: XtracTree: a Simple and Effective Method for Regulator Validation of Bagging Methods Used in Retail Banking
Authors: Jeremy Charlier and Vladimir Makarenkov
Abstract summary: We propose XtracTree, an algorithm capable of efficiently converting an ML bagging classifier, such as a random forest, into simple "if-then" rules. Our experiments demonstrate that, using XtracTree, one can convert an ML model into a rule-based algorithm. The proposed approach allowed our banking institution to reduce the delivery time of our AI solutions to the end user by up to 50%.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bootstrap aggregation, known as bagging, is one of the most popular ensemble methods used in machine learning (ML). An ensemble method is an ML method that combines multiple hypotheses to form a single hypothesis used for prediction. A bagging algorithm combines multiple classifiers, each trained on a different bootstrap sub-sample of the same data set, to build one large classifier. Banks, in particular in their retail banking activities, now use ML algorithms, including decision trees and random forests, to optimize their processes. However, banks must comply with regulatory and governance requirements, which makes delivering effective ML solutions a challenging task. The process starts with the bank's validation and governance department, continues with the deployment of the solution in a production environment, and ends with external validation by the national financial regulator. Each proposed ML model has to be validated, and clear rules must be justified for every algorithm-based decision. In this context, we propose XtracTree, an algorithm capable of efficiently converting an ML bagging classifier, such as a random forest, into simple "if-then" rules satisfying the requirements of model validation. We use a public loan data set from Kaggle to illustrate the usefulness of our approach. Our experiments demonstrate that, using XtracTree, one can convert an ML model into a rule-based algorithm, leading to easier model validation by national financial regulators and the bank's validation department. The proposed approach allowed our banking institution to reduce the delivery time of our AI solutions to the end user by up to 50%.
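The two ideas in the abstract, bagging (training several classifiers on bootstrap samples and combining them by vote) and expressing a tree ensemble as "if-then" rules, can be illustrated with a small sketch. It is not XtracTree, whose extraction procedure the abstract does not describe; it is a hypothetical toy that bags depth-one decision trees ("stumps"), renders each stump as a rule, and combines the rules by majority vote. The function names, the stump learner, and the toy loan data with features `income` and `debt_ratio` are all assumptions for illustration and do not come from the paper or its Kaggle data set.

```python
import random
from collections import Counter

# Hypothetical sketch: bagging of depth-one decision trees ("stumps") and
# rendering each stump as an if-then rule. This is NOT the XtracTree algorithm.

def majority(labels, default=0):
    """Most frequent label; `default` is returned for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(rows):
    """Fit a stump on rows of (features, label): choose the (feature, threshold)
    split with the fewest training errors; each leaf predicts its majority label."""
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return {"feature": f, "threshold": threshold,
            "left": left_label, "right": right_label}

def bagging(rows, n_estimators, seed=0):
    """Train n_estimators stumps, each on a bootstrap sample of len(rows)
    rows drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows])
            for _ in range(n_estimators)]

def stump_to_rule(stump, names):
    """Render one stump as a human-readable if-then rule."""
    name = names[stump["feature"]]
    return (f"if {name} <= {stump['threshold']} "
            f"then {stump['left']} else {stump['right']}")

def predict(stumps, x):
    """Combine the stumps' rules by majority vote
    (an odd number of stumps avoids ties for binary labels)."""
    votes = [s["left"] if x[s["feature"]] <= s["threshold"] else s["right"]
             for s in stumps]
    return majority(votes)

if __name__ == "__main__":
    # Hypothetical toy loan data: (income, debt_ratio) -> 1 = approve, 0 = decline.
    names = ["income", "debt_ratio"]
    rows = [((20, 0.6), 0), ((35, 0.5), 0), ((50, 0.3), 1),
            ((65, 0.2), 1), ((80, 0.1), 1), ((30, 0.4), 0)]
    ensemble = bagging(rows, n_estimators=5)
    for stump in ensemble:
        print(stump_to_rule(stump, names))
    print("prediction for (70, 0.15):", predict(ensemble, (70, 0.15)))
```

On the full toy data set, `stump_to_rule(fit_stump(rows), names)` returns `if income <= 35 then 0 else 1`. In a real random forest the trees are deeper, so each root-to-leaf path would become one conjunctive rule (for example, "if income > 35 and debt_ratio <= 0.3 then 1"); this sketch does not reproduce how XtracTree itself produces its rules.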

Related papers

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating the constraint on every token can be prohibitively expensive, and locally constrained decoding (LCD) can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z)
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning [90.5036809670993]
Scaling test-time compute has emerged as a key strategy for enhancing the reasoning capabilities of large language models. Recent advancements in Generative Reward Models (GenRM) reframe verification as a next-token prediction task. We evaluate GenRM against Self-Consistency (SC) for most practical inference budgets across diverse models and datasets.
arXiv Detail & Related papers (2025-04-01T17:41:57Z)
Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models. We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
Generative Verifiers: Reward Modeling as Next-Token Prediction [29.543787728397643]
Verifiers or reward models are often used to enhance the reasoning performance of large language models (LLMs). We propose training verifiers using the ubiquitous next-token prediction objective, jointly on verification and solution generation. We demonstrate that GenRM outperforms discriminative verifiers, DPO verifiers, and LLM-as-a-Judge.
arXiv Detail & Related papers (2024-08-27T17:57:45Z)
Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells [39.847063110051245]
This work presents a set of optimal machine learning (ML) models to represent the temporal degradation suffered by the power conversion efficiency (PCE) of organic solar cells (OSCs). We generated a database with 996 entries, which includes up to 7 variables regarding both the manufacturing process and environmental conditions for more than 180 days. The accuracy achieved reaches values of the coefficient of determination (R2) widely exceeding 0.90, whereas the root mean squared error (RMSE), sum of squared error (SSE), and mean absolute error (MAE)>1% of the target value, the PCE.
arXiv Detail & Related papers (2024-03-29T22:05:26Z)
Effective Neural Network $L_0$ Regularization With BinMask [15.639601066641099]
We show that a straightforward formulation, BinMask, is an effective $L_0$ regularizer. We evaluate BinMask on three tasks: feature selection, network sparsification, and model regularization.
arXiv Detail & Related papers (2023-04-21T20:08:57Z)
Active Fairness Auditing [22.301071549943064]
We study query-based auditing algorithms that can estimate the demographic parity of ML models in a query-efficient manner. We propose an optimal deterministic algorithm, as well as a practical randomized, oracle-efficient algorithm with comparable guarantees. Our first exploration of active fairness estimation aims to put AI governance on firmer theoretical foundations.
arXiv Detail & Related papers (2022-06-16T21:12:00Z)
Unpacking the Black Box: Regulating Algorithmic Decisions [1.283555556182245]
We propose a model of oversight over 'black-box' algorithms used in high-stakes applications such as lending, medical testing, or hiring. We show that allowing for complex algorithms can improve welfare, but the gains depend on how the regulator regulates them.
arXiv Detail & Related papers (2021-10-05T23:20:25Z)
Probabilistic Case-based Reasoning for Open-World Knowledge Graph Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving 'cases' that are similar to the given problem. In this paper, we demonstrate that such a system is achievable for reasoning in knowledge bases (KBs). Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z)
Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles. Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center. We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes. A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error. We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.