XtracTree: a Simple and Effective Method for Regulator Validation of Bagging Methods Used in Retail Banking
- URL: http://arxiv.org/abs/2004.02326v3
- Date: Tue, 17 Aug 2021 14:31:33 GMT
- Title: XtracTree: a Simple and Effective Method for Regulator Validation of Bagging Methods Used in Retail Banking
- Authors: Jeremy Charlier and Vladimir Makarenkov
- Abstract summary: We propose XtracTree, an algorithm capable of efficiently converting an ML bagging classifier, such as a random forest, into simple "if-then" rules.
Our experiments demonstrate that using XtracTree, one can convert an ML model into a rule-based algorithm.
The proposed approach allowed our banking institution to reduce by up to 50% the time needed to deliver our AI solutions to the end user.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bootstrap aggregation, known as bagging, is one of the most popular ensemble
methods used in machine learning (ML). An ensemble method is an ML method that
combines multiple hypotheses to form a single hypothesis used for prediction. A
bagging algorithm combines multiple classifiers, each trained on a different
bootstrap sub-sample of the same data set, into one large classifier. Banks now
use ML algorithms, including decision trees and random forests, to optimize
their processes, notably in their retail banking activities. However, banks
must comply with regulatory and governance requirements, which makes delivering
effective ML solutions a challenging task. The process starts with the bank's
validation and governance department, continues with the deployment of the
solution in a production environment, and ends with external validation by the
national financial regulator. Each proposed ML model has to be validated, and
every algorithm-based decision must be justified by clear rules. In this
context, we propose XtracTree, an algorithm capable of efficiently converting
an ML bagging classifier, such as a random forest, into simple "if-then" rules
satisfying the requirements of model validation. We use a public loan data set
from Kaggle to illustrate the usefulness of our approach. Our experiments
demonstrate that using XtracTree, one can convert an ML model into a rule-based
algorithm, leading to easier model validation by national financial regulators
and the bank's validation department. The proposed approach allowed our banking
institution to reduce by up to 50% the time needed to deliver our AI solutions
to the end user.
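The abstract relies on two mechanisms: bagging (training several classifiers on bootstrap sub-samples and combining their votes) and turning the resulting trees into "if-then" rules. The Python sketch below illustrates both under simplifying assumptions. It is not the authors' XtracTree implementation: it uses depth-1 trees (stumps) instead of full random-forest trees, it performs plain bagging without the random feature subsets a random forest would add, and the helper names (`fit_stump`, `fit_bagging`, `extract_rules`), the feature names, and the toy data are all hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Binary tree node; internal nodes test x[feature] <= threshold."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # branch when x[feature] <= threshold
    right: Optional["Node"] = None  # branch when x[feature] > threshold
    label: Optional[int] = None     # class label, set only on leaves


def fit_stump(X: List[List[float]], y: List[int]) -> Node:
    """Fit a depth-1 tree that minimizes the number of misclassified samples."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: fall back to a single majority leaf
        return Node(label=Counter(y).most_common(1)[0][0])
    _, f, t, l_lab, r_lab = best
    return Node(feature=f, threshold=t, left=Node(label=l_lab), right=Node(label=r_lab))


def fit_bagging(X: List[List[float]], y: List[int],
                n_estimators: int = 5, seed: int = 0) -> List[Node]:
    """Bagging: fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict_tree(node: Node, x: List[float]) -> int:
    """Follow the tests from the root down to a leaf."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def predict(trees: List[Node], x: List[float]) -> int:
    """Aggregate the individual trees by majority vote."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


def extract_rules(node: Node, names: List[str], conds: Optional[List[str]] = None) -> List[str]:
    """Emit one 'if ... then ...' rule per root-to-leaf path of a tree."""
    conds = conds or []
    if node.label is not None:
        return [f"if {' and '.join(conds) or 'True'} then class = {node.label}"]
    name = names[node.feature]
    return (extract_rules(node.left, names, conds + [f"{name} <= {node.threshold}"])
            + extract_rules(node.right, names, conds + [f"{name} > {node.threshold}"]))


if __name__ == "__main__":
    # Made-up toy data: [late_payments, debt_k]; label 1 = hypothetical "high risk".
    X = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
    y = [0, 0, 0, 1, 1, 1]
    forest = fit_bagging(X, y, n_estimators=5, seed=0)
    for i, tree in enumerate(forest):
        for rule in extract_rules(tree, ["late_payments", "debt_k"]):
            print(f"tree {i}: {rule}")
    print("ensemble vote for [6, 60]:", predict(forest, [6, 60]))
```

Each tree yields one rule per leaf, and for any input exactly one rule per tree fires, so the ensemble prediction is the majority vote over the fired rules. The abstract does not say how XtracTree merges or simplifies rules across trees, so this sketch stops at per-tree extraction.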