When stakes are high: balancing accuracy and transparency with
Model-Agnostic Interpretable Data-driven suRRogates
- URL: http://arxiv.org/abs/2007.06894v2
- Date: Thu, 10 Dec 2020 17:44:03 GMT
- Title: When stakes are high: balancing accuracy and transparency with
Model-Agnostic Interpretable Data-driven suRRogates
- Authors: Roel Henckaerts and Katrien Antonio and Marie-Pier Côté
- Abstract summary: Highly regulated industries, like banking and insurance, ask for transparent decision-making algorithms.
We present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr).
Knowledge is extracted from a black box via partial dependence effects.
This results in a segmentation of the feature space with automatic variable selection.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Highly regulated industries, like banking and insurance, ask for transparent
decision-making algorithms. At the same time, competitive markets are pushing
for the use of complex black box models. We therefore present a procedure to
develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr) suited
for structured tabular data. Knowledge is extracted from a black box via
partial dependence effects. These are used to perform smart feature engineering
by grouping variable values. This results in a segmentation of the feature
space with automatic variable selection. A transparent generalized linear model
(GLM) is fit to the features in categorical format and their relevant
interactions. We demonstrate our R package maidrr with a case study on general
insurance claim frequency modeling for six publicly available datasets. Our
maidrr GLM closely approximates a gradient boosting machine (GBM) black box and
outperforms both a linear and tree surrogate as benchmarks.
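The pipeline described in the abstract (partial dependence effects from a black box, grouping of feature values, then a GLM on the grouped features) can be sketched in Python. This is a minimal illustration under stated assumptions, not the maidrr R package's implementation. `black_box` is a hypothetical toy frequency model standing in for a fitted GBM. The number of groups per feature is fixed by hand, whereas the paper selects the segmentation automatically. Values are grouped with an optimal one-dimensional k-means on the partial dependence values, interactions are omitted, and the Poisson GLM is fitted to the black box's own predictions so that the surrogate mimics it. The partial dependence effect of feature j at value v is the average prediction with feature j set to v for every observation: PD_j(v) = (1/n) Σ_i f(v, x_{i,-j}).

```python
import numpy as np


def black_box(X):
    # Hypothetical stand-in for a fitted GBM: expected claim frequency
    # driven by two integer-coded features (toy example, not real data).
    return np.exp(-2.0 + 0.3 * (X[:, 0] >= 2) - 0.5 * (X[:, 1] >= 3))


def partial_dependence(model, X, j, grid):
    # PD_j(v): mean prediction after setting feature j to v for every row.
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(model(Xv).mean())
    return np.array(out)


def kmeans_1d(values, k):
    # Optimal 1-D k-means via dynamic programming. Optimal 1-D clusters
    # are contiguous in sorted order; label 0 holds the lowest values.
    order = np.argsort(values)
    x = values[order]
    n = len(x)

    def sse(a, b):  # within-cluster sum of squares of x[a:b]
        seg = x[a:b]
        return float(((seg - seg.mean()) ** 2).sum())

    cost = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for m in range(1, k + 1):
        for b in range(m, n + 1):
            for a in range(m - 1, b):
                c = cost[m - 1, a] + sse(a, b)
                if c < cost[m, b]:
                    cost[m, b], split[m, b] = c, a
    labels_sorted = np.empty(n, dtype=int)
    b = n
    for m in range(k, 0, -1):  # backtrack the optimal split points
        a = split[m, b]
        labels_sorted[a:b] = m - 1
        b = a
    labels = np.empty(n, dtype=int)
    labels[order] = labels_sorted
    return labels


def fit_poisson_glm(D, y, n_iter=25):
    # Poisson GLM with log link, fitted by Newton-Raphson (same as IRLS here).
    beta = np.zeros(D.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        mu = np.exp(D @ beta)
        grad = D.T @ (y - mu)
        hess = D.T @ (D * mu[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def segment_and_fit(model, X, k_per_feature):
    groups, cols = [], [np.ones(len(X))]  # first column: intercept
    for j, k in enumerate(k_per_feature):
        grid = np.unique(X[:, j])
        labels = kmeans_1d(partial_dependence(model, X, j, grid), k)
        lookup = {int(v): int(g) for v, g in zip(grid, labels)}
        g = np.array([lookup[int(v)] for v in X[:, j]])
        groups.append(lookup)
        for level in range(1, k):  # group 0 (lowest PD) is the reference
            cols.append((g == level).astype(float))
    D = np.column_stack(cols)
    beta = fit_poisson_glm(D, model(X))  # surrogate targets = black-box output
    return groups, D, beta


rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(2000, 2))
groups, D, beta = segment_and_fit(black_box, X, k_per_feature=[2, 2])
print(groups)  # [{0: 0, 1: 0, 2: 1, 3: 1, 4: 1}, {0: 1, 1: 1, 2: 1, 3: 0, 4: 0}]
print(np.round(beta, 3))
```

In this sketch a feature assigned a single group contributes no dummy columns and drops out of the GLM, which is one way a grouping step can double as variable selection. Because the toy black box is itself log-linear in the recovered groups, the surrogate reproduces it exactly here; with a real GBM the GLM is only an approximation.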
Related papers
- SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation [55.87169702896249]
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift.
We propose a framework to evaluate DA methods and present a fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment.
Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications.
(2024-07-16T12:52:29Z)
- Identifying Light-curve Signals with a Deep Learning Based Object Detection Algorithm. II. A General Light Curve Classification Framework [0.0]
We present a novel deep learning framework for classifying light curves using a weakly supervised object detection model.
Our framework identifies the optimal windows for both light curves and power spectra automatically, and zooms in on their corresponding data.
We train our model on datasets obtained from both space-based and ground-based multi-band observations of variable stars and transients.
(2023-11-14T11:08:34Z)
- CELDA: Leveraging Black-box Language Model as Enhanced Classifier without Labels [14.285609493077965]
Clustering-enhanced Linear Discriminative Analysis (CELDA) is a novel approach that improves text classification accuracy with a very weak supervision signal.
Our framework draws a precise decision boundary without accessing the weights or gradients of the LM or any data labels.
(2023-06-05T08:35:31Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
(2023-05-28T06:30:29Z)
- GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models [74.0430727476634]
We propose a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).
With a variety of segmentation architectures and backbones, GMMSeg outperforms the discriminative counterparts on closed-set datasets.
GMMSeg even performs well on open-world datasets.
(2022-10-05T05:20:49Z)
- Interpreting Black-box Machine Learning Models for High Dimensional Datasets [40.09157165704895]
We train a black-box model on a high-dimensional dataset to learn the embeddings on which the classification is performed.
We then approximate the behavior of the black-box model by means of an interpretable surrogate model on the top-k feature space.
Our approach outperforms state-of-the-art methods such as TabNet and XGBoost when tested on different datasets.
(2022-08-29T07:36:17Z)
- Self-service Data Classification Using Interactive Visualization and Interpretable Machine Learning [9.13755431537592]
The Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm.
IVLC is especially helpful when dealing with sensitive and crucial data like cancer data in the medical domain.
This chapter proposes an automated classification approach combined with a new Coordinate Order Optimizer (COO) algorithm and a genetic algorithm.
(2021-07-11T05:39:14Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
(2021-01-16T23:45:02Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
(2021-01-06T17:36:26Z)
- Interprétabilité des modèles : état des lieux des méthodes et application à l'assurance (Model interpretability: an overview of methods and application to insurance) [1.6058099298620423]
Data is the raw material of many models, which today make it possible to increase the quality and performance of digital services.
Model users must ensure that models do not discriminate and that their results can be explained.
The widening range of predictive algorithms requires scientists to be vigilant about how models are used.
(2020-07-25T12:18:07Z)
- Semi-Supervised Learning with Normalizing Flows [54.376602201489995]
FlowGMM is an end-to-end approach to generative semi-supervised learning with normalizing flows.
We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data.
(2019-12-30T17:36:33Z)
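The Cauchy-Schwarz Regularized Autoencoder entry in the list above states that the Cauchy-Schwarz divergence can be computed analytically for GMMs. A minimal sketch of why, restricted to one-dimensional mixtures and not taken from that paper's code: the divergence D_CS(p, q) = -log(∫pq / sqrt(∫p² ∫q²)) reduces to sums over pairs of components, because the product of two Gaussian densities integrates to a Gaussian density evaluated at the difference of their means. The function names and the (weights, means, variances) tuple layout are illustrative assumptions.

```python
import numpy as np


def gauss(x, mean, var):
    # Univariate normal density N(x; mean, var); broadcasts over arrays.
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)


def cross_term(w1, m1, v1, w2, m2, v2):
    # Closed form of the integral of p(x) q(x) for 1-D GMMs p and q, using
    # the identity: integral of N(x; a, s) N(x; b, t) dx = N(a; b, s + t).
    weights = np.outer(w1, w2)
    dens = gauss(m1[:, None], m2[None, :], v1[:, None] + v2[None, :])
    return float((weights * dens).sum())


def cs_divergence(p, q):
    # D_CS(p, q) = -log( <p, q> / sqrt(<p, p> <q, q>) ); it is 0 when p == q.
    # Each GMM is a (weights, means, variances) tuple of 1-D arrays.
    pq = cross_term(*p, *q)
    pp = cross_term(*p, *p)
    qq = cross_term(*q, *q)
    return -np.log(pq / np.sqrt(pp * qq))


p = (np.array([1.0]), np.array([0.0]), np.array([1.0]))
q = (np.array([1.0]), np.array([2.0]), np.array([1.0]))
print(cs_divergence(p, q))  # approximately 1.0, i.e. mu**2 / 4 with mu = 2
```

For two unit-variance Gaussians whose means differ by mu, the closed form simplifies to mu²/4, which gives a quick sanity check; the multivariate case follows the same pairwise pattern with covariance matrices in place of variances.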