When stakes are high: balancing accuracy and transparency with
Model-Agnostic Interpretable Data-driven suRRogates
- URL: http://arxiv.org/abs/2007.06894v2
- Date: Thu, 10 Dec 2020 17:44:03 GMT
- Title: When stakes are high: balancing accuracy and transparency with
Model-Agnostic Interpretable Data-driven suRRogates
- Authors: Roel Henckaerts and Katrien Antonio and Marie-Pier Côté
- Abstract summary: Highly regulated industries, like banking and insurance, ask for transparent decision-making algorithms.
We present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr).
Knowledge is extracted from a black box via partial dependence effects.
This results in a segmentation of the feature space with automatic variable selection.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Highly regulated industries, like banking and insurance, ask for transparent
decision-making algorithms. At the same time, competitive markets are pushing
for the use of complex black box models. We therefore present a procedure to
develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr) suited
for structured tabular data. Knowledge is extracted from a black box via
partial dependence effects. These are used to perform smart feature engineering
by grouping variable values. This results in a segmentation of the feature
space with automatic variable selection. A transparent generalized linear model
(GLM) is fit to the features in categorical format and their relevant
interactions. We demonstrate our R package maidrr with a case study on general
insurance claim frequency modeling for six publicly available datasets. Our
maidrr GLM closely approximates a gradient boosting machine (GBM) black box and
outperforms both a linear and tree surrogate as benchmarks.
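
The pipeline described in the abstract (partial dependence effects, grouping of variable values, then a transparent model on the resulting segments) can be illustrated with a minimal Python sketch. This is not the maidrr R package or its API. The `black_box` function is a hypothetical stand-in for a fitted GBM, the feature names (driver age, car power) and all numbers are invented for illustration, the threshold-based `group_adjacent` is a simplified grouping rule assumed here, and a log-linear least-squares fit replaces the GLM to keep the example dependency-free.

```python
import numpy as np

def black_box(X):
    # Hypothetical stand-in for a fitted black box such as a GBM (assumption).
    young = (X[:, 0] < 30).astype(float)    # column 0: driver age
    strong = (X[:, 1] >= 6).astype(float)   # column 1: car power
    return np.exp(-2.0 + 0.5 * young + 0.3 * strong)

def partial_dependence(predict, X, j, grid):
    # Average prediction when feature j is forced to each grid value.
    effects = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v
        effects.append(predict(X_mod).mean())
    return np.array(effects)

def group_adjacent(effects, tol):
    # Simplified grouping rule (assumption): start a new group whenever
    # neighbouring partial dependence effects differ by at least tol.
    jumps = np.abs(np.diff(effects)) >= tol
    return np.concatenate([[0], np.cumsum(jumps)])

def to_segments(x, grid, labels):
    # Map each observed value to the group label of its grid position.
    return labels[np.searchsorted(grid, x)]

# Synthetic tabular data: integer ages 18..79 and power levels 1..9.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.integers(18, 80, n), rng.integers(1, 10, n)]).astype(float)

age_grid = np.arange(18, 80, dtype=float)
power_grid = np.arange(1, 10, dtype=float)
age_labels = group_adjacent(partial_dependence(black_box, X, 0, age_grid), tol=1e-3)
power_labels = group_adjacent(partial_dependence(black_box, X, 1, power_grid), tol=1e-3)

age_seg = to_segments(X[:, 0], age_grid, age_labels)
power_seg = to_segments(X[:, 1], power_grid, power_labels)

# Surrogate on the categorical segments. Each feature ends up with two
# groups in this toy example, so one dummy column per feature suffices.
# A log-linear least-squares fit stands in for the GLM.
design = np.column_stack([np.ones(n), age_seg == 1, power_seg == 1]).astype(float)
coef, *_ = np.linalg.lstsq(design, np.log(black_box(X)), rcond=None)
surrogate = np.exp(design @ coef)
print(np.round(coef, 3))
```

In this toy setting the surrogate is fit to the black box predictions, which is a choice of the sketch rather than something stated in the abstract. With more than two groups per feature, the dummy columns would need to cover every group label except a reference level.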
Related papers
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment [59.536850459059856]
We introduce MM-RLHF, a dataset containing $\mathbf{120k}$ fine-grained, human-annotated preference comparison pairs.
We propose several key innovations to improve the quality of reward models and the efficiency of alignment algorithms.
Our approach is rigorously evaluated across $\mathbf{10}$ distinct dimensions and $\mathbf{27}$ benchmarks.
arXiv Detail & Related papers (2025-02-14T18:59:51Z)
- ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model [75.750699619993]
We propose ROSE, a Revolutionary Open-set dense SEgmentation LMM, which enables dense mask prediction and open-category generation.
Our method treats each image patch as an independent region of interest candidate, enabling the model to predict both dense and sparse masks simultaneously.
arXiv Detail & Related papers (2024-11-29T07:00:18Z)
- Identifying Light-curve Signals with a Deep Learning Based Object
Detection Algorithm. II. A General Light Curve Classification Framework [0.0]
We present a novel deep learning framework for classifying light curves using a weakly supervised object detection model.
Our framework identifies the optimal windows for both light curves and power spectra automatically, and zooms in on their corresponding data.
We train our model on datasets obtained from both space-based and ground-based multi-band observations of variable stars and transients.
arXiv Detail & Related papers (2023-11-14T11:08:34Z)
- CELDA: Leveraging Black-box Language Model as Enhanced Classifier
without Labels [14.285609493077965]
We propose Clustering-enhanced Linear Discriminative Analysis (CELDA), a novel approach that improves text classification accuracy with a very weak supervision signal.
Our framework draws a precise decision boundary without accessing the weights or gradients of the LM, or the data labels.
arXiv Detail & Related papers (2023-06-05T08:35:31Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models [74.0430727476634]
We propose a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).
With a variety of segmentation architectures and backbones, GMMSeg outperforms the discriminative counterparts on closed-set datasets.
GMMSeg even performs well on open-world datasets.
arXiv Detail & Related papers (2022-10-05T05:20:49Z)
- Interpreting Black-box Machine Learning Models for High Dimensional
Datasets [40.09157165704895]
We train a black-box model on a high-dimensional dataset to learn the embeddings on which the classification is performed.
We then approximate the behavior of the black-box model by means of an interpretable surrogate model on the top-k feature space.
Our approach outperforms state-of-the-art methods like TabNet and XGBoost when tested on different datasets.
arXiv Detail & Related papers (2022-08-29T07:36:17Z)
- Self-service Data Classification Using Interactive Visualization and
Interpretable Machine Learning [9.13755431537592]
Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm.
IVLC is especially helpful when dealing with sensitive and crucial data like cancer data in the medical domain.
This chapter proposes an automated classification approach combined with a new Coordinate Order (COO) algorithm and a genetic algorithm.
arXiv Detail & Related papers (2021-07-11T05:39:14Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- Interprétabilité des modèles : état des lieux des méthodes et
application à l'assurance [1.6058099298620423]
Data is the raw material of many models that today make it possible to increase the quality and performance of digital services.
Model users must ensure that models do not discriminate and that their results can be explained.
The widening of the panel of predictive algorithms leads scientists to be vigilant about the use of models.
arXiv Detail & Related papers (2020-07-25T12:18:07Z)
- Semi-Supervised Learning with Normalizing Flows [54.376602201489995]
FlowGMM is an end-to-end approach to generative semi-supervised learning with normalizing flows.
We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data.
arXiv Detail & Related papers (2019-12-30T17:36:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.