Obtaining Explainable Classification Models using Distributionally
Robust Optimization
- URL: http://arxiv.org/abs/2311.01994v1
- Date: Fri, 3 Nov 2023 15:45:34 GMT
- Title: Obtaining Explainable Classification Models using Distributionally
Robust Optimization
- Authors: Sanjeeb Dash, Soumyadip Ghosh, Joao Goncalves, Mark S. Squillante
- Abstract summary: We study generalized linear models constructed using sets of feature value rules.
An inherent trade-off exists between rule set sparsity and its prediction accuracy.
We propose a new formulation to learn an ensemble of rule sets that simultaneously addresses these competing factors.
- Score: 12.511155426574563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model explainability is crucial for human users to be able to interpret how a
proposed classifier assigns labels to data based on its feature values. We
study generalized linear models constructed using sets of feature value rules,
which can capture nonlinear dependencies and interactions. An inherent
trade-off exists between rule set sparsity and its prediction accuracy. It is
computationally expensive to find the right choice of sparsity -- e.g., via
cross-validation -- with existing methods. We propose a new formulation to
learn an ensemble of rule sets that simultaneously addresses these competing
factors. Good generalization is ensured while keeping computational costs low
by utilizing distributionally robust optimization. The formulation utilizes
column generation to efficiently search the space of rule sets and constructs a
sparse ensemble of rule sets, in contrast with techniques like random forests
or boosting and their variants. We present theoretical results that motivate
and justify the use of our distributionally robust formulation. Extensive
numerical experiments establish that our method improves over competing methods
-- on a large set of publicly available binary classification problem instances
-- with respect to one or more of the following metrics: generalization
quality, computational cost, and explainability.
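The abstract names two concrete ingredients: a distributionally robust reweighting of the training examples and a column-generation search over candidate rules. The following is a minimal, self-contained sketch of how those pieces can interact, under simplifying assumptions; the chi-square-style ambiguity set, the greedy pricing over a fixed rule pool, and every function name are illustrative stand-ins rather than the authors' formulation, which prices rules over an exponentially large space.

```python
# Sketch: alternately (i) reweight training examples adversarially within a
# small ambiguity set around the empirical distribution (the DRO step) and
# (ii) add the rule that most reduces weighted error (a greedy stand-in for
# a column-generation pricing step).
import numpy as np

def dro_weights(losses, rho=0.1):
    """Worst-case weights over a chi-square-style ambiguity set:
    shift uniform weights toward high-loss examples, then renormalize."""
    n = len(losses)
    centered = losses - losses.mean()
    norm = np.linalg.norm(centered)
    if norm == 0:
        return np.full(n, 1.0 / n)
    w = 1.0 / n + np.sqrt(rho / n) * centered / norm
    w = np.clip(w, 0.0, None)
    return w / w.sum()

def fit_rule_ensemble(X, y, candidate_rules, n_rules=5):
    """Greedily build a sparse ensemble of rules under DRO reweighting.
    candidate_rules: list of functions mapping X -> {0,1} activations."""
    n = len(y)
    weights = np.full(n, 1.0 / n)
    chosen, margins = [], np.zeros(n)
    for _ in range(n_rules):
        # "Pricing": pick the rule with the best weighted agreement with y.
        scores = [weights @ np.where(r(X) == y, 1.0, -1.0) for r in candidate_rules]
        best = int(np.argmax(scores))
        if scores[best] <= 0:
            break  # no candidate rule improves the weighted objective
        chosen.append(candidate_rules[best])
        margins += np.where(candidate_rules[best](X) == y, 1.0, -1.0)
        # DRO step: reweight toward currently hard examples.
        weights = dro_weights(-margins)
    return chosen

# Toy usage: threshold rules on single features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
rules = [lambda X, j=j, t=t: (X[:, j] > t).astype(int)
         for j in range(3) for t in (-0.5, 0.0, 0.5)]
print(f"selected {len(fit_rule_ensemble(X, y, rules))} rules")
```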
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
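As a hedged illustration of the kind of estimator this line of work studies, the sketch below implements inverse propensity scoring (IPS) with a constant baseline as a control variate: the estimator is unbiased for any baseline b (because the importance weights have unit mean under the logging policy), and the variance-minimizing constant b has a closed form. This is a textbook instance, not necessarily the paper's variance-optimal estimator.

```python
import numpy as np

def baseline_corrected_ips(rewards, target_probs, logging_probs):
    w = target_probs / logging_probs  # importance weights, E[w] = 1
    # Closed-form variance-optimal constant baseline:
    # b* = E[(w^2 - w) r] / E[(w - 1)^2]
    denom = np.mean((w - 1.0) ** 2)
    b = np.mean((w ** 2 - w) * rewards) / denom if denom > 0 else 0.0
    return np.mean(w * (rewards - b)) + b  # unbiased for any constant b

# Toy usage with synthetic logged bandit feedback.
rng = np.random.default_rng(1)
logging_probs = rng.uniform(0.2, 0.8, size=1000)
target_probs = rng.uniform(0.2, 0.8, size=1000)
rewards = rng.binomial(1, 0.5, size=1000).astype(float)
print(baseline_corrected_ips(rewards, target_probs, logging_probs))
```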
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
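A minimal sketch of a likelihood-ratio confidence set for a Bernoulli mean, in the spirit of the summary above: a candidate parameter stays in the set while the likelihood ratio of the running MLE against it is below 1/alpha. The grid discretization and this fixed-sample construction are simplifying assumptions; the paper builds anytime-valid confidence sequences, which require more care.

```python
import numpy as np

def lr_confidence_set(xs, alpha=0.05, grid=np.linspace(1e-3, 1 - 1e-3, 999)):
    n, s = len(xs), int(sum(xs))
    def loglik(p):
        return s * np.log(p) + (n - s) * np.log(1 - p)
    p_hat = min(max(s / n, 1e-3), 1 - 1e-3)  # MLE, clamped away from {0, 1}
    # Keep p while the likelihood ratio L(p_hat)/L(p) stays below 1/alpha.
    keep = [p for p in grid if loglik(p_hat) - loglik(p) < np.log(1 / alpha)]
    return min(keep), max(keep)

xs = np.random.default_rng(2).binomial(1, 0.3, size=200)
print(lr_confidence_set(xs))  # interval around 0.3 that shrinks with n
```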
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
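A toy sketch of column-wise iterative imputation, the pattern this paper generalizes: initialize missing cells, then repeatedly re-fit a per-column model on the currently observed values of that column and refresh its missing entries. HyperImpute additionally selects each column's model automatically; here a plain least-squares regression stands in for every column.

```python
import numpy as np

def iterative_impute(X, n_iters=10):
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])   # mean initialization
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            A = np.c_[np.delete(X, j, axis=1), np.ones(len(X))]  # + intercept
            rows = ~miss[:, j]                         # rows observed for col j
            coef, *_ = np.linalg.lstsq(A[rows], X[rows, j], rcond=None)
            X[miss[:, j], j] = A[miss[:, j]] @ coef    # refresh imputations
    return X

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] - X[:, 1]                            # a learnable relation
X[rng.random(X.shape) < 0.1] = np.nan
print(np.isnan(iterative_impute(X)).sum())             # -> 0
```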
- Probability-driven scoring functions in combining linear classifiers [0.913755431537592]
This research aims to build a new fusion method dedicated to ensembles of linear classifiers.
The proposed fusion method is compared with the reference method using multiple benchmark datasets taken from the KEEL repository.
The experimental study shows that, under certain conditions, some improvement may be obtained.
arXiv Detail & Related papers (2021-09-16T08:58:32Z)
- A Normative Model of Classifier Fusion [4.111899441919164]
We present a hierarchical Bayesian model of probabilistic classification fusion based on a new correlated Dirichlet distribution.
The proposed model naturally accommodates the classic Independent Opinion Pool and other independent fusion algorithms as special cases.
It is evaluated by uncertainty reduction and correctness of fusion on synthetic and real-world data sets.
arXiv Detail & Related papers (2021-06-03T11:52:13Z)
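A minimal sketch of the classic Independent Opinion Pool, which the summary says the paper's correlated-Dirichlet model recovers as a special case: assuming conditionally independent classifiers, the fused posterior is proportional to the product of the individual posteriors (a uniform class prior is assumed here for brevity).

```python
import numpy as np

def independent_opinion_pool(posteriors):
    """posteriors: array of shape (n_classifiers, n_classes)."""
    fused = np.prod(posteriors, axis=0)   # product of per-classifier posteriors
    return fused / fused.sum()            # renormalize over classes

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.6, 0.3, 0.1])
print(independent_opinion_pool(np.stack([p1, p2])))  # sharper than either input
```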
- Rule Generation for Classification: Scalability, Interpretability, and Fairness [0.0]
We propose a new rule-based optimization method for classification with constraints.
We address interpretability and fairness by assigning cost coefficients to the rules and introducing additional constraints.
The proposed method exhibits a good compromise between local interpretability and fairness on the one side, and accuracy on the other side.
arXiv Detail & Related papers (2021-04-21T20:31:28Z)
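The two levers named in this summary, per-rule cost coefficients and additional constraints, fit naturally into a linear program. The sketch below is a generic rule-selection LP in that spirit, not the paper's formulation: `select_rules`, the false-positive term in the objective, and the coverage-gap fairness constraint are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def select_rules(R, y, groups, rule_costs, gap=0.1):
    """R: (n, m) binary rule activations; y in {0,1}; groups in {0,1}."""
    n, m = R.shape
    pos, neg = y == 1, y == 0
    n_pos = int(pos.sum())
    # Variables: m rule weights, then n_pos slacks for uncovered positives.
    c = np.concatenate([rule_costs + R[neg].sum(axis=0) / neg.sum(),  # cost + FP rate
                        np.ones(n_pos)])                              # slack penalty
    # Each positive must be covered with total weight >= 1 (minus slack).
    A_cover = np.hstack([-R[pos], -np.eye(n_pos)])
    b_cover = -np.ones(n_pos)
    # Fairness: |avg coverage(group 0) - avg coverage(group 1)| <= gap.
    diff = R[groups == 0].mean(axis=0) - R[groups == 1].mean(axis=0)
    A_fair = np.vstack([np.r_[diff, np.zeros(n_pos)],
                        np.r_[-diff, np.zeros(n_pos)]])
    res = linprog(c, A_ub=np.vstack([A_cover, A_fair]),
                  b_ub=np.r_[b_cover, gap, gap], bounds=(0, None))
    return res.x[:m]  # rule weights; near-zero entries can be dropped

rng = np.random.default_rng(4)
R = (rng.random((60, 8)) < 0.4).astype(float)
y = rng.integers(0, 2, 60)
groups = rng.integers(0, 2, 60)
print(np.round(select_rules(R, y, groups, rule_costs=np.full(8, 0.05)), 3))
```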
- Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks.
In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identified (squared $\ell_2$ regularized augmentation) outperforms several recent methods, which are each specially designed for one task.
arXiv Detail & Related papers (2020-11-25T22:40:09Z)
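The regularizer described here is easy to state in code: add the squared $\ell_2$ distance between the embeddings of each example and its augmented copy to the task loss. The toy encoder, the noise augmentation, and the weight `lam` below are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self, d_in=10, d_emb=16, n_classes=2):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(d_in, d_emb), nn.ReLU())
        self.head = nn.Linear(d_emb, n_classes)

def consistency_loss(model, x, x_aug, y, lam=1.0):
    z, z_aug = model.embed(x), model.embed(x_aug)
    task = F.cross_entropy(model.head(z), y)
    reg = (z - z_aug).pow(2).sum(dim=1).mean()  # squared l2 per embedding pair
    return task + lam * reg

model = TinyNet()
x = torch.randn(32, 10)
x_aug = x + 0.1 * torch.randn_like(x)  # stand-in augmentation: small noise
y = torch.randint(0, 2, (32,))
consistency_loss(model, x, x_aug, y).backward()
```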
- Density Fixing: Simple yet Effective Regularization Method based on the Class Prior [2.3859169601259347]
We propose a framework of regularization methods, called density-fixing, that can be used for both supervised and semi-supervised learning.
Our proposed regularization method improves generalization performance by forcing the model to approximate the class-prior distribution, i.e., the frequency of occurrence of each class.
arXiv Detail & Related papers (2020-07-08T04:58:22Z)
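A sketch of the density-fixing idea as summarized above: add a penalty that pushes the model's average predicted class distribution toward the known class prior. The KL form and batch-level averaging are assumptions for illustration, not necessarily the paper's exact penalty.

```python
import torch
import torch.nn.functional as F

def density_fixing_loss(logits, y, prior, lam=0.1):
    task = F.cross_entropy(logits, y)
    avg_pred = F.softmax(logits, dim=1).mean(dim=0)       # batch class marginal
    reg = torch.sum(avg_pred * (avg_pred / prior).log())  # KL(avg_pred || prior)
    return task + lam * reg

logits = torch.randn(64, 3, requires_grad=True)
y = torch.randint(0, 3, (64,))
prior = torch.tensor([0.5, 0.3, 0.2])
density_fixing_loss(logits, y, prior).backward()
```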
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
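As a generic illustration of variance control with a set of Monte Carlo rollouts (the paper's estimator correlates rollouts more carefully than this), here is a REINFORCE-style sketch in which K rollouts share one context and each uses the leave-one-out mean reward of the others as its baseline; the function and parameter names are illustrative.

```python
import torch

def reinforce_loo(logits, reward_fn, K=4):
    """logits: (T, V) per-step token logits for one context (toy, no recurrence)."""
    dist = torch.distributions.Categorical(logits=logits)
    tokens = dist.sample((K,))                     # K rollouts, shape (K, T)
    log_probs = dist.log_prob(tokens).sum(dim=1)   # (K,) sequence log-probs
    rewards = torch.tensor([reward_fn(t) for t in tokens], dtype=torch.float)
    baseline = (rewards.sum() - rewards) / (K - 1) # leave-one-out mean reward
    return -((rewards - baseline) * log_probs).mean()

logits = torch.zeros(5, 10, requires_grad=True)    # 5 steps, vocab of 10
reward = lambda toks: float((toks == 3).sum())     # toy reward: count token 3
reinforce_loo(logits, reward, K=8).backward()
```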
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.