Learning Interpretable Models Using Uncertainty Oracles
- URL: http://arxiv.org/abs/1906.06852v6
- Date: Thu, 22 Aug 2024 22:21:38 GMT
- Title: Learning Interpretable Models Using Uncertainty Oracles
- Authors: Abhishek Ghose, Balaraman Ravindran
- Abstract summary: A desirable property of interpretable models is small size, so that they are easily understandable by humans.
This leads to the following challenges: (a) small sizes imply diminished accuracy, and (b) bespoke levers provided by model families to restrict size might be insufficient to reach the desired size-accuracy trade-off.
- Score: 12.879371384378164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A desirable property of interpretable models is small size, so that they are easily understandable by humans. This leads to the following challenges: (a) small sizes typically imply diminished accuracy, and (b) bespoke levers provided by model families to restrict size, e.g., L1 regularization, might be insufficient to reach the desired size-accuracy trade-off. We address these challenges here. Earlier work has shown that learning the training distribution creates accurate small models. Our contribution is a new technique that exploits this idea. The training distribution is encoded as a Dirichlet Process to allow for a flexible number of modes that is learnable from the data. Its parameters are learned using Bayesian Optimization, a design choice that makes the technique applicable to non-differentiable loss functions. To avoid the challenges of high dimensionality, the data is first projected down to one dimension using the uncertainty scores of a separate probabilistic model, which we refer to as the uncertainty oracle. We show that this technique addresses the above challenges: (a) it arrests the reduction in accuracy that comes from shrinking a model (in some cases we observe $\sim 100\%$ improvement over baselines), and (b) it may be applied with no change across model families with different notions of size; results are shown for Decision Trees, Linear Probability models and Gradient Boosted Models. Additionally, we show that it (1) is more accurate than its predecessor, (2) requires only one hyperparameter to be set in practice, (3) accommodates a multi-variate notion of model size, e.g., both maximum depth of a tree and number of trees in Gradient Boosted Models, and (4) works across different feature spaces between the uncertainty oracle and the interpretable model, e.g., a GRU might act as an oracle for a decision tree that ingests n-grams.
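The abstract outlines a concrete loop: project the training data to one dimension via an uncertainty oracle, place a flexible density over that axis, and tune the density's parameters so that a size-constrained model trained on samples drawn from it maximizes held-out accuracy. The sketch below illustrates that loop under loud simplifications, not the authors' implementation: a gradient-boosted classifier with predictive entropy stands in for the oracle, a fixed two-component Beta mixture stands in for the Dirichlet Process, random search stands in for Bayesian Optimization, and the dataset, budgets, and the `small_model_accuracy` helper are all illustrative.

```python
# Minimal sketch of the uncertainty-oracle sampling idea (simplified; see lead-in).
import numpy as np
from scipy.stats import beta
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# Uncertainty oracle: a separate probabilistic model whose predictive entropy
# projects every training point down to a single 1-D uncertainty score.
oracle = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
p = oracle.predict_proba(X_tr)
u = -np.sum(p * np.log(p + 1e-12), axis=1)       # predictive entropy per point
u = (u - u.min()) / (u.max() - u.min() + 1e-12)  # normalise scores to [0, 1]

def small_model_accuracy(params, n_samples=1000, max_depth=3):
    """Sample a training set from a 2-component Beta mixture over the 1-D
    uncertainty score, fit a depth-limited tree, return validation accuracy."""
    w, a1, b1, a2, b2 = params
    density = w * beta.pdf(u, a1, b1) + (1.0 - w) * beta.pdf(u, a2, b2)
    probs = density / density.sum()
    idx = rng.choice(len(u), size=n_samples, replace=True, p=probs)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_tr[idx], y_tr[idx])
    return tree.score(X_val, y_val)

# Search over the sampling distribution's parameters. The paper uses Bayesian
# Optimization here; plain random search keeps this sketch dependency-free.
candidates = [[rng.uniform(0, 1), *rng.uniform(1.0, 5.0, size=4)] for _ in range(50)]
best = max(candidates, key=small_model_accuracy)
print("best mixture params:", np.round(best, 2))
print("depth-3 tree validation accuracy:", round(small_model_accuracy(best), 3))
```

Swapping the depth-limited tree for a gradient-boosted model with a (maximum depth, number of trees) budget would exercise the multi-variate notion of size mentioned in the abstract.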
Related papers
- Mean estimation in the add-remove model of differential privacy [20.78625240235862]
We study the problem of one-dimensional mean estimation under the add-remove model.
We show that our proposed algorithm yields at least a factor of two improvement in mean squared error over algorithms frequently used in practice.
arXiv Detail & Related papers (2023-12-11T18:59:35Z) - Online Speculative Decoding [34.987825705622555]
We introduce online speculative decoding to accelerate the inference of large language models.
The main idea is to continuously update the (multiple) draft model(s) on observed user query data.
We develop a prototype of online speculative decoding based on knowledge distillation and evaluate it using both synthetic and real query data.
arXiv Detail & Related papers (2023-10-11T04:03:42Z) - Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z) - Specializing Smaller Language Models towards Multi-Step Reasoning [56.78474185485288]
We show that multi-step reasoning abilities can be distilled down from GPT-3.5 ($\ge$ 175B) to T5 variants ($\le$ 11B).
We propose model specialization, to specialize the model's ability towards a target task.
arXiv Detail & Related papers (2023-01-30T08:51:19Z) - Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z) - An Approximation Method for Fitted Random Forests [0.0]
We study methods that approximate each fitted tree in the Random Forests model using the multinomial allocation of the data points to the leaves.
Specifically, we begin by studying whether fitting a multinomial logistic regression helps reduce the size while preserving the prediction quality.
arXiv Detail & Related papers (2022-07-05T17:28:52Z) - Optimizing Binary Decision Diagrams with MaxSAT for classification [3.2894524838755608]
A growing interest in explainable artificial intelligence motivates the need for interpretable machine learning (ML) models.
Recently, several exact methods for computing such models have been proposed to overcome weaknesses of traditional methods.
In this paper, we first propose SAT-based models for learning optimal Binary Decision Diagrams (BDDs).
Then, we lift the encoding to a MaxSAT model to learn optimal BDDs in limited depths.
Finally, we tackle the fragmentation problem by introducing a method to merge compatible subtrees for the BDDs found via the MaxSAT model.
arXiv Detail & Related papers (2022-03-21T23:17:37Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - Sufficiently Accurate Model Learning for Planning [119.80502738709937]
This paper introduces the constrained Sufficiently Accurate model learning approach.
It provides examples of such problems, and presents a theorem on how close some approximate solutions can be.
The approximate solution quality will depend on the function parameterization, loss and constraint function smoothness, and the number of samples in model learning.
arXiv Detail & Related papers (2021-02-11T16:27:31Z) - When Ensembling Smaller Models is More Efficient than Single Large Models [52.38997176317532]
We show that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute.
This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z) - Interventions and Counterfactuals in Tractable Probabilistic Models: Limitations of Contemporary Transformations [12.47276164048813]
We show that when transforming SPNs to a causal graph, interventional reasoning reduces to computing marginal distributions.
We first provide an algorithm for constructing a causal graph from a PSDD, which introduces augmented variables.
arXiv Detail & Related papers (2020-01-29T15:45:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.