Wisdom of the Ensemble: Improving Consistency of Deep Learning Models
- URL: http://arxiv.org/abs/2011.06796v1
- Date: Fri, 13 Nov 2020 07:47:01 GMT
- Title: Wisdom of the Ensemble: Improving Consistency of Deep Learning Models
- Authors: Lijing Wang, Dipanjan Ghosh, Maria Teresa Gonzalez Diaz, Ahmed
Farahat, Mahbubul Alam, Chetan Gupta, Jiangzhuo Chen, Madhav Marathe
- Abstract summary: Trust is often a function of constant behavior.
This paper studies model behavior in the context of periodic retraining of deployed models.
We prove that the consistency and correct-consistency of an ensemble learner are no less than the average consistency and correct-consistency of its individual learners.
- Score: 11.230300336108018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning classifiers are assisting humans in making decisions, and hence
the user's trust in these models is of paramount importance. Trust is often a
function of constant behavior. From an AI model perspective, this means that, given the
same input, the user would expect the same output, especially for correct
outputs, or in other words, consistently correct outputs. This paper studies
model behavior in the context of periodic retraining of deployed models, where
the outputs from successive generations of the models might not agree on the
correct labels assigned to the same input. We formally define the consistency and
correct-consistency of a learning model. We prove that the consistency and
correct-consistency of an ensemble learner are no less than the average
consistency and correct-consistency of its individual learners, and that
correct-consistency can be improved with a certain probability by combining learners
whose accuracy is no less than the average accuracy of the ensemble's component
learners. To validate the theory on three datasets with two state-of-the-art
deep learning classifiers, we also propose an efficient dynamic snapshot
ensemble method and demonstrate its value.
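To make the quantities concrete, the short Python sketch below is a minimal illustration, not the authors' code: the function names, the majority-vote combination rule, and the toy data are all assumptions. It measures empirical consistency and correct-consistency between two retraining generations and combines per-generation learner snapshots by majority vote, the setting in which the paper's bound says the ensemble values should be no worse than the average over individual learners.

import numpy as np

def consistency(pred_a, pred_b):
    # Fraction of inputs on which two model generations predict the same label.
    return float(np.mean(np.asarray(pred_a) == np.asarray(pred_b)))

def correct_consistency(pred_a, pred_b, y_true):
    # Fraction of inputs on which both generations predict the correct label.
    a, b, y = map(np.asarray, (pred_a, pred_b, y_true))
    return float(np.mean((a == y) & (b == y)))

def majority_vote(predictions):
    # predictions: (n_learners, n_samples) array of integer class labels.
    predictions = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in predictions.T])

# Toy example (hypothetical data): three learner snapshots per generation, six inputs.
y_true = np.array([0, 1, 1, 0, 1, 0])
gen1 = np.array([[0, 1, 1, 0, 0, 0],
                 [0, 1, 0, 0, 1, 0],
                 [0, 1, 1, 0, 1, 1]])
gen2 = np.array([[0, 1, 1, 1, 1, 0],
                 [0, 0, 1, 0, 1, 0],
                 [0, 1, 1, 0, 1, 0]])

ens1, ens2 = majority_vote(gen1), majority_vote(gen2)
avg_con = np.mean([consistency(a, b) for a, b in zip(gen1, gen2)])
print("average individual consistency:", avg_con)
print("ensemble consistency:          ", consistency(ens1, ens2))
print("ensemble correct-consistency:  ", correct_consistency(ens1, ens2, y_true))

In this toy run the ensemble's consistency (1.0) exceeds the average individual consistency (about 0.72), which is the direction of the inequality the paper proves; the example is illustrative only and does not reproduce the paper's dynamic snapshot ensemble method.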
Related papers
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization.
arXiv Detail & Related papers (2024-10-06T15:25:39Z) - Entity-level Factual Adaptiveness of Fine-tuning based Abstractive
Summarization Models [31.84120883461332]
We analyze the robustness of fine-tuning based summarization models to knowledge conflicts.
We introduce a controllable counterfactual data augmentation method.
arXiv Detail & Related papers (2024-02-23T07:53:39Z) - Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between predicted confidence and actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z) - Self-Consistency of Large Language Models under Ambiguity [4.141513298907867]
This work presents an evaluation benchmark for self-consistency in cases of under-specification.
We conduct a series of behavioral experiments on the OpenAI model suite using an ambiguous integer sequence completion task.
We find that average consistency ranges from 67% to 82%, far higher than would be predicted if a model's consistency were random.
arXiv Detail & Related papers (2023-10-20T11:57:56Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
We further elaborate the robustness metric: a model is judged to be robust if its performance is consistently accurate across entire cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z) - Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address the challenges of learning from such comparisons via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z) - Characterizing and overcoming the greedy nature of learning in
multi-modal deep neural networks [62.48782506095565]
We show that due to the greedy nature of learning in deep neural networks, models tend to rely on just one modality while under-fitting the other modalities.
We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning.
arXiv Detail & Related papers (2022-02-10T20:11:21Z) - Accurate, yet inconsistent? Consistency Analysis on Language
Understanding Models [38.03490197822934]
Consistency refers to the capability of generating the same predictions for semantically similar contexts.
We propose a framework named consistency analysis on language understanding models (CALUM) to evaluate the model's lower-bound consistency ability.
arXiv Detail & Related papers (2021-08-15T06:25:07Z) - Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)