Bridging the Gap between Decision and Logits in Decision-based Knowledge
Distillation for Pre-trained Language Models
- URL: http://arxiv.org/abs/2306.08909v1
- Date: Thu, 15 Jun 2023 07:23:44 GMT
- Title: Bridging the Gap between Decision and Logits in Decision-based Knowledge
Distillation for Pre-trained Language Models
- Authors: Qinhong Zhou, Zonghan Yang, Peng Li, Yang Liu
- Abstract summary: We propose a novel method to estimate logits from the decision distributions.
Our method significantly outperforms strong baselines on both natural language understanding and machine reading comprehension datasets.
- Score: 16.115386424278213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional knowledge distillation (KD) methods require access to the
internal information of teachers, e.g., logits. However, such information may
not always be accessible for large pre-trained language models (PLMs). In this
work, we focus on decision-based KD for PLMs, where only teacher decisions
(i.e., top-1 labels) are accessible. Considering the information gap between
logits and decisions, we propose a novel method to estimate logits from the
decision distributions. Specifically, decision distributions can be both
derived as a function of logits theoretically and estimated with test-time data
augmentation empirically. By combining the theoretical and empirical
estimations of the decision distributions together, the estimation of logits
can be successfully reduced to a simple root-finding problem. Extensive
experiments show that our method significantly outperforms strong baselines on
both natural language understanding and machine reading comprehension datasets.
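For readers who want a concrete picture of the reduction described in the abstract, the sketch below mirrors its two ingredients: an empirical decision distribution obtained by querying the teacher's top-1 label on augmented inputs, and a root-finding step that inverts an assumed theoretical model of that distribution. The Gaussian noise model, the teacher_top1 and augment callables, and all parameter names are illustrative assumptions, not the paper's actual derivation.

```python
# Illustrative sketch only: the noise model and helper callables below are
# assumptions, not the formulation used in the paper.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def empirical_decision_distribution(teacher_top1, augment, x, num_classes, n_aug=128):
    """Empirical estimation: frequency of each top-1 decision over augmented copies of x.

    teacher_top1 and augment are hypothetical callables standing in for a
    black-box teacher API and a test-time data augmentation routine.
    """
    counts = np.zeros(num_classes)
    for _ in range(n_aug):
        counts[teacher_top1(augment(x))] += 1
    return counts / n_aug


def estimate_logit_gap(p_class1, noise_scale=1.0, eps=1e-3):
    """Theoretical estimation plus root-finding, binary case.

    Assume (for illustration) that augmentation perturbs the logit gap
    d = z_1 - z_2 with Gaussian noise, so P(decision = class 1) = Phi(d / noise_scale).
    The gap is then the root of f(d) = Phi(d / noise_scale) - p_class1.
    """
    p = float(np.clip(p_class1, eps, 1.0 - eps))   # keep the root finite
    f = lambda d: norm.cdf(d / noise_scale) - p
    return brentq(f, -50.0, 50.0)                   # sign change guaranteed after clipping


if __name__ == "__main__":
    # Example: a decision distribution of [0.8, 0.2] implies a positive logit gap.
    gap = estimate_logit_gap(0.8)
    soft_targets = np.exp([gap, 0.0]) / np.sum(np.exp([gap, 0.0]))
    print(gap, soft_targets)  # recovered soft targets could feed a standard KD loss
```

In the multi-class setting the same matching idea yields a system of equations over the full logit vector; the binary case above is only the simplest instance of the root-finding reduction mentioned in the abstract.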
Related papers
- Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning [73.77288647011295]
This paper introduces BI-Directional DEliberation Reasoning (BIDDER) to enhance the decision rationality of language models.
Our approach involves three key processes:
(1) inferring hidden states to represent uncertain information in the decision-making process from historical data;
(2) using hidden states to predict future potential states and potential outcomes;
(3) integrating historical information (past contexts) and long-term outcomes (future contexts) to inform reasoning.
arXiv Detail & Related papers (2024-07-08T16:48:48Z) - Neural Probabilistic Logic Learning for Knowledge Graph Reasoning [10.473897846826956]
This paper aims to design a reasoning framework that achieves accurate reasoning on knowledge graphs.
We introduce a scoring module that effectively enhances the expressive power of embedding networks.
We improve the interpretability of the model by incorporating a Markov Logic Network based on variational inference.
arXiv Detail & Related papers (2024-07-04T07:45:46Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between the current estimate of the transition model and the unknown optimal one.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
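As background for the KSD mentioned above, here is a minimal, self-contained estimator of the squared kernelized Stein discrepancy with an RBF kernel; it is a generic textbook computation and makes no claim about how STEERING actually builds its exploration bonus. The score_fn, bandwidth, and the toy usage at the bottom are assumptions for illustration.

```python
# Generic V-statistic estimator of the squared kernelized Stein discrepancy (KSD)
# with an RBF kernel. Textbook background only, not the STEERING algorithm itself.
import numpy as np


def ksd_rbf(samples, score_fn, bandwidth=1.0):
    """Squared KSD between the empirical distribution of `samples` (n x d array) and
    a target distribution known only through its score function grad log p.
    `score_fn` maps a d-vector to a d-vector; `bandwidth` is an assumed kernel scale."""
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    s = np.stack([np.asarray(score_fn(xi), dtype=float) for xi in x])  # (n, d) scores
    diff = x[:, None, :] - x[None, :, :]            # pairwise differences x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)                 # squared pairwise distances
    h2 = bandwidth ** 2
    k = np.exp(-sq / (2.0 * h2))                    # RBF kernel matrix

    term1 = (s @ s.T) * k                                   # s(x_i) . s(x_j) k(x_i, x_j)
    term2 = np.einsum("id,ijd->ij", s, diff) / h2 * k       # s(x_i) . grad_y k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h2 * k      # s(x_j) . grad_x k
    term4 = (d / h2 - sq / h2 ** 2) * k                     # trace(grad_x grad_y k)
    return float(np.mean(term1 + term2 + term3 + term4))


# Toy check: samples from N(2, 1) measured against the score of N(0, 1), s(x) = -x,
# yield a clearly larger KSD than samples drawn from N(0, 1) itself.
rng = np.random.default_rng(0)
print(ksd_rbf(rng.normal(2.0, 1.0, size=(200, 1)), lambda v: -v))
print(ksd_rbf(rng.normal(0.0, 1.0, size=(200, 1)), lambda v: -v))
```

A larger KSD between samples from a learned transition model and the score of a reference model signals larger disagreement, which is the kind of quantity an IPM-based exploration incentive could use.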
arXiv Detail & Related papers (2023-01-28T00:49:28Z) - Explainable Data-Driven Optimization: From Context to Decision and Back
Again [76.84947521482631]
Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters.
We introduce a counterfactual explanation methodology tailored to explain solutions to data-driven problems.
We demonstrate our approach by explaining key problems in operations management such as inventory management and routing.
arXiv Detail & Related papers (2023-01-24T15:25:16Z) - Knowledge-driven Active Learning [70.37119719069499]
Active learning strategies aim at minimizing the amount of labelled data required to train a Deep Learning model.
Most such strategies are based on uncertain sample selection and are often restricted to samples lying close to the decision boundary.
Here we propose to take common domain knowledge into consideration and enable non-expert users to train a model with fewer samples.
arXiv Detail & Related papers (2021-10-15T06:11:53Z) - Learning from Matured Dumb Teacher for Fine Generalization [0.6079137591620588]
We show that random, untrained, and equally structured teacher networks can vastly improve generalization performance.
We propose matured dumb teacher-based KD, which conservatively transfers the hypothesis for the student's generalization without massive destruction of the trained information.
arXiv Detail & Related papers (2021-08-12T14:37:36Z) - Robust Generalization despite Distribution Shift via Minimum
Discriminating Information [46.164498176119665]
We introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution.
We employ the principle of minimum discriminating information to embed the available prior knowledge.
We obtain explicit generalization bounds with respect to the unknown shifted distribution.
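For context, the minimum discriminating information principle in its classical form picks, among all distributions satisfying the known structural constraints, the one closest in KL divergence to a nominal model; the generic moment constraints written below are placeholders, not necessarily the constraint set used in that paper.

```latex
% Classical minimum discriminating information (MDI) principle; the moment
% constraints f_i are generic placeholders, not the paper's exact constraint set.
\begin{aligned}
Q^\star \;=\; \underset{Q}{\arg\min}\; & D_{\mathrm{KL}}(Q \,\|\, P_0) \\
\text{s.t.}\; & \mathbb{E}_{Q}[f_i(X)] = c_i, \qquad i = 1,\dots,m,
\end{aligned}
\qquad\Longrightarrow\qquad
Q^\star(\mathrm{d}x) \;\propto\; \exp\!\Big(\textstyle\sum_{i=1}^{m} \lambda_i f_i(x)\Big)\, P_0(\mathrm{d}x).
```

The solution is an exponential tilting of the nominal model P_0, which is what makes explicit bounds with respect to the shifted distribution tractable in this style of analysis.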
arXiv Detail & Related papers (2021-06-08T15:25:35Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z) - How fair can we go in machine learning? Assessing the boundaries of
fairness in decision trees [0.12891210250935145]
We present the first methodology that allows exploring the statistical limits of bias mitigation interventions.
We focus our study on decision tree classifiers since they are widely accepted in machine learning.
We conclude experimentally that our method can optimize decision tree models to be fairer at a small cost in classification error.
arXiv Detail & Related papers (2020-06-22T16:28:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.