Contextual Temperature for Language Modeling
- URL: http://arxiv.org/abs/2012.13575v1
- Date: Fri, 25 Dec 2020 13:50:03 GMT
- Title: Contextual Temperature for Language Modeling
- Authors: Pei-Hsin Wang, Sheng-Iou Hsieh, Shih-Chieh Chang, Yu-Ting Chen, Jia-Yu Pan, Wei Wei, Da-Chang Juan
- Abstract summary: We propose contextual temperature, which learns an optimal temperature trajectory for each vocabulary over the context.
Experimental results confirm that the proposed method significantly improves state-of-the-art language models.
In-depth analyses show that the behaviour of the learned temperature schedules varies dramatically by vocabulary.
- Score: 14.485125883455975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temperature scaling has been widely used as an effective approach to control
the smoothness of a distribution, which helps the model performance in various
tasks. Current practices to apply temperature scaling assume either a fixed or
a manually crafted, dynamically changing schedule. However, our studies indicate
that the individual optimal trajectory for each class can change with the
context. To this end, we propose contextual temperature, a generalized approach
that learns an optimal temperature trajectory for each vocabulary over the
context. Experimental results confirm that the proposed method significantly
improves state-of-the-art language models, achieving a perplexity of 55.31 and
62.89 on the test sets of Penn Treebank and WikiText-2, respectively. In-depth
analyses show that the behaviour of the learned temperature schedules varies
dramatically by vocabulary, and that the optimal schedules help in controlling
the uncertainties. This evidence further justifies the need for the proposed
method and its advantages over fixed temperature schedules.
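
To make the mechanism concrete, here is a minimal PyTorch sketch of the idea. It assumes the per-vocabulary temperature vector is predicted by a single linear layer on the decoder's hidden state and bounded to a fixed range; the paper's exact parameterization may differ, and all names below are illustrative.

```python
import torch
import torch.nn as nn

class ContextualTemperature(nn.Module):
    """Sketch: predict one temperature per vocabulary entry from the
    current hidden state and use it to scale the logits before softmax.
    The parameterization (a linear layer squashed into [tau_min, tau_max])
    is an illustrative assumption, not the paper's exact architecture."""

    def __init__(self, hidden_size, vocab_size, tau_min=0.5, tau_max=2.0):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)
        self.tau_min, self.tau_max = tau_min, tau_max

    def forward(self, hidden, logits):
        # hidden: (batch, hidden_size); logits: (batch, vocab_size)
        tau = self.tau_min + (self.tau_max - self.tau_min) * torch.sigmoid(self.proj(hidden))
        return torch.softmax(logits / tau, dim=-1)
```

Because the temperature is a function of the context, each vocabulary entry follows its own smoothing trajectory as decoding proceeds: entries with a temperature above 1 are flattened, those below 1 are sharpened.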
Related papers
- Adaptive Decoding via Latent Preference Optimization [55.70602730588745]
We introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time.
Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures (a sketch of such a selector follows this list).
arXiv Detail & Related papers (2024-11-14T18:31:39Z)
- Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training [58.20089993899729]
This paper proposes TempBalance, a straightforward yet effective layerwise learning rate method.
We show that TempBalance significantly outperforms ordinary SGD and carefully tuned spectral norm regularization.
We also show that TempBalance outperforms a number of state-of-the-art metrics and schedulers.
arXiv Detail & Related papers (2023-12-01T05:38:17Z)
- Emerging Statistical Machine Learning Techniques for Extreme Temperature Forecasting in U.S. Cities [0.0]
We present a comprehensive analysis of extreme temperature patterns using emerging statistical machine learning techniques.
We apply these methods to climate time series data from the five most populated U.S. cities.
Our findings highlight the differences between the statistical methods and identify Multilayer Perceptrons as the most effective approach.
arXiv Detail & Related papers (2023-07-26T16:38:32Z)
- Long Horizon Temperature Scaling [90.03310732189543]
Long Horizon Temperature Scaling (LHTS) is a novel approach for sampling from temperature-scaled joint distributions.
We derive a temperature-dependent LHTS objective, and show that finetuning a model on a range of temperatures produces a single model capable of generation with a controllable long-horizon temperature parameter (a worked toy example follows this list).
arXiv Detail & Related papers (2023-02-07T18:59:32Z)
- FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing [60.629222280633606]
We investigate scheduled unfreezing algorithms for fine-tuning task adapters.
Experiments show scheduled unfreezing methods close the gap to full fine-tuning and achieve stronger cross-lingual transfer performance.
We propose a general scheduled unfreezing algorithm that achieves an average improvement of 2 points across four datasets (a sketch of the unfreezing loop follows this list).
arXiv Detail & Related papers (2023-01-13T11:26:53Z)
- Fine-tune your Classifier: Finding Correlations With Temperature [2.071516130824992]
We analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on representations.
We study the correlation between these extracted statistics and the observed optimal temperatures.
arXiv Detail & Related papers (2022-10-18T09:48:46Z)
- Extracting or Guessing? Improving Faithfulness of Event Temporal Relation Extraction [87.04153383938969]
We improve the faithfulness of TempRel extraction models from two perspectives.
The first perspective is to extract genuinely based on contextual description.
The second perspective is to provide proper uncertainty estimation.
arXiv Detail & Related papers (2022-10-10T19:53:13Z)
- Adaptive Temperature Scaling for Robust Calibration of Deep Neural Networks [0.7219077740523682]
We focus on the task of confidence scaling, specifically on post-hoc methods that generalize Temperature Scaling.
We show that when there is plenty of data, complex models like neural networks yield better performance, but they are prone to fail when the amount of data is limited.
We propose Entropy-based Temperature Scaling, a simple method that scales the confidence of a prediction according to its entropy (a sketch follows this list).
arXiv Detail & Related papers (2022-07-31T16:20:06Z)
- Selecting Informative Contexts Improves Language Model Finetuning [66.26521454263343]
We present a general fine-tuning method that we call information gain filtration.
During fine-tuning, a secondary learner selects informative examples and skips uninformative ones.
We show that our method yields consistent improvements across datasets, fine-tuning tasks, and language model architectures (a sketch of the filtering step follows this list).
arXiv Detail & Related papers (2020-05-01T02:01:18Z)
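
For Adaptive Decoding via Latent Preference Optimization, a minimal sketch of a decoding-time temperature selector is shown below. It illustrates only inference; the training procedure (latent preference optimization) is omitted, and the candidate set, scorer, and names are assumptions.

```python
import torch
import torch.nn as nn

class TemperatureSelector(nn.Module):
    """Sketch: score a fixed set of candidate temperatures from the hidden
    state and decode with the highest-scoring one. Illustrative only."""

    def __init__(self, hidden_size, candidates=(0.3, 0.7, 1.0, 1.5)):
        super().__init__()
        self.register_buffer("candidates", torch.tensor(candidates))
        self.scorer = nn.Linear(hidden_size, len(candidates))

    def forward(self, hidden, logits):
        idx = self.scorer(hidden).argmax(dim=-1)        # one choice per step
        tau = self.candidates[idx].unsqueeze(-1)        # (batch, 1)
        probs = torch.softmax(logits / tau, dim=-1)
        return torch.multinomial(probs, num_samples=1)  # sampled next token
```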
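For Long Horizon Temperature Scaling, the toy example below contrasts tempering the joint sequence distribution (what LHTS targets) with tempering each per-token conditional (ordinary decoding). The distribution is made up purely to show that the two results differ in general.

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.dirichlet(np.ones(4))   # p(x1, x2) over all length-2 sequences
tau = 2.0

# Long-horizon scaling: temper the *joint* distribution, p(x)^(1/tau).
lht = joint ** (1.0 / tau)
lht /= lht.sum()

# Per-token scaling tempers each conditional p(x_t | x_<t) separately.
table = joint.reshape(2, 2)                    # rows: x1, cols: x2
p_x1 = table.sum(axis=1)
p_x2_given_x1 = table / p_x1[:, None]

def temper(p, tau):
    q = p ** (1.0 / tau)
    return q / q.sum(axis=-1, keepdims=True)

per_token = (temper(p_x1, tau)[:, None] * temper(p_x2_given_x1, tau)).ravel()
print(np.allclose(per_token, lht))             # False: the two differ
```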
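For the scheduled unfreezing paper, here is a minimal sketch of a gradual unfreezing loop over a list of adapter modules; the fixed interval and top-down order are illustrative assumptions, not the paper's exact schedule.

```python
def scheduled_unfreeze(adapter_layers, step, interval=1000):
    """Sketch: every `interval` steps, allow one more adapter layer
    (top-down) to receive gradients. Illustrative only."""
    n_unfrozen = min(len(adapter_layers), step // interval + 1)
    for i, layer in enumerate(reversed(adapter_layers)):
        trainable = i < n_unfrozen
        for p in layer.parameters():
            p.requires_grad = trainable
```

Called once per training step before the optimizer update, this starts fine-tuning from the top layer and progressively opens up lower layers.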
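For Entropy-based Temperature Scaling, the sketch below gives each prediction its own temperature computed from its entropy, so uncertain predictions are softened more. The log-linear entropy-to-temperature map is an illustrative choice, with `w` and `b` standing in for parameters fit on validation data.

```python
import torch

def entropy_temperature_scale(logits, w=1.0, b=0.0):
    """Sketch: per-example temperature as a function of predictive
    entropy. Illustrative mapping, not the paper's exact form."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)
    # Log-linear map from entropy to temperature; clamp keeps it sane.
    tau = torch.exp(w * torch.log(entropy.clamp_min(1e-6)) + b).clamp(0.1, 10.0)
    return torch.softmax(logits / tau.unsqueeze(-1), dim=-1)
```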
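For information gain filtration, a minimal sketch of the filtering step is shown below; the secondary learner's interface, the threshold, and the Hugging-Face-style `model(**batch).loss` call are all assumptions.

```python
def filtered_finetune_step(model, optimizer, secondary_learner, batch, threshold=0.0):
    """Sketch: a small secondary model predicts how informative a batch
    is; low-scoring batches are skipped. Illustrative interfaces only."""
    if secondary_learner(batch) < threshold:
        return None                     # skip uninformative example
    loss = model(**batch).loss          # standard LM fine-tuning loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```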
This list is automatically generated from the titles and abstracts of the papers on this site.