The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers
- URL: http://arxiv.org/abs/2305.01628v1
- Date: Tue, 2 May 2023 17:42:37 GMT
- Title: The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers
- Authors: Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin
Sznajder, Noam Slonim, Eyal Shnarch
- Abstract summary: We argue that due to the gradual improvement across model layers, additional information can be gleaned from the contrast between higher and lower layers during inference.
We propose a novel approach that utilizes the contrast between layers to improve text generation outputs.
- Score: 14.596485032985328
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Applying language models to natural language processing tasks typically
relies on the representations in the final model layer, as intermediate hidden
layer representations are presumed to be less informative. In this work, we
argue that due to the gradual improvement across model layers, additional
information can be gleaned from the contrast between higher and lower layers
during inference. Specifically, in choosing between the probable next token
predictions of a generative model, the predictions of lower layers can be used
to highlight which candidates are best avoided. We propose a novel approach
that utilizes the contrast between layers to improve text generation outputs,
and show that it mitigates degenerative behaviors of the model in open-ended
generation, significantly improving the quality of generated texts.
Furthermore, our results indicate that contrasting between model layers at
inference time can yield substantial benefits to certain aspects of general
language model capabilities, more effectively extracting knowledge during
inference from a given set of model parameters.
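The layer-contrast idea described in the abstract can be sketched in a few lines. The sketch below is a minimal illustration on toy logits, not the paper's exact formulation: the log-ratio scoring rule, the `alpha` weight, and the choice of a single lower layer are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def autocontrastive_scores(final_logits, lower_logits, alpha=1.0, eps=1e-9):
    """Score next-token candidates by contrasting the final layer's
    distribution against a lower layer's: candidates that a lower
    (weaker) layer also strongly predicts are penalized, so the lower
    layer's "bad advice" highlights tokens best avoided."""
    p_final = softmax(np.asarray(final_logits, dtype=float))
    p_lower = softmax(np.asarray(lower_logits, dtype=float))
    return np.log(p_final + eps) - alpha * np.log(p_lower + eps)

# Toy vocabulary of 4 tokens: the final layer slightly prefers token 0,
# but the lower layer also strongly predicts token 0 (a generic,
# degenerate continuation), so the contrast promotes token 1 instead.
final = [3.0, 2.5, 0.5, 0.1]
lower = [3.0, 0.2, 0.4, 0.1]
best = int(np.argmax(autocontrastive_scores(final, lower)))  # token 1
```

Greedy decoding from `final` alone would pick token 0; the contrast with `lower` shifts the choice to a candidate that the final layers contribute beyond the early ones.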
Related papers
- Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models [55.45444773200529]
Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination.
Recent work has focused on decoding techniques to improve factuality during inference.
arXiv Detail & Related papers (2024-04-14T19:45:35Z)
- Topic Modeling as Multi-Objective Contrastive Optimization [46.24876966674759]
Recent representation learning approaches enhance neural topic models by optimizing the weighted linear combination of the evidence lower bound (ELBO) of the log-likelihood and the contrastive learning objective that contrasts pairs of input documents.
We introduce a novel contrastive learning method oriented towards sets of topic vectors to capture useful semantics that are shared among a set of input documents.
Our framework consistently produces higher-performing neural topic models in terms of topic coherence, topic diversity, and downstream performance.
arXiv Detail & Related papers (2024-02-12T11:18:32Z)
- PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model [37.2192243883707]
We propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation to generate fluent text.
Results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text.
arXiv Detail & Related papers (2023-06-05T01:36:39Z)
- Explaining Language Models' Predictions with High-Impact Concepts [11.47612457613113]
We propose a complete framework for extending concept-based interpretability methods to NLP.
We optimize for features whose existence causes the output predictions to change substantially.
Our method achieves superior results on predictive impact, usability, and faithfulness compared to the baselines.
arXiv Detail & Related papers (2023-05-03T14:48:27Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Few-shot Text Classification with Dual Contrastive Consistency [31.141350717029358]
In this paper, we explore how to utilize a pre-trained language model to perform few-shot text classification.
We adopt supervised contrastive learning on few labeled data and consistency-regularization on vast unlabeled data.
arXiv Detail & Related papers (2022-09-29T19:26:23Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
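As a rough illustration of the kNN-over-representations idea in the last entry, the sketch below retrieves the training examples whose hidden representations lie nearest to a test example's. The 2-D vectors and Euclidean distance are made-up assumptions for the example, not the paper's actual setup.

```python
import numpy as np

def knn_explanations(train_reps, test_rep, k=2):
    """Return indices of the k training examples whose representations
    are nearest (Euclidean) to the test example's representation --
    a simple proxy for the training instances driving a prediction."""
    dists = np.linalg.norm(np.asarray(train_reps, dtype=float)
                           - np.asarray(test_rep, dtype=float), axis=1)
    return [int(i) for i in np.argsort(dists)[:k]]

# Toy hidden representations for four training examples; the test
# example sits on top of example 1 and close to example 2.
train = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
neighbors = knn_explanations(train, [1.0, 1.0], k=2)  # -> [1, 2]
```

In practice the representations would come from a model's penultimate layer, and inspecting the retrieved neighbors can surface spurious associations the model has latched onto.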
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.