Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping
- URL: http://arxiv.org/abs/2407.10795v1
- Date: Mon, 15 Jul 2024 15:14:01 GMT
- Title: Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping
- Authors: Wenhao Zhu, Sizhe Liu, Shujian Huang, Shuaijie She, Chris Wendler, Jiajun Chen
- Abstract summary: Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
- Score: 60.458273797431836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models (LLMs) by contrasting the prediction probabilities between an early exit output (amateur logits) and the final output (expert logits). However, we find that this approach does not work well on non-English tasks. Inspired by previous interpretability work on language transition during the model's forward pass, we discover that this issue arises from a language mismatch between early exit output and final output. In this work, we propose an improved contrastive decoding algorithm that is effective for diverse languages beyond English. To obtain more helpful amateur logits, we devise two strategies to skip a set of bottom, language-agnostic layers based on our preliminary analysis. Experimental results on multilingual reasoning benchmarks demonstrate that our proposed method outperforms previous contrastive decoding baselines and substantially improves LLMs' chain-of-thought reasoning accuracy across 11 languages. The project will be available at: https://github.com/NJUNLP/SkipLayerCD.
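To make the contrastive step concrete, below is a minimal, self-contained sketch of how expert and amateur predictions are combined: expert logits come from the final layer, while amateur logits come from a weaker forward pass (an early exit in DoLa; a pass that skips the bottom, language-agnostic layers in this paper). This is not the authors' released implementation (see https://github.com/NJUNLP/SkipLayerCD); the plausibility threshold `alpha`, tensor shapes, and names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_next_token_scores(expert_logits: torch.Tensor,
                                  amateur_logits: torch.Tensor,
                                  alpha: float = 0.1) -> torch.Tensor:
    """Combine expert (final-layer) and amateur (early-exit / skip-layer) logits.

    Both inputs are [vocab_size] logits for the next token. Tokens the expert
    itself finds implausible (prob < alpha * max expert prob) are masked out,
    following DoLa's adaptive plausibility constraint.
    """
    expert_logp = F.log_softmax(expert_logits, dim=-1)
    amateur_logp = F.log_softmax(amateur_logits, dim=-1)
    plausible = expert_logp >= expert_logp.max() + torch.log(torch.tensor(alpha))
    scores = expert_logp - amateur_logp       # contrast the two distributions
    return scores.masked_fill(~plausible, float("-inf"))

# Toy usage with random logits; in practice both come from the same forward pass.
vocab_size = 32000
expert = torch.randn(vocab_size)
amateur = torch.randn(vocab_size)
next_token_id = contrastive_next_token_scores(expert, amateur).argmax().item()
```

The design difference highlighted by the abstract lies only in where the amateur logits come from: skipping the bottom layers keeps the amateur pass past the language-transition point identified in prior interpretability work, avoiding the language mismatch that hurts DoLa on non-English inputs.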
Related papers
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z)
- Towards Translating Real-World Code with LLMs: A Study of Translating to Rust [13.743967357458287]
Large language models (LLMs) show promise in code translation due to their ability to write code in most programming languages.
We conduct our study on code extracted from real-world open source projects.
FLOURINE is an end-to-end code translation tool that uses differential fuzzing to check if a Rust translation is I/O equivalent to the original source program.
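As a rough illustration of the differential-fuzzing check described here, the sketch below feeds the same randomly generated inputs to the original program and its Rust translation and reports the first output divergence. The binary paths, input generator, and trial budget are assumptions for illustration, not FLOURINE's actual harness.

```python
import random
import string
import subprocess

def random_stdin() -> str:
    """Naive input generator; a real fuzzer would mutate seeds and track coverage."""
    length = random.randint(0, 256)
    return "".join(random.choices(string.ascii_letters + string.digits + " \n", k=length))

def io_equivalent(original_bin: str, rust_bin: str, trials: int = 1000) -> bool:
    """Return False as soon as the two programs disagree on stdout or exit code."""
    for _ in range(trials):
        stimulus = random_stdin()
        ref = subprocess.run([original_bin], input=stimulus, text=True,
                             capture_output=True, timeout=5)
        cand = subprocess.run([rust_bin], input=stimulus, text=True,
                              capture_output=True, timeout=5)
        if (ref.stdout, ref.returncode) != (cand.stdout, cand.returncode):
            return False  # counterexample: the translation is not I/O equivalent
    return True  # no divergence observed within the trial budget
```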
arXiv Detail & Related papers (2024-05-19T10:54:03Z)
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words.
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
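For context, a schematic of the speculative-decoding loop that such frameworks build on is sketched below: a lightweight draft model proposes a few tokens, the full model verifies them in one forward pass, and only the longest agreeing prefix is kept. Greedy verification, batch size 1, and Hugging Face-style `.logits` outputs are simplifying assumptions; Chimera's draft architecture and acceptance rule are not reproduced here.

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, input_ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    """One draft-then-verify round (batch size 1, greedy decoding assumed).

    `target` and `draft` map token ids [1, seq] to outputs with a `.logits`
    field of shape [1, seq, vocab], as Hugging Face causal LMs do.
    """
    prompt_len = input_ids.size(1)

    # 1) The cheap draft model proposes k tokens autoregressively.
    draft_ids = input_ids
    for _ in range(k):
        next_id = draft(draft_ids).logits[:, -1].argmax(-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)

    # 2) One forward pass of the expensive target model scores all proposals.
    target_pred = target(draft_ids).logits[:, prompt_len - 1:].argmax(-1)  # [1, k+1]
    proposed = draft_ids[:, prompt_len:]                                   # [1, k]

    # 3) Keep the longest prefix where the target agrees, plus the target's
    #    own token at the first disagreement (or one bonus token if all match).
    agree = (target_pred[:, :k] == proposed)[0].long()
    n_accept = int(agree.cumprod(dim=0).sum())
    bonus = target_pred[:, n_accept:n_accept + 1]
    return torch.cat([input_ids, proposed[:, :n_accept], bonus], dim=-1)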
arXiv Detail & Related papers (2024-02-24T08:10:39Z)
- Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast [50.19681990847589]
Existing research has shown that a multilingual pre-trained language model fine-tuned with one (source) language also performs well on downstream tasks for non-source languages.
This paper analyzes the fine-tuning process, discovers when the performance gap changes and identifies which network weights affect the overall performance most.
arXiv Detail & Related papers (2023-05-19T06:04:21Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments in language generation benchmarks show that GanLM with the powerful language understanding capability outperforms various strong pre-trained language models.
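The two objectives named here can be pictured with an ELECTRA-style sketch: given an input in which a generator has replaced some tokens, the model learns to (1) detect which positions were replaced and (2) recover the original tokens at those positions. GanLM's encoder-decoder formulation is more involved; the shapes, names, and equal loss weighting below are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def rtd_and_denoising_loss(disc_logits: torch.Tensor,
                           lm_logits: torch.Tensor,
                           corrupted_ids: torch.Tensor,
                           original_ids: torch.Tensor) -> torch.Tensor:
    """disc_logits: [batch, seq] per-token replacement scores;
    lm_logits: [batch, seq, vocab] predictions of the original tokens."""
    replaced = (corrupted_ids != original_ids).float()

    # Replaced token detection: binary classification at every position.
    detection = F.binary_cross_entropy_with_logits(disc_logits, replaced)

    # Replaced token denoising: recover the originals, counted only where replaced.
    token_ce = F.cross_entropy(lm_logits.transpose(1, 2), original_ids, reduction="none")
    denoising = (token_ce * replaced).sum() / replaced.sum().clamp(min=1.0)

    return detection + denoising
```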
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
- Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)
- An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies [17.78024523121448]
This paper proposes a decoding strategy for end-to-end simultaneous speech translation.
We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs.
arXiv Detail & Related papers (2021-03-04T18:55:40Z)