Anti-LM Decoding for Zero-shot In-context Machine Translation
- URL: http://arxiv.org/abs/2311.08324v2
- Date: Tue, 2 Apr 2024 19:03:15 GMT
- Title: Anti-LM Decoding for Zero-shot In-context Machine Translation
- Authors: Suzanna Sia, Alexandra DeLucia, Kevin Duh,
- Abstract summary: This work introduces an Anti-Language Model objective with a decay factor designed to address the weaknesses of In-context Machine Translation.
We conduct experiments across 3 model types and sizes, 3 language directions, and for both greedy decoding and beam search.
- Score: 59.26037416204157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot In-context learning is the phenomenon where models can perform the task simply given the instructions. However, pre-trained large language models are known to be poorly calibrated for this task. One of the most effective approaches to handling this bias is to adopt a contrastive decoding objective, which accounts for the prior probability of generating the next token by conditioning on some context. This work introduces an Anti-Language Model objective with a decay factor designed to address the weaknesses of In-context Machine Translation. We conduct our experiments across 3 model types and sizes, 3 language directions, and for both greedy decoding and beam search ($B=5$). The proposed method outperforms other state-of-art decoding objectives, with up to $20$ BLEU point improvement from the default objective observed in some settings.
Related papers
- Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization [34.29833630422768]
Adversarial Contrastive Decoding (ACD) is an optimization-based framework to generate two opposite system prompts for prompt-based contrastive decoding.
ACD achieves much better safety performance than previous model training-free decoding methods without sacrificing original generation ability.
arXiv Detail & Related papers (2024-06-24T15:51:30Z) - Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both randomness in masking language modeling and the structure aspect of programming language.
We then enhance the representations via contrastive learning with hard negative and hard positive constructed in an unsupervised manner.
arXiv Detail & Related papers (2024-02-02T22:19:15Z) - Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation
of the Reversal Curse [73.65112477688353]
Recent studies have highlighted a phenomenon in large language models known as "the reversal curse"
We contend that the reversal curse is partially a result of specific model training objectives.
We propose a novel training method, BI Casual language modeling Optimization (BICO), designed to mitigate the reversal curse.
arXiv Detail & Related papers (2023-11-13T17:01:12Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - What Language Model Architecture and Pretraining Objective Work Best for
Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect term, category, and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margins in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z) - Rethinking Zero-shot Neural Machine Translation: From a Perspective of
Latent Variables [28.101782382170306]
We introduce a denoising autoencoder objective based on pivot language into traditional training objective to improve the translation accuracy on zero-shot directions.
We demonstrate that the proposed method is able to effectively eliminate the spurious correlations and significantly outperforms state-of-the-art methods with a remarkable performance.
arXiv Detail & Related papers (2021-09-10T07:18:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.