ContraCLM: Contrastive Learning For Causal Language Model
- URL: http://arxiv.org/abs/2210.01185v2
- Date: Tue, 2 May 2023 22:46:46 GMT
- Title: ContraCLM: Contrastive Learning For Causal Language Model
- Authors: Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan,
Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia,
Xiaofei Ma, Bing Xiang
- Abstract summary: We present ContraCLM, a novel contrastive learning framework at both token-level and sequence-level.
We show that ContraCLM enhances discrimination of the representations and bridges the gap with the encoder-only models.
- Score: 54.828635613501376
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite exciting progress in causal language models, the expressiveness of
the representations is largely limited due to poor discrimination ability. To
remedy this issue, we present ContraCLM, a novel contrastive learning framework
at both token-level and sequence-level. We assess ContraCLM on a variety of
downstream tasks. We show that ContraCLM enhances discrimination of the
representations and bridges the gap with the encoder-only models, which makes
causal language models better suited for tasks beyond language generation.
Specifically, we attain $44\%$ relative improvement on the Semantic Textual
Similarity tasks and $34\%$ on Code-to-Code Search tasks. Furthermore, by
improving the expressiveness of the representations, ContraCLM also boosts the
source code generation capability with $9\%$ relative improvement on execution
accuracy on the HumanEval benchmark.
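The abstract does not spell out the training objective; below is a minimal sketch of what token-level and sequence-level contrastive (InfoNCE-style) losses over causal-LM hidden states could look like. The function names, the dropout-based positives, and the choice of in-sequence / in-batch negatives are illustrative assumptions, not the paper's exact formulation.
```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE loss: pull the anchor toward its positive, push it away from negatives."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / temperature
    return float(-(logits[0] - np.log(np.exp(logits).sum())))

def token_level_loss(h, h_dropout, temperature=0.05):
    """Token-level term (assumed form): each token's hidden state is an anchor, its
    counterpart from a second dropout pass is the positive, and the other tokens in
    the same sequence serve as negatives."""
    losses = []
    for i in range(len(h)):
        negatives = [h[j] for j in range(len(h)) if j != i]
        losses.append(info_nce(h[i], h_dropout[i], negatives, temperature))
    return float(np.mean(losses))

def sequence_level_loss(pooled, pooled_dropout, temperature=0.05):
    """Sequence-level term (assumed form): mean-pooled representations from two dropout
    passes are positives; other sequences in the batch are negatives."""
    losses = []
    for i in range(len(pooled)):
        negatives = [pooled[j] for j in range(len(pooled)) if j != i]
        losses.append(info_nce(pooled[i], pooled_dropout[i], negatives, temperature))
    return float(np.mean(losses))

# Tiny usage example with random vectors standing in for causal-LM hidden states.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))                      # 8 tokens, 16-dim states
tokens_dropout = tokens + 0.01 * rng.normal(size=tokens.shape)
seqs = rng.normal(size=(4, 16))                        # 4 pooled sequence embeddings
seqs_dropout = seqs + 0.01 * rng.normal(size=seqs.shape)
print(token_level_loss(tokens, tokens_dropout), sequence_level_loss(seqs, seqs_dropout))
```
In practice these contrastive terms would be added to the standard causal language modeling loss during training.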
Related papers
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both randomness in masked language modeling and the structural aspects of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner.
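As a rough illustration of contrastive learning with hard negatives, the sketch below keeps only the in-batch negatives most similar to the anchor in the InfoNCE denominator; the function name, the top-k selection, and the temperature are assumptions for illustration, not the cited paper's actual objective.
```python
import numpy as np

def hard_negative_info_nce(anchor, positive, candidates, k=5, temperature=0.05):
    """Hard-negative InfoNCE sketch: among all candidate negatives, keep only the k
    most similar to the anchor (the 'hard' ones) in the contrastive denominator."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    hard = np.sort(np.array([cos(anchor, c) for c in candidates]))[-k:]
    logits = np.concatenate(([cos(anchor, positive)], hard)) / temperature
    return float(-(logits[0] - np.log(np.exp(logits).sum())))

# Usage with random stand-ins for code-snippet embeddings.
rng = np.random.default_rng(1)
anchor, positive = rng.normal(size=16), rng.normal(size=16)
candidates = rng.normal(size=(32, 16))
print(hard_negative_info_nce(anchor, positive, candidates))
```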
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
- Transfer Attacks and Defenses for Large Language Models on Coding Tasks [30.065641782962974]
We study the effect of adversarial perturbations on coding tasks with large language models (LLMs).
We propose prompt-based defenses that involve modifying the prompt to include examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations.
Our experiments show that adversarial examples obtained with a smaller code model are indeed transferable, weakening the LLMs' performance.
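A minimal sketch of a prompt-based defense of this kind: the prompt pairs perturbed code with its clean form and instructs the model to undo such perturbations before solving the task. The wording, the example pair, and the helper name are illustrative assumptions, not the paper's prompts.
```python
# Hypothetical defense prompt; the example perturbation (renamed identifiers) and the
# instructions are illustrative only.
DEFENSE_PROMPT = """Some code below may contain adversarial perturbations such as
renamed variables or inserted dead code. First restore the code to its natural form,
then complete the task.

Example of perturbed code:          Example of restored code:
def f(xqz_1, xqz_2):                def add(a, b):
    return xqz_1 + xqz_2                return a + b

Task: summarize what the following code does.
Code:
{code}
"""

def build_defended_prompt(code: str) -> str:
    return DEFENSE_PROMPT.format(code=code)

print(build_defended_prompt("def g(u_0): return u_0 * u_0"))
```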
arXiv Detail & Related papers (2023-11-22T15:11:35Z)
- Speculative Contrastive Decoding [55.378200871224074]
Large language models (LLMs) exhibit exceptional performance on language tasks, yet their auto-regressive inference is limited by high computational requirements and sub-optimal due to exposure bias.
Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach.
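The sketch below illustrates one decoding step under the assumption that the small amateur model both drafts a token and serves as the contrast model; the acceptance rule and the alpha weighting are simplifications for illustration, not the paper's exact procedure.
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def speculative_contrastive_step(expert_logits, amateur_logits, proposed_token, alpha=0.5):
    """One illustrative step: the amateur model has already proposed a cheap draft token;
    the expert model scores the vocabulary with a contrastive distribution (expert
    log-probs minus scaled amateur log-probs) and either accepts the draft or replaces
    it with the contrastive argmax."""
    contrastive = np.log(softmax(expert_logits)) - alpha * np.log(softmax(amateur_logits))
    best = int(np.argmax(contrastive))
    return (best == proposed_token), best

# Usage over a toy 5-token vocabulary.
expert = np.array([2.0, 0.5, 0.1, -1.0, -2.0])
amateur = np.array([1.8, 1.5, 0.0, -1.0, -2.0])
print(speculative_contrastive_step(expert, amateur, proposed_token=0))
```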
arXiv Detail & Related papers (2023-11-15T14:15:30Z)
- CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation [94.59630161324013]
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale.
Our empirical study on different datasets shows that CoAnnotating is an effective means of allocating work, with up to a 21% performance improvement over a random baseline.
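One way uncertainty-guided allocation could work is sketched below, under the assumption that the LLM's uncertainty on an item is estimated as the entropy of the labels it assigns across repeated prompts; the threshold and the entropy estimate are illustrative, not CoAnnotating's exact criteria.
```python
from collections import Counter
import math

def allocate(items, llm_label_samples, entropy_threshold=0.8):
    """Route each item to the LLM or to human annotators based on the entropy of the
    LLM's sampled labels: high-entropy (uncertain) items go to humans."""
    to_llm, to_human = [], []
    for item, labels in zip(items, llm_label_samples):
        counts = Counter(labels)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total, 2) for c in counts.values())
        (to_human if entropy > entropy_threshold else to_llm).append(item)
    return to_llm, to_human

# Usage: items whose sampled LLM labels disagree are sent to humans.
print(allocate(["ex1", "ex2"], [["pos", "pos", "pos"], ["pos", "neg", "neg"]]))
```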
arXiv Detail & Related papers (2023-10-24T08:56:49Z)
- Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
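For reference, the sketch below shows a single greedy step of contrastive decoding in the spirit of this line of work: restrict to tokens the expert model finds plausible, then pick the token where the expert most out-scores a smaller amateur model. The plausibility cutoff alpha and the greedy selection are simplifications, not the paper's exact configuration.
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def contrastive_decode_step(expert_logits, amateur_logits, alpha=0.1):
    """Single decoding step: keep only tokens with expert probability >= alpha * max
    expert probability, then choose the token maximizing log p_expert - log p_amateur."""
    p_exp = softmax(expert_logits)
    p_ama = softmax(amateur_logits)
    plausible = p_exp >= alpha * p_exp.max()
    score = np.where(plausible, np.log(p_exp) - np.log(p_ama), -np.inf)
    return int(np.argmax(score))

# Usage over a toy 5-token vocabulary.
print(contrastive_decode_step(np.array([2.0, 1.9, 0.0, -1.0, -3.0]),
                              np.array([2.5, 0.5, 0.0, -1.0, -3.0])))
```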
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- Emergent Linguistic Structures in Neural Networks are Fragile [20.692540987792732]
Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks.
We propose a framework to assess the consistency and robustness of linguistic representations.
arXiv Detail & Related papers (2022-10-31T15:43:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.