ContraCLM: Contrastive Learning For Causal Language Model
- URL: http://arxiv.org/abs/2210.01185v2
- Date: Tue, 2 May 2023 22:46:46 GMT
- Title: ContraCLM: Contrastive Learning For Causal Language Model
- Authors: Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan,
Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia,
Xiaofei Ma, Bing Xiang
- Abstract summary: We present ContraCLM, a novel contrastive learning framework at both token-level and sequence-level.
We show that ContraCLM enhances discrimination of the representations and bridges the gap with the encoder-only models.
- Score: 54.828635613501376
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite exciting progress in causal language models, the expressiveness of
the representations is largely limited due to poor discrimination ability. To
remedy this issue, we present ContraCLM, a novel contrastive learning framework
at both token-level and sequence-level. We assess ContraCLM on a variety of
downstream tasks. We show that ContraCLM enhances discrimination of the
representations and bridges the gap with the encoder-only models, which makes
causal language models better suited for tasks beyond language generation.
Specifically, we attain 44% relative improvement on the Semantic Textual
Similarity tasks and 34% on Code-to-Code Search tasks. Furthermore, by
improving the expressiveness of the representations, ContraCLM also boosts the
source code generation capability with 9% relative improvement on execution
accuracy on the HumanEval benchmark.
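For intuition, the following is a minimal sketch of what token-level and sequence-level contrastive objectives of this kind can look like, assuming SimCSE-style dropout augmentation (two forward passes of the same input as positive views) and InfoNCE losses. The function names, pooling, and loss weighting are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch only: InfoNCE-style token-level and sequence-level
# contrastive losses over causal LM hidden states. Positive views are assumed
# to come from two forward passes with different dropout masks.
import torch
import torch.nn.functional as F

def token_level_contrastive_loss(h1, h2, temperature=0.05):
    """h1, h2: (seq_len, dim) hidden states of one sequence from two
    dropout-perturbed forward passes. Each token's two views form the positive
    pair; all other tokens in the sequence act as negatives."""
    h1 = F.normalize(h1, dim=-1)
    h2 = F.normalize(h2, dim=-1)
    logits = h1 @ h2.T / temperature                      # pairwise cosine similarities
    targets = torch.arange(h1.size(0), device=h1.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

def sequence_level_contrastive_loss(s1, s2, temperature=0.05):
    """s1, s2: (batch, dim) pooled sequence representations from two forward
    passes. Matching sequences are positives; other sequences in the batch
    serve as negatives."""
    s1 = F.normalize(s1, dim=-1)
    s2 = F.normalize(s2, dim=-1)
    logits = s1 @ s2.T / temperature
    targets = torch.arange(s1.size(0), device=s1.device)
    return F.cross_entropy(logits, targets)

# A combined objective would add these terms to the standard causal LM loss,
# e.g. loss = lm_loss + token_loss + sequence_loss (relative weights are a
# design choice, not taken from the paper).
```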
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models [16.82812708514889]
Code-switching, the phenomenon of alternating between two or more languages in a single conversation, presents unique challenges for Natural Language Processing (NLP).
Most existing research focuses on either syntactic constraints or neural generation, with few efforts to integrate linguistic theory with large language models (LLMs) for generating natural code-switched text.
We introduce EZSwitch, a novel framework that combines Equivalence Constraint Theory (ECT) with LLMs to produce linguistically valid and fluent code-switched text.
arXiv Detail & Related papers (2024-10-30T03:03:32Z)
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both the randomness of masked language modeling and the structural aspects of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner.
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
- Speculative Contrastive Decoding [55.378200871224074]
Large language models (LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited by high computational requirements and sub-optimal due to exposure bias.
Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach.
arXiv Detail & Related papers (2023-11-15T14:15:30Z)
- Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark (a brief sketch of the contrastive-decoding idea appears after this list).
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- Emergent Linguistic Structures in Neural Networks are Fragile [20.692540987792732]
Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks.
We propose a framework to assess the consistency and robustness of linguistic representations.
arXiv Detail & Related papers (2022-10-31T15:43:57Z)
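As referenced in the Contrastive Decoding entry above, here is a minimal sketch of the general contrastive-decoding idea, assuming the commonly described formulation: score candidate next tokens by the gap between a large "expert" model's and a smaller "amateur" model's log-probabilities, restricted to tokens the expert considers plausible. The function and the alpha cutoff below are illustrative assumptions, not any paper's exact implementation.

```python
# Illustrative sketch only: one contrastive-decoding step over precomputed
# next-token logits from a large "expert" model and a small "amateur" model.
import math
import torch

def contrastive_decoding_step(expert_logits, amateur_logits, alpha=0.1):
    """expert_logits, amateur_logits: (vocab_size,) next-token logits.
    Returns the index of the greedily selected token."""
    expert_logp = torch.log_softmax(expert_logits, dim=-1)
    amateur_logp = torch.log_softmax(amateur_logits, dim=-1)
    # Plausibility constraint: keep only tokens whose expert probability is
    # within a factor alpha of the expert's most likely token.
    cutoff = expert_logp.max() + math.log(alpha)
    scores = expert_logp - amateur_logp
    scores = scores.masked_fill(expert_logp < cutoff, float("-inf"))
    return int(torch.argmax(scores))
```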
This list is automatically generated from the titles and abstracts of the papers on this site.