Related papers: Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages


URL: http://arxiv.org/abs/2310.04799v3
Date: Fri, 7 Jun 2024 06:28:05 GMT
Title: Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages
Authors: Shih-Cheng Huang, Pin-Zu Li, Yu-Chi Hsu, Kuang-Ming Chen, Yu Tung Lin, Shih-Kai Hsiao, Richard Tzong-Han Tsai, Hung-yi Lee
Abstract summary: We introduce the concept of the chat vector to equip pre-trained language models with instruction following and human value alignment. By simply adding the chat vector to a continual pre-trained model's weights, we can endow the model with chat capabilities in new languages without the need for further training.
Score: 40.37822682459469
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, the development of open-source large language models (LLMs) has advanced rapidly. Nevertheless, due to data constraints, the capabilities of most open-source LLMs are primarily focused on English. To address this issue, we introduce the concept of $\textit{chat vector}$ to equip pre-trained language models with instruction following and human value alignment via simple model arithmetic. The chat vector is derived by subtracting the weights of a pre-trained base model (e.g. LLaMA2) from those of its corresponding chat model (e.g. LLaMA2-chat). By simply adding the chat vector to a continual pre-trained model's weights, we can endow the model with chat capabilities in new languages without the need for further training. Our empirical studies demonstrate the superior efficacy of the chat vector from three different aspects: instruction following, toxicity mitigation, and multi-turn dialogue. Moreover, to showcase the adaptability of our approach, we extend our experiments to encompass various languages, base models, and chat vectors. The results underscore the chat vector's simplicity, effectiveness, and wide applicability, making it a compelling solution for efficiently enabling conversational capabilities in pre-trained language models. Our code is available at https://github.com/aqweteddy/ChatVector.
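The model arithmetic described in the abstract amounts to two element-wise operations: chat vector = (chat model weights) - (base model weights), and new model = (continual pre-trained weights) + (chat vector). The minimal sketch below illustrates only that arithmetic. It is not the authors' code (see the linked repository for that); the function names `chat_vector` and `add_chat_vector` and the toy weights are hypothetical, and plain Python lists stand in for real framework tensors such as a PyTorch `state_dict`. It also assumes all three models share identical parameter names and shapes; mismatches (for example an expanded vocabulary in the embedding matrix) are simply rejected rather than handled.

```python
from typing import Dict, List

# Toy "state dict": parameter name -> flat list of weights.
# Assumption: real checkpoints hold framework tensors; lists keep this dependency-free.
StateDict = Dict[str, List[float]]


def chat_vector(chat: StateDict, base: StateDict) -> StateDict:
    """Element-wise difference: chat-model weights minus base-model weights."""
    vector: StateDict = {}
    for name, base_w in base.items():
        chat_w = chat[name]
        if len(chat_w) != len(base_w):
            raise ValueError(f"shape mismatch for {name!r}")
        vector[name] = [c - b for c, b in zip(chat_w, base_w)]
    return vector


def add_chat_vector(cp: StateDict, vector: StateDict) -> StateDict:
    """Add the chat vector to a continual pre-trained (CP) model's weights."""
    merged: StateDict = {}
    for name, cp_w in cp.items():
        delta = vector[name]
        if len(delta) != len(cp_w):
            # Assumption of this sketch: mismatched shapes (e.g. an expanded
            # vocabulary) are rejected here, not reconciled.
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [w + d for w, d in zip(cp_w, delta)]
    return merged


# Hypothetical two-parameter "models", for illustration only.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
chat = {"layer.weight": [1.5, 1.0], "layer.bias": [0.75]}
cp = {"layer.weight": [2.0, 3.0], "layer.bias": [0.0]}

vec = chat_vector(chat, base)
print(vec)                       # {'layer.weight': [0.5, -1.0], 'layer.bias': [0.25]}
print(add_chat_vector(cp, vec))  # {'layer.weight': [2.5, 2.0], 'layer.bias': [0.25]}
```

Because no gradient step is involved, the whole procedure is a single pass over the parameters, which is what lets the approach add chat capabilities "without the need for further training".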

Related papers

The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [54.59207567677249]
Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z)
LLM should think and action as a human [0.0]
In multi-turn conversations, for each user prompt, the large language model thinks based on elements such as chat history, thinking context, action calls, memory, and knowledge. Our experimental results show that the model's reasoning and planning abilities are enhanced and that issues in multi-turn conversations are resolved.
arXiv Detail & Related papers (2025-02-19T06:58:34Z)
ElChat: Adapting Chat Language Models Using Only Target Unlabeled Language Data [38.341705137026985]
We propose ElChat, a new language adaptation method for chat LLMs. It adapts a chat model directly on target unlabeled data, without a base model. It elicits chat abilities by injecting information from the source chat model.
arXiv Detail & Related papers (2024-12-16T12:26:28Z)
Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language [7.289015788793582]
This work focuses on increasing technological participation for the Sámi language. We draw the attention of the ML community towards the language modeling problem of Ultra Low Resource (ULR) languages. We have compiled the available Sámi language resources from the web to create a clean dataset for training language models.
arXiv Detail & Related papers (2024-05-09T13:54:22Z)
Lemur: Harmonizing Natural Language and Code for Language Agents [105.43564788499901]
We introduce Lemur and Lemur-Chat, open-source language models optimized for both natural language and coding capabilities. Our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks. The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities.
arXiv Detail & Related papers (2023-10-10T17:57:45Z)
Qwen Technical Report [132.54304067403922]
We introduce Qwen, the first installment of our large language model series. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. We have also developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat.
arXiv Detail & Related papers (2023-09-28T17:07:49Z)
Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
arXiv Detail & Related papers (2023-07-03T17:12:44Z)
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning [0.7612676127275795]
Most Transformer language models are pretrained on English text. As model sizes grow, the performance gap between English and other languages increases even further. We introduce a cross-lingual and progressive transfer learning approach, called CLP-Transfer.
arXiv Detail & Related papers (2023-01-23T18:56:12Z)
Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
What do Large Language Models Learn beyond Language? [10.9650651784511]
We find that pretrained models significantly outperform comparable non-pretrained neural models. Experiments surprisingly reveal that the positive effects of pre-training persist even when pretraining on multi-lingual text or computer code. Our findings suggest a hitherto unexplored deep connection between pre-training and inductive learning abilities of language models.
arXiv Detail & Related papers (2022-10-21T23:43:13Z)
Scheduled Multi-task Learning for Neural Chat Translation [66.81525961469494]
We propose a scheduled multi-task learning framework for Neural Chat Translation (NCT). Specifically, we devise a three-stage training framework to incorporate the large-scale in-domain chat translation data into training. Extensive experiments in four language directions verify the effectiveness and superiority of the proposed approach.
arXiv Detail & Related papers (2022-05-08T02:57:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.