Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages
- URL: http://arxiv.org/abs/2310.04799v3
- Date: Fri, 7 Jun 2024 06:28:05 GMT
- Title: Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages
- Authors: Shih-Cheng Huang, Pin-Zu Li, Yu-Chi Hsu, Kuang-Ming Chen, Yu Tung Lin, Shih-Kai Hsiao, Richard Tzong-Han Tsai, Hung-yi Lee
- Abstract summary: We introduce the concept of the $\textit{chat vector}$ to equip pre-trained language models with instruction following and human value alignment.
By simply adding the chat vector to a continual pre-trained model's weights, we can endow the model with chat capabilities in new languages without the need for further training.
- Score: 40.37822682459469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the development of open-source large language models (LLMs) has advanced rapidly. Nevertheless, due to data constraints, the capabilities of most open-source LLMs are primarily focused on English. To address this issue, we introduce the concept of $\textit{chat vector}$ to equip pre-trained language models with instruction following and human value alignment via simple model arithmetic. The chat vector is derived by subtracting the weights of a pre-trained base model (e.g. LLaMA2) from those of its corresponding chat model (e.g. LLaMA2-chat). By simply adding the chat vector to a continual pre-trained model's weights, we can endow the model with chat capabilities in new languages without the need for further training. Our empirical studies demonstrate the superior efficacy of the chat vector from three different aspects: instruction following, toxicity mitigation, and multi-turn dialogue. Moreover, to showcase the adaptability of our approach, we extend our experiments to encompass various languages, base models, and chat vectors. The results underscore the chat vector's simplicity, effectiveness, and wide applicability, making it a compelling solution for efficiently enabling conversational capabilities in pre-trained language models. Our code is available at https://github.com/aqweteddy/ChatVector.
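Since the abstract fully specifies the model arithmetic (subtract the base model's weights from the chat model's, then add the difference to a continually pre-trained model), a minimal PyTorch/Transformers sketch may make it concrete. This is an illustration rather than the authors' released implementation (see the linked repository for that): the two LLaMA-2 names are public Hugging Face checkpoints, the continually pre-trained checkpoint name is a hypothetical placeholder, and the sketch assumes all three models share the same architecture and vocabulary.

```python
# Sketch of the chat-vector arithmetic described in the abstract:
#   chat_vector = chat_model_weights - base_model_weights
#   adapted     = continual_pretrained_weights + chat_vector
import torch
from transformers import AutoModelForCausalLM

# Public LLaMA-2 checkpoints serving as the (base, chat) pair.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
chat = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# Hypothetical checkpoint continually pre-trained on the target language;
# it must share the base model's architecture and vocabulary.
cp = AutoModelForCausalLM.from_pretrained("your-org/llama-2-7b-cp-target-lang")

base_sd, chat_sd = base.state_dict(), chat.state_dict()
with torch.no_grad():
    for name, param in cp.named_parameters():
        # Add the chat vector (chat - base) to the continually pre-trained weights.
        param.add_(chat_sd[name] - base_sd[name])

# Per the abstract, the resulting model can follow instructions in the
# target language without any further training.
cp.save_pretrained("llama-2-7b-target-lang-chat")
```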
Related papers
- LLM should think and action as a human [0.0]
In multi-turn conversation, for each user prompt, the large language model thinks based on elements such as chat history, thinking context, action calls, memory, and knowledge.
Our experimental results show that the reasoning and planning abilities of the large language model are enhanced, and the issues in multi-turn conversation are resolved.
arXiv Detail & Related papers (2025-02-19T06:58:34Z)
- Vocabulary Expansion of Chat Models with Unlabeled Target Language Data [38.341705137026985]
Chat models (i.e. language models trained to follow instructions through conversation with humans) outperform base models (i.e. trained solely on unlabeled data) in both conversation and general task-solving abilities.
These models are generally English-centric and require further adaptation for languages that are underrepresented in or absent from their training data.
We propose post-hoc techniques that inject information from the source model without requiring any further training. Experiments reveal the effectiveness of our methods, helping the adapted models to achieve performance improvements in 87% of cases.
arXiv Detail & Related papers (2024-12-16T12:26:28Z)
- Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language [7.289015788793582]
This work focuses on increasing technological participation for the Sámi language.
We draw the attention of the ML community towards the language modeling problem of Ultra Low Resource (ULR) languages.
We have compiled the available Sámi language resources from the web to create a clean dataset for training language models.
arXiv Detail & Related papers (2024-05-09T13:54:22Z)
- Lemur: Harmonizing Natural Language and Code for Language Agents [105.43564788499901]
We introduce Lemur and Lemur-Chat, open-source language models optimized for both natural language and coding capabilities.
Our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks.
The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities.
arXiv Detail & Related papers (2023-10-10T17:57:45Z)
- Qwen Technical Report [132.54304067403922]
We introduce Qwen, the first installment of our large language model series.
Qwen comprises the base pretrained language models, as well as Qwen-Chat, chat models finetuned with human alignment techniques.
We have also developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat.
arXiv Detail & Related papers (2023-09-28T17:07:49Z)
- Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages.
Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
arXiv Detail & Related papers (2023-07-03T17:12:44Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Scheduled Multi-task Learning for Neural Chat Translation [66.81525961469494]
We propose a scheduled multi-task learning framework for Neural Chat Translation (NCT).
Specifically, we devise a three-stage training framework to incorporate the large-scale in-domain chat translation data into training.
Extensive experiments in four language directions verify the effectiveness and superiority of the proposed approach.
arXiv Detail & Related papers (2022-05-08T02:57:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.