Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
- URL: http://arxiv.org/abs/2403.08281v4
- Date: Tue, 26 Mar 2024 09:29:51 GMT
- Title: Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
- Authors: Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Ruobing Xie, Bowen Zhou, Zhiyuan Liu, Maosong Sun
- Abstract summary: Large language models (LLMs) strive to achieve high performance across natural language, code, and mathematics simultaneously.
In this paper, we propose to directly fuse models that are already highly specialized.
The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics.
- Score: 93.92762966380793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously. Achieving a very high level of proficiency for an LLM within a specific domain often requires extensive training with relevant corpora, which is typically accompanied by a sacrifice in performance in other domains. In this paper, we propose to directly fuse models that are already highly specialized. The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics. A token-level gating mechanism is introduced to blend the specialists' outputs. A two-stage training strategy accompanied by balanced sampling is designed to ensure stability. To effectively train the fused model, we further construct a high-quality supervised instruction tuning dataset, UltraChat 2, which includes text, code, and mathematical content. This dataset comprises approximately 300,000 instructions and covers a wide range of topics in each domain. Experiments show that our model can achieve mastery of the three domains simultaneously.
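The abstract does not spell out how the gating works, so the following is only a rough PyTorch sketch of token-level blending of three specialists: the class name, the choice of gate input, and the assumption of Hugging Face-style model outputs are ours, not details from the paper.

```python
import torch
import torch.nn as nn


class TokenLevelFusion(nn.Module):
    """Blend per-token output distributions from three specialist LMs (sketch)."""

    def __init__(self, specialists, hidden_size):
        super().__init__()
        self.specialists = nn.ModuleList(specialists)        # e.g. text, code, math LMs
        self.gate = nn.Linear(hidden_size, len(specialists))  # per-token gate over specialists

    def forward(self, input_ids, attention_mask):
        # Run every specialist; assume Hugging Face-style outputs exposing
        # .logits of shape (B, T, V) and .hidden_states when requested.
        outs = [m(input_ids=input_ids, attention_mask=attention_mask,
                  output_hidden_states=True) for m in self.specialists]
        logits = torch.stack([o.logits for o in outs], dim=-1)                   # (B, T, V, 3)
        # Gate reads one specialist's last hidden state (an assumption of this sketch).
        weights = torch.softmax(self.gate(outs[0].hidden_states[-1]), dim=-1)    # (B, T, 3)
        # Per-token convex combination of the three output distributions.
        return (logits * weights.unsqueeze(2)).sum(dim=-1)                       # (B, T, V)
```

Per the abstract, such a gate would be trained with a two-stage strategy and balanced sampling over text, code, and math data to keep the fused model stable.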
Related papers
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
- INDUS: Effective and Efficient Language Models for Scientific Applications [8.76933154920986]
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks.
Previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks.
We developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains.
arXiv Detail & Related papers (2024-05-17T12:15:07Z)
- Multi-Task Learning for Front-End Text Processing in TTS [15.62497569424995]
We propose a multi-task learning (MTL) model for jointly performing three tasks that are commonly solved in a text-to-speech front-end.
Our framework utilizes a tree-like structure with a trunk that learns shared representations, followed by separate task-specific heads.
arXiv Detail & Related papers (2024-01-12T02:13:21Z)
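As a generic illustration of the trunk-plus-task-heads pattern mentioned in the entry above (module names, sizes, and the number of output classes are hypothetical placeholders, not details from that paper):

```python
import torch.nn as nn


class MultiTaskFrontEnd(nn.Module):
    def __init__(self, vocab_size=10000, hidden=256, task_classes=(5, 3, 40)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # Shared trunk: every task is trained on top of these representations.
        self.trunk = nn.LSTM(hidden, hidden, batch_first=True)
        # One lightweight task-specific head per task (output sizes are placeholders).
        self.heads = nn.ModuleList(nn.Linear(hidden, c) for c in task_classes)

    def forward(self, token_ids):
        shared, _ = self.trunk(self.embed(token_ids))    # (B, T, hidden)
        return [head(shared) for head in self.heads]     # per-token logits for each task
```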
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark [81.42376626294812]
We present Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.
Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs.
We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z)
- Label-Free Multi-Domain Machine Translation with Stage-wise Training [13.144729358707206]
We propose a label-free multi-domain machine translation model which requires only a few or no domain-annotated data in training and no domain labels in inference.
Our model is composed of three parts: a backbone model, a domain discriminator responsible for distinguishing data from different domains, and a set of experts that transfer the decoded features from generic to specific.
arXiv Detail & Related papers (2023-05-06T06:30:29Z)
- Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study [76.52997424694767]
We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models.
We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance.
Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
arXiv Detail & Related papers (2022-12-20T13:20:21Z)
- ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning [95.78635058475439]
Cross-Domain Few-Shot Learning aims at addressing the Few-Shot Learning problem across different domains.
This paper contributes a novel Multi-Expert Domain Decompositional Network (ME-D2N).
We present a novel domain decomposition module that learns to decompose the student model into two domain-related sub-parts.
arXiv Detail & Related papers (2022-10-11T09:24:47Z)
- Multimodal Masked Autoencoders Learn Transferable Representations [127.35955819874063]
We propose a simple and scalable network architecture, the Multimodal Masked Autoencoder (M3AE).
M3AE learns a unified encoder for both vision and language data via masked token prediction.
We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.
arXiv Detail & Related papers (2022-05-27T19:09:42Z)
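A toy, text-only sketch of the masked token prediction objective mentioned in the M3AE entry above; the function and argument names are our own, and the actual M3AE jointly encodes image patches and text tokens:

```python
import torch
import torch.nn.functional as F


def masked_prediction_loss(encoder, lm_head, token_ids, mask_id, mask_prob=0.15):
    # Randomly pick positions to hide and replace them with a [MASK] id.
    mask = torch.rand(token_ids.shape, device=token_ids.device) < mask_prob
    corrupted = token_ids.masked_fill(mask, mask_id)
    hidden = encoder(corrupted)          # (B, T, H)
    logits = lm_head(hidden)             # (B, T, vocab)
    # Only masked positions contribute to the reconstruction objective.
    return F.cross_entropy(logits[mask], token_ids[mask])
```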
- Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation [0.0]
Lack of specialized data makes building a multi-domain neural machine translation tool challenging.
We propose a new training pipeline where knowledge distillation and multiple specialized teachers allow us to efficiently finetune a model.
arXiv Detail & Related papers (2020-04-15T20:21:19Z)
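A hedged sketch of how distillation from several specialized teachers into one student is commonly set up; this is a generic recipe with hypothetical names, not the specific pipeline from the paper above:

```python
import torch
import torch.nn.functional as F


def multi_teacher_kd_loss(student_logits, teacher_logits_list, domain_ids, temperature=2.0):
    # Stack teacher outputs and, for each example, select the teacher that
    # owns that example's domain.
    teachers = torch.stack(teacher_logits_list)                        # (K, B, V)
    batch_idx = torch.arange(domain_ids.numel(), device=domain_ids.device)
    picked = teachers[domain_ids, batch_idx]                           # (B, V)
    # Temperature-scaled KL divergence between student and chosen teacher.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(picked / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2
```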
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.