LAB: Large-Scale Alignment for ChatBots
- URL: http://arxiv.org/abs/2403.01081v3
- Date: Mon, 29 Apr 2024 18:55:34 GMT
- Title: LAB: Large-Scale Alignment for ChatBots
- Authors: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Kai Xu, David D. Cox, Akash Srivastava,
- Abstract summary: LAB (Large-scale Alignment for chatBots) is a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training.
We demonstrate that LAB-trained models can achieve competitive performance across several benchmarks compared to models trained with traditional human-annotated or GPT-4 generated synthetic data.
- Score: 13.885153809482006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training. Leveraging a taxonomy-guided synthetic data generation process and a multi-phase tuning framework, LAB significantly reduces reliance on expensive human annotations and proprietary models like GPT-4. We demonstrate that LAB-trained models can achieve competitive performance across several benchmarks compared to models trained with traditional human-annotated or GPT-4 generated synthetic data. Thus offering a scalable, cost-effective solution for enhancing LLM capabilities and instruction-following behaviors without the drawbacks of catastrophic forgetting, marking a step forward in the efficient training of LLMs for a wide range of applications.
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning [13.082135438792475]
Chain of Self-Correction embeds self-correction as an inherent ability in Large Language Models (LLMs)
CoSC operates through a sequence of self-correction stages. In each stage, the LLMs generate a program to address a given problem, execute this program using program-based tools to obtain an output, subsequently verify this output.
In the first phase, the LLMs are trained with a relatively small volume of seeding data generated from GPT-4, establishing an initial CoSC capability.
In the second phase, the CoSC capability is further enhanced by training with a larger volume of self-generated data using
arXiv Detail & Related papers (2024-10-14T17:16:44Z) - Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging [11.223074654129915]
Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks.
We propose to mitigate this imbalance by merging SFT models fine-tuned with different data orders.
arXiv Detail & Related papers (2024-10-01T08:44:31Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Self-training Large Language Models through Knowledge Detection [26.831873737733737]
Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks.
This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples.
Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects.
arXiv Detail & Related papers (2024-06-17T07:25:09Z) - GPTA: Generative Prompt Tuning Assistant for Synergistic Downstream Neural Network Enhancement with LLMs [11.572835837392867]
This study introduces GPTA, a Large Language Model assistance training framework, that enhances the training of downstream task models via prefix prompt.
By minimizing data exposure to LLM, the framework addresses the security and legal challenges of applying LLM in downstream task model training.
arXiv Detail & Related papers (2024-03-29T23:04:04Z) - Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks [5.536630285985836]
We introduce parameter-efficient sparsity crafting (PESC)
PESC crafts dense models into sparse models using the mixture-of-experts (MoE) architecture.
Our best sparse model outperforms other sparse and dense models and exhibits superior general capabilities compared to GP3.5.
arXiv Detail & Related papers (2024-01-05T09:58:09Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Automatic Pair Construction for Contrastive Post-training [57.57149781848383]
In this paper, we propose an automatic way to construct contrastive data for large language models (LLMs)
We compare the contrastive techniques of SLiC and DPO to SFT baselines and find that DPO provides a step-function improvement even after continuing SFT saturates.
We also explore a data curriculum learning scheme for contrastive post-training, which starts by learning from "easier" pairs and transitioning to "harder" ones.
arXiv Detail & Related papers (2023-10-03T17:59:46Z) - Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.