Related papers: Aligners: Decoupling LLMs and Alignment

URL: http://arxiv.org/abs/2403.04224v4
Date: Fri, 04 Oct 2024 05:29:18 GMT
Title: Aligners: Decoupling LLMs and Alignment
Authors: Lilian Ngweta, Mayank Agarwal, Subha Maity, Alex Gittens, Yuekai Sun, Mikhail Yurochkin
Abstract summary: Large Language Models (LLMs) need to be aligned with human expectations to ensure their safety and utility in most applications. We propose to decouple LLMs and alignment by training aligner models that can be used to align any LLM for a given criterion on an as-needed basis.
Score: 47.00002038331952
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) need to be aligned with human expectations to ensure their safety and utility in most applications. Alignment is challenging, costly, and needs to be repeated for every LLM and alignment criterion. We propose to decouple LLMs and alignment by training aligner models that can be used to align any LLM for a given criterion on an as-needed basis, thus also reducing the potential negative impacts of alignment on performance. Our recipe for training the aligner models relies solely on synthetic data generated with a (prompted) LLM and can be easily adjusted for a variety of alignment criteria. We use the same synthetic data to train inspectors, binary misalignment classification models that guide a "squad" of multiple aligners. Our empirical results demonstrate consistent improvements when applying an aligner squad to various LLMs, including chat-aligned models, across several instruction-following and red-teaming datasets.

Related papers

Controlled Diversity: Length-optimized Natural Language Generation [1.3888744377495608]
LLMs are not generally able to adjust the length of their outputs based on strict length requirements. We present an approach to train LLMs to acquire this capability by augmenting existing data and applying existing fine-tuning techniques. Our results indicate that our method may change the response quality when using training data that was not generated by the baseline model.
arXiv Detail & Related papers (2025-02-26T17:38:58Z)
Smoothie: Label Free Language Model Routing [39.88041397482366]
Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. We propose Smoothie, a weak supervision-inspired routing approach that requires no labeled data. We find that Smoothie's LLM quality-scores correlate with ground-truth model quality.
arXiv Detail & Related papers (2024-12-06T01:06:37Z)
Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation [56.75665429851673]
This paper introduces a novel instruction curation algorithm, derived from two unique perspectives, human and LLM preference alignment. Experiments demonstrate that we can maintain or even improve model performance by compressing synthetic multimodal instructions by up to 90%.
arXiv Detail & Related papers (2024-09-27T08:20:59Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment [82.99849359892112]
We re-examine previously reported reductions in response diversity post-alignment. Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation. Findings indicate that current alignment techniques capture but do not extend the useful subset of assistant-like base LLM behavior.
arXiv Detail & Related papers (2024-06-25T16:32:33Z)
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) for diverse applications is crucial to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
A Practice-Friendly LLM-Enhanced Paradigm with Preference Parsing for Sequential Recommendation [15.153844486572932]
This paper proposes a practice-friendly LLM-enhanced paradigm with preference parsing (P2Rec) for sequential recommender systems (SRS). Specifically, in the information reconstruction stage, we design a new user-level SFT task for collaborative information injection with the assistance of a pre-trained SRS model. Our goal is to let the LLM learn to reconstruct a corresponding prior preference distribution from each user's interaction sequence.
arXiv Detail & Related papers (2024-06-01T07:18:56Z)
Automated Data Curation for Robust Language Model Fine-Tuning [13.8454385440986]
We introduce an automated data curation pipeline CLEAR for instruction tuning datasets. CLEAR estimates which training data is low-quality and either filters or corrects it. Experiments reveal that CLEAR consistently improves the performance of fine-tuned models across many datasets and models.
arXiv Detail & Related papers (2024-03-19T14:44:45Z)
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning [61.68787689234622]
A recent study, LIMA, shows that using merely 1K examples for alignment tuning can achieve significant alignment performance. This raises questions about how exactly alignment tuning transforms a base LLM. We show that the gap between tuning-free and tuning-based alignment methods can be significantly reduced through strategic prompting.
arXiv Detail & Related papers (2023-12-04T00:46:11Z)
Small Language Models Improve Giants by Rewriting Their Outputs [18.025736098795296]
We tackle the problem of leveraging training data to improve the performance of large language models (LLMs) without fine-tuning. We create a pool of candidates from the LLM through few-shot prompting and we employ a compact model, the LM-corrector (LMCor), specifically trained to merge these candidates to produce an enhanced output. Experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning.
arXiv Detail & Related papers (2023-05-22T22:07:50Z)
