Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
- URL: http://arxiv.org/abs/2411.06824v1
- Date: Mon, 11 Nov 2024 09:32:20 GMT
- Title: Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
- Authors: Megh Thakkar, Yash More, Quentin Fournier, Matthew Riemer, Pin-Yu Chen, Amal Zouaq, Payel Das, Sarath Chandar,
- Abstract summary: We introduce an efficient merging-based alignment method called MergeAlign that interpolates the domain and alignment vectors, creating safer domain-specific models.
We apply MergeAlign on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks.
- Score: 64.83462841029089
- Abstract: There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruction-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.
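The interpolation described in the abstract can be pictured as task-vector arithmetic: subtract the base model's weights from each fine-tuned model to obtain a "vector", then mix the two vectors back into the base. The following is a minimal sketch under that assumption, not the paper's actual implementation; the function names (`task_vector`, `merge_align`), the mixing weight `alpha`, and the use of plain dicts of floats in place of real model state dicts are all illustrative choices.

```python
def task_vector(model, base):
    """Parameter-wise difference between a fine-tuned model and its base.

    `model` and `base` are dicts mapping parameter names to values;
    real models would use tensors, but floats keep the sketch runnable.
    """
    return {k: model[k] - base[k] for k in base}

def merge_align(base, domain_expert, aligned, alpha=0.5):
    """Interpolate the domain and alignment task vectors onto the base.

    alpha weights the domain vector; (1 - alpha) weights the alignment
    vector. alpha=1.0 recovers the domain expert, alpha=0.0 the aligned
    model (assuming this simple linear form).
    """
    dv = task_vector(domain_expert, base)
    av = task_vector(aligned, base)
    return {k: base[k] + alpha * dv[k] + (1 - alpha) * av[k] for k in base}

# Toy usage with one-parameter "models":
base = {"w": 1.0}
domain_expert = {"w": 3.0}   # base fine-tuned on domain data
aligned = {"w": 2.0}         # base safety-aligned
merged = merge_align(base, domain_expert, aligned, alpha=0.5)
```

With `alpha=0.5` the merged parameter lands halfway between the two deltas, which is the knowledge-safety trade-off the paper tunes.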
Related papers
- Evaluating Large Language Models for Causal Modeling [1.5468177185307304]
We consider the process of transforming causal domain knowledge into a representation that aligns more closely with guidelines from causal data science.
We introduce two novel tasks related to distilling causal domain knowledge into causal variables and detecting interaction entities using LLMs.
arXiv Detail & Related papers (2024-11-24T15:51:56Z)
- LFME: A Simple Framework for Learning from Multiple Experts in Domain Generalization [61.16890890570814]
Domain generalization (DG) methods aim to maintain good performance in an unseen target domain by using training data from multiple source domains.
This work introduces a simple yet effective framework, dubbed learning from multiple experts (LFME) that aims to make the target model an expert in all source domains to improve DG.
arXiv Detail & Related papers (2024-10-22T13:44:10Z)
- Unconstrained Model Merging for Enhanced LLM Reasoning [42.079040543428036]
We explore the potential of merging multiple expert models into a single large language model.
We propose an unconstrained model merging framework that accommodates both homogeneous and heterogeneous model architectures.
Across 7 benchmarks and 9 reasoning-optimized LLMs, we reveal key findings that reasoning emerges from merging.
arXiv Detail & Related papers (2024-10-17T16:04:07Z)
- Model-Based Differentially Private Knowledge Transfer for Large Language Models [34.949731264918846]
We propose Llamdex, a framework that integrates privacy-preserving, domain-specific models into large language models.
Our approach significantly enhances the accuracy of domain-specific tasks, achieving up to a 26% improvement compared to existing methods.
arXiv Detail & Related papers (2024-10-14T13:18:20Z)
- Model Merging and Safety Alignment: One Bad Model Spoils the Bunch [70.614652904151]
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model.
Current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models.
We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment.
arXiv Detail & Related papers (2024-06-20T17:59:58Z)
- Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment [103.05005690990271]
Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).
We propose a novel self-alignment method that utilizes a Chain of Thought (CoT) approach, termed AlignCoT.
We introduce the Mixture of insighTful Experts (MoTE) architecture, which applies mixture of experts to enhance each component of the AlignCoT process, markedly increasing alignment efficiency.
arXiv Detail & Related papers (2024-05-01T15:06:05Z)
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z)
- ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models [25.68491572293656]
Large Language Models fall short in structured knowledge extraction tasks such as named entity recognition.
This paper explores an innovative, cost-efficient strategy to harness LLMs with modest NER capabilities for producing superior NER datasets.
arXiv Detail & Related papers (2024-03-17T06:12:43Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
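The roofline model referenced in the survey above is a standard hardware-performance abstraction: attainable throughput is capped either by the device's peak compute rate or by memory bandwidth times the kernel's arithmetic intensity. A minimal sketch of that cap (not the survey's own code; all names and numbers below are illustrative):

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound on achievable performance.

    peak_flops:           device peak compute rate (FLOP/s)
    mem_bandwidth:        memory bandwidth (bytes/s)
    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte)

    Below the "ridge point" (peak_flops / mem_bandwidth) a kernel is
    memory-bound; above it, compute-bound.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)

# Illustrative device: 100 TFLOP/s peak, 1 TB/s bandwidth.
peak, bw = 100e12, 1e12
low_ai = attainable_flops(peak, bw, 10)    # memory-bound region
high_ai = attainable_flops(peak, bw, 500)  # compute-bound region
```

This is the sense in which such a framework "identifies the bottlenecks": a measured kernel sitting on the bandwidth slope cannot be sped up by more compute alone.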
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.