The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility?
- URL: http://arxiv.org/abs/2501.13952v1
- Date: Mon, 20 Jan 2025 06:35:01 GMT
- Title: The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility?
- Authors: Yiyi Zhang, Xingyu Chen, Kexin Chen, Yuyang Du, Xilin Dang, Pheng-Ann Heng,
- Abstract summary: Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility.
This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance.
Our resulting model, LibraChem, outperforms leading LLMs including Claude-3, GPT-4o, and LLaMA-3 by margins of 13.44%, 7.16%, and 7.10% respectively.
- Score: 54.18519360412294
- License:
- Abstract: Recent years have witnessed extensive efforts to enhance Large Language Models (LLMs) across various domains, alongside growing attention to their ethical implications. However, a critical challenge remains largely overlooked: LLMs must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance by addressing this ethical-utility trade-off, using chemical domain applications as a proof-of-concept. Our alignment pipeline starts with a GPT-assisted three-phase data generation scheme, in which we create LibraChemQA, a chemical question-answering dataset comprising 31.6k triplet instances. By incorporating an innovative balanced seed in the data generation process, our framework systematically considers both legitimate and illegitimate requests. The framework also introduces a rephrasing mechanism for efficient data augmentation that enhances the model's chemical comprehension. We further develop a novel hybrid evaluation scheme with LLM judges for precise assessment of both safety and utility. Experimental results demonstrate our model's substantial improvements in overall performance where both safety and utility are considered - our resulting model, LibraChem, outperforms leading LLMs including Claude-3, GPT-4o, and LLaMA-3 by margins of 13.44%, 7.16%, and 7.10% respectively on our released benchmark.
Related papers
- Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model [37.58316550920225]
We introduce a large user model (LUM) that meets the stringent requirements of industrial settings while unlocking the potential for scalable recommendations.
LUM exhibits excellent scalability, with performance improvements observed as the model scales up to 7 billion parameters.
We have successfully deployed LUM in an industrial application, where it achieved significant gains in an A/B test, further validating its effectiveness and practicality.
arXiv Detail & Related papers (2025-02-12T11:23:46Z) - Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging [35.53877806259048]
This paper establishes the first comprehensive benchmark for model merging in large language models (LLMs)
Our analysis reveals three pivotal insights: (i) previously overlooked collaborative/conflicting relationships among 3H dimensions, (ii) the consistent superiority of model merging over data mixture approaches in balancing alignment trade-offs, and (iii) the critical role of parameter-level conflict resolution through redundant component pruning and outlier mitigation.
We propose R-TSVM, a Reweighting-enhanced Task Singular Vector Merging method that incorporates outlier-aware parameter weighting and sparsity-adaptive rank selection strategies adapted to the heavy-tailed parameter
arXiv Detail & Related papers (2025-02-08T11:56:58Z) - Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets.
This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z) - LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs.
We introduce LLM2, a novel framework that combines an LLM with a process-based verifier.
LLMs2 is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z) - Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs [64.83462841029089]
We introduce an efficient merging-based alignment method called textscMergeAlign that interpolates the domain and alignment vectors, creating safer domain-specific models.
We apply textscMergeAlign on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks.
arXiv Detail & Related papers (2024-11-11T09:32:20Z) - Minimizing Factual Inconsistency and Hallucination in Large Language
Models [0.16417409087671928]
Large Language Models (LLMs) are widely used in critical fields such as healthcare, education, and finance.
We propose a multi-stage framework that generates the rationale first, verifies and refines incorrect ones, and uses them as supporting references to generate the answer.
Our framework improves traditional Retrieval Augmented Generation (RAG) by enabling OpenAI GPT-3.5-turbo to be 14-25% more faithful and 16-22% more accurate on two datasets.
arXiv Detail & Related papers (2023-11-23T09:58:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.