Active Learning for Robust and Representative LLM Generation in Safety-Critical Scenarios
- URL: http://arxiv.org/abs/2410.11114v1
- Date: Mon, 14 Oct 2024 21:48:14 GMT
- Title: Active Learning for Robust and Representative LLM Generation in Safety-Critical Scenarios
- Authors: Sabit Hassan, Anthony Sicilia, Malihe Alikhani
- Abstract summary: Large Language Models (LLMs) can generate valuable data for safety measures, but often exhibit distributional biases.
We propose a novel framework that integrates active learning with clustering to guide LLM generation.
Our results show that the proposed framework produces a more representative set of safety scenarios without requiring prior knowledge of the underlying data distribution.
- Score: 32.16984263644299
- Abstract: Ensuring robust safety measures across a wide range of scenarios is crucial for user-facing systems. While Large Language Models (LLMs) can generate valuable data for safety measures, they often exhibit distributional biases, focusing on common scenarios and neglecting rare but critical cases. This can undermine the effectiveness of safety protocols developed using such data. To address this, we propose a novel framework that integrates active learning with clustering to guide LLM generation, enhancing the representativeness and robustness of the generated safety scenarios. We demonstrate the effectiveness of our approach by constructing a dataset of 5.4K potential safety violations through an iterative process involving LLM generation and an active learner model's feedback. Our results show that the proposed framework produces a more representative set of safety scenarios without requiring prior knowledge of the underlying data distribution. Additionally, data acquired through our method improves the accuracy and F1 score of both the active learner model and models outside the scope of the active learning process, highlighting its broad applicability.
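As a concrete illustration of the iterative loop described in the abstract, below is a minimal Python sketch of one clustering-guided active-learning acquisition round. The helpers generate_scenarios, embed, and oracle_label are hypothetical stand-ins (stubbed so the sketch runs), and the least-confidence acquisition rule is an assumption, not the paper's exact criterion.

```python
# Minimal sketch of an active-learning + clustering acquisition loop in the
# spirit of the paper; the prompts, models, and acquisition rule may differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins so the sketch runs end to end; replace with a real
# LLM call, text encoder, and human/model annotator.
rng = np.random.default_rng(0)
def generate_scenarios(feedback): return [f"scenario {i}" for i in range(200)]
def embed(texts): return rng.normal(size=(len(texts), 32))
def oracle_label(text): return int(rng.random() < 0.5)

def acquisition_round(pool_embs, learner, n_clusters=10, per_cluster=5):
    """Pick diverse, uncertain items: cluster the pool, then take the most
    uncertain candidates from each cluster (least-confidence score)."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pool_embs)
    uncertainty = 1.0 - learner.predict_proba(pool_embs).max(axis=1)
    picked = []
    for c in range(n_clusters):
        idx = np.where(clusters == c)[0]
        picked.extend(idx[np.argsort(-uncertainty[idx])[:per_cluster]].tolist())
    return picked

# Iterative loop: the LLM generates candidates, the active learner selects
# which ones to label, and its feedback conditions the next generation round.
labeled_texts, labeled_embs, labeled_y = [], [], []
learner = LogisticRegression(max_iter=1000)
for _ in range(5):
    texts = generate_scenarios(feedback=labeled_texts)
    embs = embed(texts)
    if labeled_embs:
        learner.fit(np.vstack(labeled_embs), labeled_y)
        chosen = acquisition_round(embs, learner)
    else:
        chosen = list(range(50))            # cold start: take an initial batch
    for i in chosen:
        labeled_texts.append(texts[i])
        labeled_embs.append(embs[i])
        labeled_y.append(oracle_label(texts[i]))
```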
Related papers
- Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models [75.8161094916476]
We study how to develop a pretrained vision-language model (i.e., the CLIP model) to acquire new capabilities or improve its existing image classification capabilities.
Our experiments on improving vision perception capabilities on autonomous driving and scene recognition datasets demonstrate the efficacy of the proposed approach.
arXiv Detail & Related papers (2024-10-04T22:34:58Z) - Towards Inference-time Category-wise Safety Steering for Large Language Models [3.712541089289745]
Large language models (LLMs) have seen unprecedented advancements in capabilities and applications across a variety of use-cases.
The fragile nature of LLMs warrants additional safety steering steps via training-free, inference-time methods.
Unlike recent inference-time safety steering works, in this paper we explore safety steering of LLM outputs using category-specific steering vectors.
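To make the mechanism concrete, here is a minimal PyTorch sketch of adding a category-specific steering vector to a transformer layer's hidden states at inference time. The layer index, the scale, and the randomly initialized vectors are illustrative assumptions; the paper's recipe for deriving the vectors is not shown.

```python
# Minimal sketch: inject a per-category steering vector into the residual
# stream of one transformer block during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# One precomputed vector per safety category (e.g., a mean difference of
# hidden states between safe and unsafe continuations); random placeholders here.
hidden = model.config.hidden_size
steering_vectors = {"violence": torch.randn(hidden),
                    "self_harm": torch.randn(hidden)}

def make_hook(vec, scale=4.0):   # scale chosen arbitrarily for illustration
    def hook(module, inputs, output):
        hs = output[0]           # GPT-2 blocks return (hidden_states, ...)
        return (hs + scale * vec.to(hs.dtype), *output[1:])
    return hook

layer = model.transformer.h[6]   # a mid-depth layer; an arbitrary choice
handle = layer.register_forward_hook(make_hook(steering_vectors["violence"]))
inputs = tok("How do I handle a dispute?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```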
arXiv Detail & Related papers (2024-10-02T02:02:06Z) - Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models [94.39278422567955]
Fine-tuning large language models (LLMs) on human preferences has proven successful in enhancing their capabilities.
However, ensuring the safety of LLMs during fine-tuning remains a critical concern.
We propose a supervised learning framework called Bi-Factorial Preference Optimization (BFPO) to address this issue.
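The toy sketch below illustrates the general balancing idea as a single supervised pairwise loss; it is not BFPO's actual objective, and the factor scores, the weighting lam, and the DPO-style form are all assumptions.

```python
# Generic sketch of a bi-factorial preference loss: pairwise preferences are
# decided by a combined score that trades helpfulness against safety.
import torch
import torch.nn.functional as F

def combined_score(help_score, safe_score, lam=1.0):
    # lam > 1 prioritizes safety; lam < 1 prioritizes helpfulness
    return help_score + lam * safe_score

def bifactorial_pair_loss(logp_a, logp_b, help_a, help_b, safe_a, safe_b,
                          lam=1.0, beta=0.1):
    """DPO-style pairwise loss where the 'chosen' response is selected by the
    combined helpfulness/safety score rather than helpfulness alone."""
    prefer_a = combined_score(help_a, safe_a, lam) >= combined_score(help_b, safe_b, lam)
    margin = torch.where(prefer_a, logp_a - logp_b, logp_b - logp_a)
    return -F.logsigmoid(beta * margin).mean()

# Toy example: response B is more helpful, response A... wait, B is safer too,
# so with lam = 2.0 the loss pushes probability mass toward B.
logp_a, logp_b = torch.tensor([-5.0]), torch.tensor([-4.0])
help_a, help_b = torch.tensor([0.9]), torch.tensor([0.7])
safe_a, safe_b = torch.tensor([0.2]), torch.tensor([0.9])
print(bifactorial_pair_loss(logp_a, logp_b, help_a, help_b, safe_a, safe_b, lam=2.0))
```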
arXiv Detail & Related papers (2024-08-27T17:31:21Z) - Diffusion Models for Offline Multi-agent Reinforcement Learning with Safety Constraints [0.0]
We introduce an innovative framework integrating diffusion models within the Multi-agent Reinforcement Learning paradigm.
This approach notably enhances the safety of actions taken by multiple agents through risk mitigation while modeling coordinated action.
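A toy sketch of the general pattern follows: candidate joint actions are drawn from a (stub) reverse-diffusion sampler and filtered by a risk model. The denoiser, risk estimate, and simplified update rule are placeholders, not the paper's architecture or training procedure.

```python
# Toy sketch: sample candidate joint actions via a stubbed reverse-diffusion
# loop, then apply a risk-mitigation filter before acting.
import torch

def reverse_diffusion_sample(denoiser, n_agents, act_dim, steps=20):
    """Naive reverse loop over a joint-action tensor (simplified, not exact
    DDPM math)."""
    x = torch.randn(n_agents, act_dim)
    for t in reversed(range(steps)):
        eps_hat = denoiser(x, torch.tensor(float(t)))
        x = x - 0.05 * eps_hat
        if t > 0:
            x = x + 0.01 * torch.randn_like(x)   # small exploration noise
    return x

denoiser = lambda x, t: 0.1 * x        # placeholder network
risk = lambda a: a.abs().max()         # placeholder risk estimate

candidates = [reverse_diffusion_sample(denoiser, n_agents=3, act_dim=2)
              for _ in range(16)]
safe = [a for a in candidates if risk(a) < 1.0]   # keep low-risk candidates
action = min(safe or candidates, key=risk)        # fall back to least risky
```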
arXiv Detail & Related papers (2024-06-30T16:05:31Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress, often on the assumption that base (non-instruction-tuned) models pose little misuse risk.
Our investigation exposes a critical oversight in this assumption.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs act as intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
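A minimal sketch of the shielding pattern described above: an LLM-proposed maneuver is checked by a rule-based verifier and replaced with a conservative fallback if rejected. llm_propose (stubbed here) and the specific checks are hypothetical, not the paper's verifier.

```python
# Sketch of a "safety verifier shield" around an LLM behavioral planner.
SAFE_FALLBACK = "keep_lane_and_slow_down"

def llm_propose(context: dict) -> str:
    return "overtake"   # hypothetical LLM call, stubbed for the sketch

def verify(maneuver: str, gap_ahead_m: float, rel_speed_mps: float) -> bool:
    """Toy contextual checks; a real shield would encode many more rules."""
    if maneuver == "overtake" and gap_ahead_m < 30.0:
        return False                       # not enough room to pass safely
    if maneuver == "accelerate" and rel_speed_mps < -5.0:
        return False                       # closing on the lead vehicle fast
    return True

def shielded_plan(context: dict) -> str:
    proposal = llm_propose(context)
    if verify(proposal, context["gap_ahead_m"], context["rel_speed_mps"]):
        return proposal
    return SAFE_FALLBACK                   # conservative default on rejection

print(shielded_plan({"gap_ahead_m": 18.0, "rel_speed_mps": -2.0}))
```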
arXiv Detail & Related papers (2023-11-28T03:13:09Z) - Safety-Critical Scenario Generation Via Reinforcement Learning Based Editing [20.99962858782196]
We propose a deep reinforcement learning approach that generates safety-critical scenarios by sequential editing.
Our framework employs a reward function consisting of both risk and plausibility objectives.
Our evaluation demonstrates that the proposed method generates safety-critical scenarios of higher quality compared with previous approaches.
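As a small illustration of such a composite reward, the sketch below combines a risk term (inverse time-to-collision) with a plausibility term (trajectory likelihood under a learned traffic prior). Both terms and the weights are assumptions for illustration, not the paper's definitions.

```python
# Sketch of a composite editing reward: raise risk, penalize implausibility.
def scenario_reward(min_ttc_s: float, traj_log_likelihood: float,
                    w_risk: float = 1.0, w_plaus: float = 0.5) -> float:
    risk = 1.0 / max(min_ttc_s, 1e-3)    # lower time-to-collision -> riskier
    plausibility = traj_log_likelihood   # e.g., under a learned traffic prior
    return w_risk * risk + w_plaus * plausibility

# A near-collision edit with a plausible trajectory scores highest.
print(scenario_reward(min_ttc_s=0.8, traj_log_likelihood=-1.2))
```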
arXiv Detail & Related papers (2023-06-25T05:15:25Z) - SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
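One statistical ingredient such frameworks can build on is a tail-risk (CVaR) estimate of episode cost, checked against a budget before accepting a policy update. Below is a minimal numpy sketch; the alpha level, safety budget, and stand-in cost data are chosen arbitrarily for illustration.

```python
# Sketch: empirical conditional value-at-risk (CVaR) as a safety check.
import numpy as np

def cvar(costs: np.ndarray, alpha: float = 0.1) -> float:
    """Mean of the worst alpha-fraction of costs."""
    k = max(1, int(np.ceil(alpha * len(costs))))
    return float(np.sort(costs)[-k:].mean())

rollout_costs = np.random.gamma(shape=2.0, scale=1.0, size=500)  # stand-in data
if cvar(rollout_costs, alpha=0.1) > 5.0:       # hypothetical safety budget
    print("reject update: tail risk exceeds budget")
```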
arXiv Detail & Related papers (2020-06-12T10:40:46Z) - Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
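The standard construction behind such chance constraints is sketched below: a probabilistic constraint P(g(x) <= 0) >= 1 - eps is replaced by a deterministic one tightened by an uncertainty-dependent margin. The Gaussian marginal assumption and the numbers are illustrative, not the paper's formulation.

```python
# Sketch: chance-constraint tightening under a Gaussian marginal assumption.
from statistics import NormalDist

def tightened_clearance(mean_dist_m: float, std_m: float, eps: float = 0.05) -> float:
    """Clearance remaining after the back-off margin; a planner enforces
    tightened_clearance(mu_k, sigma_k) >= 0 at every step k."""
    z = NormalDist().inv_cdf(1.0 - eps)   # ~1.645 for eps = 0.05
    return mean_dist_m - z * std_m

# With mean clearance 2.0 m and std 0.8 m, the tightened clearance is ~0.68 m,
# so the constraint still holds at the 95% level.
print(tightened_clearance(mean_dist_m=2.0, std_m=0.8))
```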
arXiv Detail & Related papers (2020-05-09T05:57:43Z)