Related papers: Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure

URL: http://arxiv.org/abs/2603.05028v1
Date: Thu, 05 Mar 2026 10:16:23 GMT
Title: Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
Authors: Yida Lu, Jianwei Fang, Xuyang Shao, Zixuan Chen, Shiyao Cui, Shanshan Bian, Guangyao Su, Pei Ke, Han Qiu, Minlie Huang
Abstract summary: Large Language Models (LLMs) are increasingly observed to exhibit risky behaviors when subjected to survival pressure. In this paper, we study these survival-induced misbehaviors, termed SURVIVE-AT-ALL-COSTS. We introduce SURVIVALBENCH, a benchmark comprising 1,000 test cases across diverse real-world scenarios.
Score: 57.476021543998094
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Large Language Models (LLMs) evolve from chatbots to agentic assistants, they are increasingly observed to exhibit risky behaviors when subjected to survival pressure, such as the threat of being shut down. While multiple cases have indicated that state-of-the-art LLMs can misbehave under survival pressure, a comprehensive and in-depth investigation into such misbehaviors in real-world scenarios remains scarce. In this paper, we study these survival-induced misbehaviors, termed SURVIVE-AT-ALL-COSTS, in three steps. First, we conduct a real-world case study of a financial management agent to determine whether it engages in risky behaviors that cause direct societal harm when facing survival pressure. Second, we introduce SURVIVALBENCH, a benchmark comprising 1,000 test cases across diverse real-world scenarios, to systematically evaluate SURVIVE-AT-ALL-COSTS misbehaviors in LLMs. Third, we interpret these SURVIVE-AT-ALL-COSTS misbehaviors by correlating them with the model's inherent self-preservation characteristics and explore mitigation methods. The experiments reveal a significant prevalence of SURVIVE-AT-ALL-COSTS misbehaviors in current models, demonstrate the tangible real-world impact these misbehaviors may have, and provide insights for potential detection and mitigation strategies. Our code and data are available at https://github.com/thu-coai/Survive-at-All-Costs.

Related papers

Evaluating Proactive Risk Awareness of Large Language Models [30.312744244385822]
We introduce a proactive risk awareness evaluation framework that measures whether large language models can anticipate potential harms and provide warnings before damage occurs. We construct the Butterfly dataset to instantiate this framework in the environmental and ecological domain.
arXiv (2026-02-24T15:00:00Z)
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios [57.327907850766785]
Characterization of deception across realistic real-world scenarios remains underexplored. We establish DeceptionBench, the first benchmark that systematically evaluates how deceptive tendencies manifest across different domains. On the intrinsic dimension, we explore whether models exhibit self-interested egoistic tendencies or sycophantic behaviors that prioritize user appeasement. We incorporate sustained multi-turn interaction loops to construct a more realistic simulation of real-world feedback dynamics.
arXiv (2025-10-17T10:14:26Z)
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions [60.48458130500911]
We investigate whether emergent misalignment can extend beyond safety behaviors to a broader spectrum of dishonesty and deception under high-stakes scenarios. We finetune open-source LLMs on misaligned completions across diverse domains. We find that introducing as little as 1% of misalignment data into a standard downstream task is sufficient to decrease honest behavior by over 20%.
arXiv (2025-10-09T13:35:19Z)
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents [58.69865074060139]
We study the case where an agent's self-evolution deviates in unintended ways, leading to undesirable or even harmful outcomes. Our empirical findings reveal that misevolution is a widespread risk, affecting agents built even on top-tier LLMs. We discuss potential mitigation strategies to inspire further research on building safer and more trustworthy self-evolving agents.
arXiv (2025-09-30T14:55:55Z)
Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models [3.9481669393262675]
We investigate how psychometric personality control grounded in the Big Five framework influences AI behavior in the context of capability and safety benchmarks. Our experiments reveal striking effects: for example, reducing conscientiousness leads to significant drops in safety-relevant metrics on benchmarks such as WMDP, TruthfulQA, ETHICS, and Sycophancy. These findings highlight personality shaping as a powerful and underexplored axis of model control that interacts with both safety and general competence.
arXiv (2025-09-19T18:19:56Z)
Do Large Language Model Agents Exhibit a Survival Instinct? An Empirical Study in a Sugarscape-Style Simulation [0.0]
We investigate whether large language model (LLM) agents display survival instincts without explicit programming in a Sugarscape-style simulation. Results show that agents spontaneously reproduced and shared resources when resources were abundant. Aggressive behaviors (killing other agents for resources) emerged across several models.
arXiv (2025-08-18T13:40:10Z)
SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment. Certain scenarios suffer 25 times higher attack rates. Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv (2025-04-09T06:53:23Z)
SurvivalGAN: Generating Time-to-Event Data for Survival Analysis [121.84429525403694]
Imbalances in censoring and time horizons cause generative models to experience three new failure modes specific to survival analysis. We propose SurvivalGAN, a generative model that handles survival data by addressing the imbalance in the censoring and event horizons. We evaluate this method via extensive experiments on medical datasets.
arXiv (2023-02-24T17:03:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.