How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
- URL: http://arxiv.org/abs/2603.02578v1
- Date: Tue, 03 Mar 2026 03:50:13 GMT
- Title: How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
- Authors: Ziwen Xu, Kewei Xu, Haoming Xu, Haiwen Hong, Longtao Huang, Hui Xue, Ningyu Zhang, Yongliang Shen, Guozhou Zheng, Huajun Chen, Shumin Deng
- Abstract summary: Large Language Models (LLMs) are increasingly deployed in socially sensitive domains. Our benchmark offers a principled and interpretable framework for safe and controllable behavior.
- Score: 75.10343190811592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
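The three specification levels the abstract describes (L1: what to express, L2: how to express, L3: how to instantiate) can be pictured as a nested behavior spec. The sketch below is purely illustrative; the class and field names are invented here and are not taken from the SteerEval benchmark itself.

```python
from dataclasses import dataclass

# Hypothetical representation of a hierarchical steering specification,
# tightening from behavioral intent down to concrete textual output.
# All names are illustrative, not from the paper.

@dataclass
class BehaviorSpec:
    domain: str          # e.g. "language features", "sentiment", "personality"
    l1_what: str         # L1: high-level behavioral intent
    l2_how: str          # L2: expression-style constraint
    l3_instantiate: str  # L3: concrete textual realization

def to_prompt(spec: BehaviorSpec) -> str:
    """Compose a steering prompt that narrows from intent to surface form."""
    return (
        f"[{spec.domain}] Express: {spec.l1_what}. "
        f"Style: {spec.l2_how}. "
        f"Realize as: {spec.l3_instantiate}."
    )

spec = BehaviorSpec(
    domain="sentiment",
    l1_what="a positive stance toward the product",
    l2_how="understated, no exclamation marks",
    l3_instantiate="a single two-sentence review",
)
print(to_prompt(spec))
```

A finer-grained level constrains the output more tightly, which matches the paper's observation that control tends to degrade as specifications become more concrete.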
Related papers
- VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language Models [9.511622126333105]
VALUEFLOW is a framework that spans extraction, evaluation, and steering with calibrated intensity control. We conduct a large-scale study across ten models and four value theories, identifying asymmetries in steerability and composition laws for multi-value control.
arXiv Detail & Related papers (2026-02-03T06:19:57Z)
- Learned-Rule-Augmented Large Language Model Evaluators [5.4343364964031124]
Large language models (LLMs) are predominantly used as evaluators for natural language generation (NLG) tasks. This work explores the potential of LLMs as general evaluators across diverse tasks.
arXiv Detail & Related papers (2025-12-01T18:08:45Z)
- Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies [18.428149174461264]
We present PBSUITE, a dynamic evaluation suite designed to assess large language models' capacity to adhere to pluralistic alignment specifications. We find that leading open- and closed-source LLMs maintain robust adherence to behavioral policies in single-turn settings, but their compliance weakens substantially in multi-turn adversarial interactions.
arXiv Detail & Related papers (2025-11-07T06:43:01Z)
- Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study [40.143148197878354]
We introduce FineLogic, a fine-grained evaluation framework that assesses logical reasoning across three dimensions. We study how different supervision formats in fine-tuning shape reasoning abilities. We find a key trade-off: natural language supervision excels at generalization, whereas symbolic supervision is superior at instilling structurally sound, atomic reasoning steps.
arXiv Detail & Related papers (2025-06-05T09:34:12Z)
- Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation [106.17986469245302]
Large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking. Existing reasoning benchmarks either focus on domain-specific knowledge (crystallized intelligence) or lack interpretability. We propose DRE-Bench, a dynamic reasoning evaluation benchmark grounded in a hierarchical cognitive framework.
arXiv Detail & Related papers (2025-06-03T09:01:08Z)
- Towards LLM Guardrails via Sparse Representation Steering [11.710399901426873]
Large Language Models (LLMs) have demonstrated remarkable performance in natural language generation tasks. We propose a sparse encoding-based representation engineering method, named SRE, which decomposes polysemantic activations into a structured, monosemantic feature space. By leveraging sparse autoencoding, our approach isolates and adjusts only task-specific sparse feature dimensions, enabling precise and interpretable steering of model behavior.
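The sparse-steering idea described in this abstract can be sketched in a few lines: encode an activation into an overcomplete feature space, nudge only selected feature dimensions, and decode back. The encoder/decoder weights below are random stand-ins; an actual SRE-style method would use a trained sparse autoencoder and feature indices identified for the target behavior.

```python
import numpy as np

# Toy sketch of sparse-feature steering (weights are random stand-ins,
# not a trained sparse autoencoder).

rng = np.random.default_rng(0)
d_model, d_feat = 16, 64                 # activation dim, sparse feature dim
W_enc = rng.normal(size=(d_model, d_feat))
W_dec = rng.normal(size=(d_feat, d_model))

def steer(h, feature_idx, delta):
    """Adjust one sparse feature and reconstruct the activation."""
    f = np.maximum(h @ W_enc, 0.0)       # ReLU encoding -> sparse-ish features
    f[feature_idx] += delta              # intervene only on the chosen feature
    return f @ W_dec                     # decode back to the residual stream

h = rng.normal(size=d_model)
h_steered = steer(h, feature_idx=3, delta=2.0)
print(h_steered.shape)
```

The point of steering in the sparse space rather than the raw activation space is that each feature dimension is (ideally) monosemantic, so the intervention targets one behavior instead of many entangled ones.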
arXiv Detail & Related papers (2025-03-21T04:50:25Z)
- Value Compass Benchmarks: A Platform for Fundamental and Validated Evaluation of LLMs Values [76.70893269183684]
Large Language Models (LLMs) achieve remarkable breakthroughs. Aligning their values with humans has become imperative for their responsible development. Evaluations of LLMs' values that fulfill three desirable goals are still lacking.
arXiv Detail & Related papers (2025-01-13T05:53:56Z)
- RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios [58.90106984375913]
RuleArena is a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning. Covering three practical domains -- airline baggage fees, NBA transactions, and tax regulations -- RuleArena assesses LLMs' proficiency in handling intricate natural language instructions.
arXiv Detail & Related papers (2024-12-12T06:08:46Z)
- DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation [57.07295906718989]
Constrained decoding approaches aim to control the meaning or style of text generated by pre-trained language models (PLMs) for various tasks at inference time. These methods often guide plausible continuations by greedily and explicitly selecting targets. Inspired by cognitive dual-process theory, we propose a novel decoding framework, DECIDER.
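Rule-controllable decoding of the kind this abstract describes can be illustrated with a toy example: the language model proposes next-token logits, and a rule system re-scores them before selection. The vocabulary, rule scores, and mixing weight below are invented for illustration and do not reflect DECIDER's actual formulation.

```python
import numpy as np

# Toy sketch of rule-guided decoding: combine stand-in LM logits with a
# rule-based score before picking the next token. All values are invented.

vocab = ["great", "terrible", "okay", "awful"]
logits = np.array([1.0, 2.5, 0.5, 2.0])        # stand-in LM scores
rule_bonus = np.array([1.0, -5.0, 0.5, -5.0])  # rule: penalize negative words

def rule_guided_argmax(logits, rule_bonus, alpha=1.0):
    """Pick the token maximizing LM score plus weighted rule score."""
    return int(np.argmax(logits + alpha * rule_bonus))

best = rule_guided_argmax(logits, rule_bonus)
print(vocab[best])  # the rule overrides the LM's preference for "terrible"
```

With `alpha = 0` the choice falls back to the unconstrained LM argmax, so the mixing weight trades off fluency against rule compliance.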
arXiv Detail & Related papers (2024-03-04T11:49:08Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.