How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
- URL: http://arxiv.org/abs/2603.02578v1
- Date: Tue, 03 Mar 2026 03:50:13 GMT
- Title: How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
- Authors: Ziwen Xu, Kewei Xu, Haoming Xu, Haiwen Hong, Longtao Huang, Hui Xue, Ningyu Zhang, Yongliang Shen, Guozhou Zheng, Huajun Chen, Shumin Deng
- Abstract summary: Large Language Models (LLMs) are increasingly deployed in socially sensitive domains. Our benchmark offers a principled and interpretable framework for safe and controllable behavior.
- Score: 75.10343190811592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
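The three specification levels the abstract describes (L1: what to express, L2: how to express, L3: how to instantiate) can be pictured as a nested behavior spec. The sketch below is purely illustrative; the class and field names are invented here and are not taken from the SteerEval benchmark itself.

```python
from dataclasses import dataclass

# Hypothetical representation of a hierarchical steering specification,
# tightening from behavioral intent down to concrete textual output.
# All names are illustrative, not from the paper.

@dataclass
class BehaviorSpec:
    domain: str          # e.g. "language features", "sentiment", "personality"
    l1_what: str         # L1: high-level behavioral intent
    l2_how: str          # L2: expression-style constraint
    l3_instantiate: str  # L3: concrete textual realization

def to_prompt(spec: BehaviorSpec) -> str:
    """Compose a steering prompt that narrows from intent to surface form."""
    return (
        f"[{spec.domain}] Express: {spec.l1_what}. "
        f"Style: {spec.l2_how}. "
        f"Realize as: {spec.l3_instantiate}."
    )

spec = BehaviorSpec(
    domain="sentiment",
    l1_what="a positive stance toward the product",
    l2_how="understated, no exclamation marks",
    l3_instantiate="a single two-sentence review",
)
print(to_prompt(spec))
```

A finer-grained level constrains the output more tightly, which matches the paper's observation that control tends to degrade as specifications become more concrete.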
Related papers
- VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language Models [9.511622126333105]
VALUEFLOW is a framework that spans extraction, evaluation, and steering with calibrated intensity control. We conduct a large-scale study across ten models and four value theories, identifying asymmetries in steerability and composition laws for multi-value control.
arXiv Detail & Related papers (2026-02-03T06:19:57Z)
- Learned-Rule-Augmented Large Language Model Evaluators [5.4343364964031124]
Large language models (LLMs) are predominantly used as evaluators for natural language generation (NLG) tasks. This work explores the potential of LLMs as general evaluators across diverse tasks.
arXiv Detail & Related papers (2025-12-01T18:08:45Z)
- Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies [18.428149174461264]
We present PBSUITE, a dynamic evaluation suite designed to assess large language models' capacity to adhere to pluralistic alignment specifications. We find that leading open- and closed-source LLMs maintain robust adherence to behavioral policies in single-turn settings, but their compliance weakens substantially in multi-turn adversarial interactions.
arXiv Detail & Related papers (2025-11-07T06:43:01Z)
- Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study [40.143148197878354]
We introduce FineLogic, a fine-grained evaluation framework that assesses logical reasoning across three dimensions. We study how different supervision formats in fine-tuning shape reasoning abilities. We find a key trade-off: natural language supervision excels at generalization, whereas symbolic supervision is superior at instilling structurally sound, atomic reasoning steps.
arXiv Detail & Related papers (2025-06-05T09:34:12Z)
- Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation [106.17986469245302]
Large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking. Existing reasoning benchmarks either focus on domain-specific knowledge (crystallized intelligence) or lack interpretability. We propose DRE-Bench, a dynamic reasoning evaluation benchmark grounded in a hierarchical cognitive framework.
arXiv Detail & Related papers (2025-06-03T09:01:08Z)
- Towards LLM Guardrails via Sparse Representation Steering [11.710399901426873]
Large Language Models (LLMs) have demonstrated remarkable performance in natural language generation tasks. We propose a sparse encoding-based representation engineering method, named SRE, which decomposes polysemantic activations into a structured, monosemantic feature space. By leveraging sparse autoencoding, our approach isolates and adjusts only task-specific sparse feature dimensions, enabling precise and interpretable steering of model behavior.
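The sparse-steering idea described in this abstract can be sketched in a few lines: encode an activation into an overcomplete feature space, nudge only selected feature dimensions, and decode back. The encoder/decoder weights below are random stand-ins; an actual SRE-style method would use a trained sparse autoencoder and feature indices identified for the target behavior.

```python
import numpy as np

# Toy sketch of sparse-feature steering (weights are random stand-ins,
# not a trained sparse autoencoder).

rng = np.random.default_rng(0)
d_model, d_feat = 16, 64                 # activation dim, sparse feature dim
W_enc = rng.normal(size=(d_model, d_feat))
W_dec = rng.normal(size=(d_feat, d_model))

def steer(h, feature_idx, delta):
    """Adjust one sparse feature and reconstruct the activation."""
    f = np.maximum(h @ W_enc, 0.0)       # ReLU encoding -> sparse-ish features
    f[feature_idx] += delta              # intervene only on the chosen feature
    return f @ W_dec                     # decode back to the residual stream

h = rng.normal(size=d_model)
h_steered = steer(h, feature_idx=3, delta=2.0)
print(h_steered.shape)
```

The point of steering in the sparse space rather than the raw activation space is that each feature dimension is (ideally) monosemantic, so the intervention targets one behavior instead of many entangled ones.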
arXiv Detail & Related papers (2025-03-21T04:50:25Z)
- Value Compass Benchmarks: A Platform for Fundamental and Validated Evaluation of LLMs Values [76.70893269183684]
Large Language Models (LLMs) achieve remarkable breakthroughs. Aligning their values with humans has become imperative for their responsible development. Evaluations of LLMs' values that fulfill three desirable goals are still lacking.
arXiv Detail & Related papers (2025-01-13T05:53:56Z)
- RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios [58.90106984375913]
RuleArena is a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning. Covering three practical domains -- airline baggage fees, NBA transactions, and tax regulations -- RuleArena assesses LLMs' proficiency in handling intricate natural language instructions.
arXiv Detail & Related papers (2024-12-12T06:08:46Z)
- DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation [57.07295906718989]
Constrained decoding approaches aim to control the meaning or style of text generated by pre-trained language models (PLMs) for various tasks at inference time. These methods often guide plausible continuations by greedily and explicitly selecting targets. Inspired by cognitive dual-process theory, we propose a novel decoding framework, DECIDER.
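Rule-controllable decoding of the kind this abstract describes can be illustrated with a toy example: the language model proposes next-token logits, and a rule system re-scores them before selection. The vocabulary, rule scores, and mixing weight below are invented for illustration and do not reflect DECIDER's actual formulation.

```python
import numpy as np

# Toy sketch of rule-guided decoding: combine stand-in LM logits with a
# rule-based score before picking the next token. All values are invented.

vocab = ["great", "terrible", "okay", "awful"]
logits = np.array([1.0, 2.5, 0.5, 2.0])        # stand-in LM scores
rule_bonus = np.array([1.0, -5.0, 0.5, -5.0])  # rule: penalize negative words

def rule_guided_argmax(logits, rule_bonus, alpha=1.0):
    """Pick the token maximizing LM score plus weighted rule score."""
    return int(np.argmax(logits + alpha * rule_bonus))

best = rule_guided_argmax(logits, rule_bonus)
print(vocab[best])  # the rule overrides the LM's preference for "terrible"
```

With `alpha = 0` the choice falls back to the unconstrained LM argmax, so the mixing weight trades off fluency against rule compliance.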
arXiv Detail & Related papers (2024-03-04T11:49:08Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.