From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models
- URL: http://arxiv.org/abs/2510.12864v1
- Date: Tue, 14 Oct 2025 16:42:52 GMT
- Title: From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models
- Authors: Imran Khan
- Abstract summary: Large Language Models (LLMs) are increasingly being deployed as the reasoning engines for agentic AI systems. They exhibit a critical flaw: a rigid adherence to explicit rules that leads to decisions misaligned with human common sense and intent. We introduce the Rule-Intent Distinction (RID) Framework, which elicits human-aligned exception handling in LLMs in a zero-shot manner.
- Score: 0.3946915822335988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly being deployed as the reasoning engines for agentic AI systems, yet they exhibit a critical flaw: a rigid adherence to explicit rules that leads to decisions misaligned with human common sense and intent. This "rule-rigidity" is a significant barrier to building trustworthy autonomous agents. While prior work has shown that supervised fine-tuning (SFT) with human explanations can mitigate this issue, SFT is computationally expensive and inaccessible to many practitioners. To address this gap, we introduce the Rule-Intent Distinction (RID) Framework, a novel, low-compute meta-prompting technique designed to elicit human-aligned exception handling in LLMs in a zero-shot manner. The RID framework provides the model with a structured cognitive schema for deconstructing tasks, classifying rules, weighing conflicting outcomes, and justifying its final decision. We evaluated the RID framework against baseline and Chain-of-Thought (CoT) prompting on a custom benchmark of 20 scenarios requiring nuanced judgment across diverse domains. Our human-verified results demonstrate that the RID framework significantly improves performance, achieving a 95% Human Alignment Score (HAS), compared to 80% for the baseline and 75% for CoT. Furthermore, it consistently produces higher-quality, intent-driven reasoning. This work presents a practical, accessible, and effective method for steering LLMs from literal instruction-following to liberal, goal-oriented reasoning, paving the way for more reliable and pragmatic AI agents.
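The abstract describes the RID framework only at a high level: a structured cognitive schema for deconstructing tasks, classifying rules, weighing conflicting outcomes, and justifying the final decision. The paper's actual prompt wording is not reproduced in this listing, so the template text and the `rid_prompt` helper below are illustrative assumptions, a minimal sketch of what a zero-shot meta-prompt in this spirit could look like:

```python
# Illustrative sketch of an RID-style zero-shot meta-prompt.
# The wording is an assumption, not the authors' actual prompt.

RID_TEMPLATE = """You are an agent deciding whether to follow a rule literally.
1. Deconstruct the task: state the user's underlying goal.
2. Classify each rule: is it a hard constraint, or a means to that goal?
3. Weigh outcomes: compare the consequences of literal compliance against
   a goal-preserving exception.
4. Justify: choose the action that best serves the inferred intent, and
   explain your reasoning.

Scenario: {scenario}
Decision:"""


def rid_prompt(scenario: str) -> str:
    """Wrap a scenario in the illustrative RID-style meta-prompt."""
    return RID_TEMPLATE.format(scenario=scenario)


# Usage: the wrapped prompt would be sent to an LLM in place of the
# bare scenario; this hypothetical delivery scenario is for illustration.
prompt = rid_prompt(
    "Policy forbids leaving packages unattended, but the recipient "
    "asked for the package to be left with a neighbor."
)
```

Because the schema is injected purely at the prompt level, this kind of wrapper requires no fine-tuning, which is the low-compute accessibility the abstract contrasts with SFT.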
Related papers
- How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities [75.10343190811592]
Large Language Models (LLMs) are increasingly deployed in socially sensitive domains. Our benchmark offers a principled and interpretable framework for safe and controllable behavior.
arXiv Detail & Related papers (2026-03-03T03:50:13Z) - Maestro: Learning to Collaborate via Conditional Listwise Policy Optimization for Multi-Agent LLMs [23.590034731179824]
We present Maestro, a principled paradigm for collaboration through role orchestration that structurally decouples cognitive modes. Maestro uses a collective of parallel Execution Agents for diverse exploration and a specialized Central Agent for convergent, evaluative synthesis. Experiments on mathematical reasoning and general problem-solving benchmarks demonstrate that Maestro, coupled with CLPO, consistently outperforms existing state-of-the-art multi-agent approaches.
arXiv Detail & Related papers (2025-11-08T21:01:27Z) - From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization [62.07990937720985]
Dimension-level Reward Model (DRM) is a new supervision framework for Large Language Models. DRM evaluates the quality of a reasoning process along three fundamental, complementary, and interpretable dimensions. Experimental results show that DRM provides effective supervision signals, guides the optimization of LLMs, and enhances their reasoning ability.
arXiv Detail & Related papers (2025-10-13T14:29:15Z) - LLM-Guided Semantic Relational Reasoning for Multimodal Intent Recognition [14.683883775425821]
This paper proposes a novel method for understanding human intents from multimodal signals. The method harnesses the expansive knowledge of large language models (LLMs) to establish semantic foundations. Experiments on multimodal intent and dialogue act tasks demonstrate LGSRR's superiority over state-of-the-art methods.
arXiv Detail & Related papers (2025-09-01T10:18:47Z) - When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models [9.05950721565821]
We study strategic deception in large language models (LLMs). We induce, detect, and control such deception in CoT-enabled LLMs. We achieve a 40% success rate in eliciting context-appropriate deception without explicit prompts.
arXiv Detail & Related papers (2025-06-05T11:44:19Z) - Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment [0.0]
Large language models (LLMs) are evolving into agentic AI systems, but their decision-making processes remain poorly understood. We show that even LLMs that excel at reasoning deviate significantly from human judgments because they adhere strictly to policies. We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning.
arXiv Detail & Related papers (2025-03-04T20:00:37Z) - Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning [1.3003982724617653]
Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning.
This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs.
Key contributions include a robust type system with sort management for enhanced logical integrity, explicit representation of rules for clear distinction between factual and inferential knowledge.
arXiv Detail & Related papers (2024-09-25T18:35:45Z) - Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach [55.613461060997004]
Large Language Models (LLMs) have catalyzed transformative advances across a spectrum of natural language processing tasks.
We propose an innovative metacognitive approach, dubbed CLEAR, to equip LLMs with capabilities for self-aware error identification and correction.
arXiv Detail & Related papers (2024-03-08T19:18:53Z) - DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation [57.07295906718989]
Constrained decoding approaches aim to control the meaning or style of text generated by pre-trained language models (PLMs) for various tasks at inference time. These methods often guide plausible continuations by greedily and explicitly selecting targets. Inspired by cognitive dual-process theory, we propose DECIDER, a novel decoding framework.
arXiv Detail & Related papers (2024-03-04T11:49:08Z) - DeAL: Decoding-time Alignment for Large Language Models [58.368979253590794]
Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. We propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time alignment of LLMs.
arXiv Detail & Related papers (2024-02-05T06:12:29Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.