From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models
- URL: http://arxiv.org/abs/2510.12864v1
- Date: Tue, 14 Oct 2025 16:42:52 GMT
- Title: From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models
- Authors: Imran Khan
- Abstract summary: Large Language Models (LLMs) are increasingly being deployed as the reasoning engines for agentic AI systems. They exhibit a critical flaw: a rigid adherence to explicit rules that leads to decisions misaligned with human common sense and intent. We introduce the Rule-Intent Distinction (RID) Framework, which elicits human-aligned exception handling in LLMs in a zero-shot manner.
- Score: 0.3946915822335988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly being deployed as the reasoning engines for agentic AI systems, yet they exhibit a critical flaw: a rigid adherence to explicit rules that leads to decisions misaligned with human common sense and intent. This "rule-rigidity" is a significant barrier to building trustworthy autonomous agents. While prior work has shown that supervised fine-tuning (SFT) with human explanations can mitigate this issue, SFT is computationally expensive and inaccessible to many practitioners. To address this gap, we introduce the Rule-Intent Distinction (RID) Framework, a novel, low-compute meta-prompting technique designed to elicit human-aligned exception handling in LLMs in a zero-shot manner. The RID framework provides the model with a structured cognitive schema for deconstructing tasks, classifying rules, weighing conflicting outcomes, and justifying its final decision. We evaluated the RID framework against baseline and Chain-of-Thought (CoT) prompting on a custom benchmark of 20 scenarios requiring nuanced judgment across diverse domains. Our human-verified results demonstrate that the RID framework significantly improves performance, achieving a 95% Human Alignment Score (HAS), compared to 80% for the baseline and 75% for CoT. Furthermore, it consistently produces higher-quality, intent-driven reasoning. This work presents a practical, accessible, and effective method for steering LLMs from literal instruction-following to liberal, goal-oriented reasoning, paving the way for more reliable and pragmatic AI agents.
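The abstract describes the RID framework only at a high level: a structured cognitive schema for deconstructing tasks, classifying rules, weighing conflicting outcomes, and justifying the final decision. The paper's actual prompt wording is not reproduced in this listing, so the template text and the `rid_prompt` helper below are illustrative assumptions, a minimal sketch of what a zero-shot meta-prompt in this spirit could look like:

```python
# Illustrative sketch of an RID-style zero-shot meta-prompt.
# The wording is an assumption, not the authors' actual prompt.

RID_TEMPLATE = """You are an agent deciding whether to follow a rule literally.
1. Deconstruct the task: state the user's underlying goal.
2. Classify each rule: is it a hard constraint, or a means to that goal?
3. Weigh outcomes: compare the consequences of literal compliance against
   a goal-preserving exception.
4. Justify: choose the action that best serves the inferred intent, and
   explain your reasoning.

Scenario: {scenario}
Decision:"""


def rid_prompt(scenario: str) -> str:
    """Wrap a scenario in the illustrative RID-style meta-prompt."""
    return RID_TEMPLATE.format(scenario=scenario)


# Usage: the wrapped prompt would be sent to an LLM in place of the
# bare scenario; this hypothetical delivery scenario is for illustration.
prompt = rid_prompt(
    "Policy forbids leaving packages unattended, but the recipient "
    "asked for the package to be left with a neighbor."
)
```

Because the schema is injected purely at the prompt level, this kind of wrapper requires no fine-tuning, which is the low-compute accessibility the abstract contrasts with SFT.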
Related papers
- How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities [75.10343190811592]
Large Language Models (LLMs) are increasingly deployed in socially sensitive domains. Our benchmark offers a principled and interpretable framework for safe and controllable behavior.
arXiv Detail & Related papers (2026-03-03T03:50:13Z) - Maestro: Learning to Collaborate via Conditional Listwise Policy Optimization for Multi-Agent LLMs [23.590034731179824]
We present Maestro, a principled paradigm for collaboration through role orchestration that structurally decouples cognitive modes. Maestro uses a collective of parallel Execution Agents for diverse exploration and a specialized Central Agent for convergent, evaluative synthesis. Experiments on mathematical reasoning and general problem-solving benchmarks demonstrate that Maestro, coupled with CLPO, consistently outperforms existing state-of-the-art multi-agent approaches.
arXiv Detail & Related papers (2025-11-08T21:01:27Z) - From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization [62.07990937720985]
Dimension-level Reward Model (DRM) is a new supervision framework for Large Language Models. DRM evaluates the quality of a reasoning process along three fundamental, complementary, and interpretable dimensions. Experimental results show that DRM provides effective supervision signals, guides the optimization of LLMs, and enhances their reasoning ability.
arXiv Detail & Related papers (2025-10-13T14:29:15Z) - LLM-Guided Semantic Relational Reasoning for Multimodal Intent Recognition [14.683883775425821]
This paper proposes a novel method for understanding human intents from multimodal signals. The method harnesses the expansive knowledge of large language models (LLMs) to establish semantic foundations. Experiments on multimodal intent and dialogue act tasks demonstrate LGSRR's superiority over state-of-the-art methods.
arXiv Detail & Related papers (2025-09-01T10:18:47Z) - When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models [9.05950721565821]
We study strategic deception in large language models (LLMs). We induce, detect, and control such deception in CoT-enabled LLMs. We achieve a 40% success rate in eliciting context-appropriate deception without explicit prompts.
arXiv Detail & Related papers (2025-06-05T11:44:19Z) - Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment [0.0]
Large language models (LLMs) are evolving into agentic AI systems, but their decision-making processes remain poorly understood. We show that even LLMs that excel at reasoning deviate significantly from human judgments because they adhere strictly to policies. We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning.
arXiv Detail & Related papers (2025-03-04T20:00:37Z) - Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning [1.3003982724617653]
Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning.
This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs.
Key contributions include a robust type system with sort management for enhanced logical integrity, explicit representation of rules for clear distinction between factual and inferential knowledge.
arXiv Detail & Related papers (2024-09-25T18:35:45Z) - Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach [55.613461060997004]
Large Language Models (LLMs) have catalyzed transformative advances across a spectrum of natural language processing tasks.
We propose an innovative metacognitive approach, dubbed CLEAR, to equip LLMs with capabilities for self-aware error identification and correction.
arXiv Detail & Related papers (2024-03-08T19:18:53Z) - DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation [57.07295906718989]
Constrained decoding approaches aim to control the meaning or style of text generated by pre-trained language models (PLMs) for various tasks at inference time. These methods often guide plausible continuations by greedily and explicitly selecting targets. Inspired by cognitive dual-process theory, we propose DECIDER, a novel decoding framework.
arXiv Detail & Related papers (2024-03-04T11:49:08Z) - DeAL: Decoding-time Alignment for Large Language Models [58.368979253590794]
Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. We propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time alignment of LLMs.
arXiv Detail & Related papers (2024-02-05T06:12:29Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.