Related papers: Emergence: Overcoming Privileged Information Bias in Asymmetric Embodied Agents via Active Querying

Emergence: Overcoming Privileged Information Bias in Asymmetric Embodied Agents via Active Querying

URL: http://arxiv.org/abs/2512.15776v1
Date: Sat, 13 Dec 2025 17:17:51 GMT
Title: Emergence: Overcoming Privileged Information Bias in Asymmetric Embodied Agents via Active Querying
Authors: Shaun Baek, Sam Liu, Joseph Ukpong,
Abstract summary: Large Language Models (LLMs) act as powerful reasoning engines but struggle with "symbol grounding" in embodied environments.<n>We investigate the Privileged Information Bias (or "Curse of Knowledge"), where a knowledgeable "Leader" agent fails to guide a sensor-limited "Follower" due to a lack of Theory of Mind.<n>Our experiments reveal a significant "Success Gap": while the Leader successfully perceives the target in 35.0% of episodes, the collaborative team succeeds only 17.0% of the time, implying that nearly 50% of feasible plans fail solely due to communicative grounding errors.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) act as powerful reasoning engines but struggle with "symbol grounding" in embodied environments, particularly when information is asymmetrically distributed. We investigate the Privileged Information Bias (or "Curse of Knowledge"), where a knowledgeable "Leader" agent fails to guide a sensor-limited "Follower" due to a lack of Theory of Mind. To quantify this phenomenon, we propose a novel Asymmetric Assistive Reasoning framework within AI2-THOR. Our experiments reveal a significant "Success Gap": while the Leader successfully perceives the target in 35.0% of episodes, the collaborative team succeeds only 17.0% of the time, implying that nearly 50% of feasible plans fail solely due to communicative grounding errors. We demonstrate that a "Pull-based" protocol (active querying) is significantly more robust than standard "Push-based" instruction, with successful episodes featuring 2x the frequency of clarification requests. This research isolates the mechanism of active uncertainty reduction as a prerequisite for safe human-AI and robot-robot collaboration.

Related papers

Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs [0.0]
Openweight Large Language Models (LLMs) have democratized agentic AI, yet finetuned weights are frequently shared and adopted with limited scrutiny beyond leaderboard performance.<n>This creates a risk where third-party models are incorporated without strong behavioral guarantees.<n>We show that poisoned models maintain state-of-the-art performance on benign tasks, incentivizing their adoption.
arXiv Detail & Related papers (2026-03-02T22:01:08Z)
Mitigating "Epistemic Debt" in Generative AI-Scaffolded Novice Programming using Metacognitive Scripts [0.0]
Unrestricted AI encourages novices to outsource the Intrinsic Cognitive Load required for schema formation.<n>We show that successful vibe coders naturally engage in self-scaffolding, treating the AI as a consultant rather than a contractor.<n>We propose that future learning systems must enforce Metacognitive Friction to prevent the mass production of unmaintainable code.
arXiv Detail & Related papers (2026-02-22T21:25:04Z)
TRACER: Trajectory Risk Aggregation for Critical Episodes in Agentic Reasoning [4.928838343487574]
Existing uncertainty proxies focus on single-shot text generation.<n>We introduce TRACER, a trajectory-level uncertainty metric for dual-control Tool-Agent-User interaction.
arXiv Detail & Related papers (2026-02-11T22:23:56Z)
Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers [41.58256327940237]
Proactive Interactive Reasoning transforms Large Language Models into proactive inquirers.<n>PIR targets premise- and intent-level uncertainty through direct interaction with the user.<n>Experiments on mathematical reasoning, code generation, and document editing demonstrate that PIR consistently outperforms strong baselines.
arXiv Detail & Related papers (2026-01-29T18:56:12Z)
Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals.<n>Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
arXiv Detail & Related papers (2026-01-22T07:16:26Z)
What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding [50.35012849818872]
Large language model (LLM) agents have demonstrated remarkable capabilities in complex decision-making and tool-use tasks.<n>We propose Task-to-Quiz (T2Q), a deterministic and automated evaluation paradigm designed to decouple task execution from world-state understanding.<n>Our experiments reveal that task success is often a poor proxy for environment understanding, and that current memory machanism can not effectively help agents acquire a grounded model of the environment.
arXiv Detail & Related papers (2026-01-14T14:09:11Z)
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People [81.63702981397408]
Given limited resources, to what extent do agents based on language models (LMs) act rationally?<n>We develop methods to benchmark and enhance agentic information-seeking, drawing on insights from human behavior.<n>For Spotter agents, our approach boosts accuracy by up to 14.7% absolute over LM-only baselines; for Captain agents, it raises expected information gain (EIG) by up to 0.227 bits (94.2% of the achievable noise ceiling)
arXiv Detail & Related papers (2025-10-23T17:57:28Z)
A Tutorial on Cognitive Biases in Agentic AI-Driven 6G Autonomous Networks [3.0475538102144575]
This paper provides a tutorial on a selection of well-known biases, including their taxonomy, definition, mathematical formulation, emergence in telecom systems and the commonly impacted agentic components.<n>It also presents various mitigation strategies tailored to each type of bias.<n>The article finally provides two practical use-cases, which tackle the emergence, impact and mitigation gain of some famous biases in 6G inter-slice and cross-domain management.
arXiv Detail & Related papers (2025-10-22T19:05:04Z)
FOR-Prompting: From Objection to Revision via an Asymmetric Prompting Protocol [7.765950922513099]
Reasoning protocols organize internal deliberation but lack an explicit mechanism for external questioning that elicits self-revision.<n>We present FOR-Prompting, an asymmetric protocol where a Defender proposes an answer, an Objectioner raises question-style objections with no direct fixes, and a Host enforces consistency and closure.<n>On GSM8K we observe about a 22% point gain over single-prompt and accuracy on par with CoT, with more than 10% higher ratings in reasoning and coherence from a uniform GPT 4.1 judge.
arXiv Detail & Related papers (2025-10-02T04:57:58Z)
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations [0.0]
We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games.<n>We develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality.<n>Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry.
arXiv Detail & Related papers (2025-05-19T10:01:35Z)
Superficial Consciousness Hypothesis for Autoregressive Transformers [0.0]
Superintelligence (SI) is assumed to be more intelligent than humans, making output-based analysis unreliable.<n>We propose the Superficial Consciousness Hypothesis under Information Integration Theory (IIT)<n>We show that a practical estimate of IIT's consciousness metric is relevant to the widely used perplexity metric, and train GPT-2 with those two objectives.
arXiv Detail & Related papers (2024-12-10T08:08:17Z)
Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents. Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations. While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z)
Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework. In these games, the agent solves a partially observable Markov decision process. We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z)
On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems [62.997667081978825]
We present a formal framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself.
arXiv Detail & Related papers (2020-04-09T10:56:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.