Related papers: Confidence is Not Competence

Confidence is Not Competence

URL: http://arxiv.org/abs/2510.24772v1
Date: Fri, 24 Oct 2025 17:22:48 GMT
Title: Confidence is Not Competence
Authors: Debdeep Sanyal, Manya Pandey, Dhruv Kumar, Saurabh Deshpande, Murari Mandal,
Abstract summary: We analyze the geometry of internal states across two phases - pre-generative assessment and solution execution.<n>A sharp reduction in geometric complexity from thought to action mechanistically explains the confidence-competence gap.
Score: 7.094715131203088
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models (LLMs) often exhibit a puzzling disconnect between their asserted confidence and actual problem-solving competence. We offer a mechanistic account of this decoupling by analyzing the geometry of internal states across two phases - pre-generative assessment and solution execution. A simple linear probe decodes the internal "solvability belief" of a model, revealing a well-ordered belief axis that generalizes across model families and across math, code, planning, and logic tasks. Yet, the geometries diverge - although belief is linearly decodable, the assessment manifold has high linear effective dimensionality as measured from the principal components, while the subsequent reasoning trace evolves on a much lower-dimensional manifold. This sharp reduction in geometric complexity from thought to action mechanistically explains the confidence-competence gap. Causal interventions that steer representations along the belief axis leave final solutions unchanged, indicating that linear nudges in the complex assessment space do not control the constrained dynamics of execution. We thus uncover a two-system architecture - a geometrically complex assessor feeding a geometrically simple executor. These results challenge the assumption that decodable beliefs are actionable levers, instead arguing for interventions that target the procedural dynamics of execution rather than the high-level geometry of assessment.

Related papers

How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning [17.18771466838129]
Solving complex geometric problems inherently requires interleaved reasoning.<n>We argue thatSupervised Fine-Tuning (SFT) on interleaved plot-solution data leads to a substantial degradation in reasoning performance.<n>We propose Faire, a reinforcement learning framework that enforces three casual constraints to move beyond superficial imitation.
arXiv Detail & Related papers (2026-03-01T12:18:12Z)
Rectifying Geometry-Induced Similarity Distortions for Real-World Aerial-Ground Person Re-Identification [4.039576422478934]
Aerial-ground person re-identification (AG-ReID) is fundamentally challenged by extreme viewpoint and distance discrepancies.<n>Existing methods rely on geometry-aware feature learning or appearance-conditioned prompting.<n>We introduce Geometry-Induced Query-Key Transformation (GIQT), a lightweight low-rank module that rectifies the similarity space by conditioning query-key interactions on camera geometry.
arXiv Detail & Related papers (2026-01-29T08:41:42Z)
Scale-Consistent State-Space Dynamics via Fractal of Stationary Transformations [9.983526161001997]
Recent deep learning models increasingly rely on depth without structural guarantees on the validity of intermediate representations.<n>We address this limitation by formulating a structural requirement for state-space model's scale-consistent latent dynamics.<n>We empirically verify the predicted scale-consistent behavior, showing that adaptive efficiency emerges from the aligned latent geometry.
arXiv Detail & Related papers (2026-01-27T12:44:20Z)
TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning [104.66714520975837]
We introduce a geometry-grounded benchmark designed to evaluate compositional spatial reasoning through the lens of the classic Tangram game.<n>We propose the Tangram Construction Expression (TCE), a symbolic geometric framework that grounds tangram assemblies in exact, machine-verifiable coordinate specifications.<n>We conduct extensive evaluation experiments on advanced open-source and proprietary models, revealing an interesting insight: MLLMs tend to prioritize matching the target silhouette while neglecting geometric constraints.
arXiv Detail & Related papers (2026-01-23T07:35:05Z)
Geometric and Dynamic Scaling in Deep Transformers [13.697614668609205]
We argue that the collapse of deep Transformers is fundamentally a geometric problem.<n>We propose a unified geometric framework that addresses these failures through two principles.<n>Our analysis predicts that enforcing geometric validity while allowing dynamic erasure is essential for avoiding rank collapse in ultra-deep networks.
arXiv Detail & Related papers (2026-01-03T00:41:46Z)
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation [48.04396968707237]
We present GeoBench, a hierarchical benchmark featuring four reasoning levels in geometric problem-solving.<n>We systematically assess capabilities ranging from attribute extraction to logical error correction.<n>These findings establish GeoBench as a comprehensive benchmark while offering actionable guidelines for developing geometric problem-solving systems.
arXiv Detail & Related papers (2025-12-30T09:56:37Z)
Geometrically-Constrained Agent for Spatial Reasoning [53.93718394870856]
Vision Language Models exhibit a fundamental semantic-to-geometric gap in spatial reasoning.<n>Current paradigms fail to bridge this gap.<n>We propose a training-free agentic paradigm that resolves this gap by introducing a formal task constraint.
arXiv Detail & Related papers (2025-11-27T17:50:37Z)
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark [71.3555284685426]
We introduce RealUnify, a benchmark designed to evaluate bidirectional capability synergy.<n>RealUnify comprises 1,000 meticulously human-annotated instances spanning 10 categories and 32 subtasks.<n>We find that current unified models still struggle to achieve effective synergy, indicating that architectural unification alone is insufficient.
arXiv Detail & Related papers (2025-09-29T15:07:28Z)
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought [64.43689151961054]
We theoretically analyze the training dynamics of a simplified two-layer transformer on the directed graph reachability problem.<n>Our analysis reveals that during training using continuous thought, the index-matching logit will first increase and then remain bounded under mild assumptions.
arXiv Detail & Related papers (2025-09-27T15:23:46Z)
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model [29.40036398095681]
We define the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations.<n>We build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning samples.<n>Our experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations.
arXiv Detail & Related papers (2025-09-26T16:02:27Z)
Information-Theoretic Bounds and Task-Centric Learning Complexity for Real-World Dynamic Nonlinear Systems [0.6875312133832079]
Dynamic nonlinear systems exhibit distortions arising from coupled static and dynamic effects.<n>This paper presents a theoretical framework grounded in structured decomposition, variance analysis, and task-centric complexity bounds.
arXiv Detail & Related papers (2025-09-08T12:08:02Z)
A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap [0.39073867995073247]
We argue that the observed failure is not evidence of a fundamental cognitive boundary, but rather a predictable outcome of system-level constraints.<n>A model, initially declaring a puzzle impossible when confined to text-only generation, now employs agentic tools to not only solve it but also master variations of complexity far beyond the reasoning cliff it previously failed to surmount.
arXiv Detail & Related papers (2025-06-23T17:14:21Z)
Decoder ensembling for learned latent geometries [15.484595752241122]
We show how to easily compute geodesics on the associated expected manifold. We find this simple and reliable, thereby coming one step closer to easy-to-use latent geometries.
arXiv Detail & Related papers (2024-08-14T12:35:41Z)
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator. This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
Linear Adversarial Concept Erasure [108.37226654006153]
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept.<n>We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
arXiv Detail & Related papers (2022-01-28T13:00:17Z)
Manifold Learning via Manifold Deflation [105.7418091051558]
dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data. Many popular methods can fail dramatically, even on simple two-dimensional Manifolds. This paper presents an embedding method for a novel, incremental tangent space estimator that incorporates global structure as coordinates. Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically. We show that a generalization of symplectic to nonconservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.