Coherence Mechanisms for Provable Self-Improvement
- URL: http://arxiv.org/abs/2511.08440v1
- Date: Wed, 12 Nov 2025 01:58:52 GMT
- Title: Coherence Mechanisms for Provable Self-Improvement
- Authors: Mehryar Mohri, Jon Schneider, Yifan Wu
- Abstract summary: We propose a principled framework for self-improvement based on the concept of \emph{coherence}. We formalize this concept using projection-based mechanisms that update a baseline model to be coherent while remaining as close as possible to its original behavior. Our analysis is comprehensive, covering both \emph{direct} and \emph{two-step} projection methods, and robustly extends these guarantees to non-realizable settings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-improvement is a critical capability for large language models and other intelligent systems, enabling them to refine their behavior and internal consistency without external supervision. Despite its importance, prior approaches largely rely on empirical heuristics and lack formal guarantees. In this paper, we propose a principled framework for self-improvement based on the concept of \emph{coherence}, which requires that a model's outputs remain consistent under task-preserving transformations of the input. We formalize this concept using projection-based mechanisms that update a baseline model to be coherent while remaining as close as possible to its original behavior. We provide rigorous theoretical guarantees that these mechanisms achieve \emph{monotonic improvement}, measured by a reduction in expected Bregman divergence. Our analysis is comprehensive, covering both \emph{direct} and \emph{two-step} projection methods, and robustly extends these guarantees to non-realizable settings, empirical (finite-sample) distributions, and relaxed coherence constraints. Furthermore, we establish a general \emph{characterization theorem}, showing that any mechanism with similar provable improvement guarantees must inherently conform to a coherence-based structure. This culminates in rigidity results under the demand for universal improvement, establishing coherence as a fundamental and, in a formal sense, necessary principle for provable self-improvement.
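Although the paper states its guarantees for general Bregman divergences, the squared-error special case gives an intuitive picture: projecting onto the set of coherent predictors amounts to averaging the baseline's predictions over the orbit of an input under the task-preserving transformations. A minimal sketch of that special case (the function name and toy model are illustrative assumptions, not the paper's code):

```python
def coherence_projection(model, x, transforms):
    """Project a baseline predictor's output onto the coherence set.

    Under the squared-error Bregman divergence, the closest coherent
    prediction for x is the mean of the baseline's predictions over the
    orbit {x, t1(x), t2(x), ...} of task-preserving transforms.
    """
    orbit = [x] + [t(x) for t in transforms]
    preds = [model(z) for z in orbit]
    return sum(preds) / len(preds)


# Toy baseline: predicts x itself, which is incoherent under a sign
# flip even when the underlying task is sign-invariant.
model = lambda x: x
flip = lambda x: -x

coherent_pred = coherence_projection(model, 3.0, [flip])
```

By construction the projected prediction is identical for every input in the same orbit, which is exactly the coherence property; the paper's contribution is proving that this update also monotonically reduces expected Bregman divergence to the target.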
Related papers
- Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models [50.248686344277246]
Self-Rewarding Language Models (SRLMs) achieve notable success in iteratively improving alignment without external feedback. This paper provides the first rigorous theoretical guarantees for SRLMs.
arXiv Detail & Related papers (2026-01-30T03:45:43Z) - Beyond Predictive Uncertainty: Reliable Representation Learning with Structural Constraints [0.3948325938742681]
We argue that reliability should be regarded as a first-class property of learned representations themselves. We propose a principled framework for reliable representation learning that explicitly models representation-level uncertainty. Our approach introduces uncertainty-aware regularization directly in the representation space, encouraging representations that are not only predictive but also stable, well-calibrated, and robust to noise and structural perturbations.
arXiv Detail & Related papers (2026-01-22T18:19:52Z) - Multi-agent Adaptive Mechanism Design [13.027684227860322]
We introduce the Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning. Our mechanism guarantees truthful reporting with high probability while achieving $\tilde{O}(\sqrt{T})$ optimal cumulative regret.
arXiv Detail & Related papers (2025-12-25T21:59:51Z) - The Causal Round Trip: Generating Authentic Counterfactuals by Eliminating Information Loss [4.166536642958902]
We introduce BELM-MDCM, the first diffusion-based framework engineered to be causally sound by eliminating the Structural Reconstruction Error (SRE). Our work reconciles the power of modern generative models with the rigor of classical causal theory.
arXiv Detail & Related papers (2025-11-07T13:37:23Z) - Utility-Learning Tension in Self-Modifying Agents [0.12744523252873352]
We show that utility-driven changes that improve immediate or expected performance can erode statistical preconditions for reliable learning and generalization. Our findings show that distribution-free guarantees are preserved iff the policy-reachable model family is uniformly capacity-bounded. Under standard assumptions common in practice, these axes reduce to the same capacity criterion, yielding a single boundary for safe self-modification.
arXiv Detail & Related papers (2025-10-05T23:52:16Z) - ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) model should be able to capture invariant representations. Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. We propose an end-to-end Energy-Regularized Information for Shift-Robustness framework to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z) - Toward a Graph-Theoretic Model of Belief: Confidence, Credibility, and Structural Coherence [0.0]
This paper introduces a minimal formalism for belief systems as directed, weighted graphs. Unlike logical and argumentation-based frameworks, it supports fine-grained structural representation without committing to binary justification status or deductive closure. Its aim is to provide a foundational substrate for analyzing the internal organization of belief systems.
arXiv Detail & Related papers (2025-08-05T14:03:23Z) - Function-coherent gambles [0.0]
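The graph formalism above is concrete enough to sketch as a data structure. A minimal illustrative reading (class and field names are assumptions, not the paper's notation): beliefs are nodes carrying a confidence value, and directed weighted edges record how strongly one belief supports another.

```python
class BeliefGraph:
    """Directed, weighted belief graph: nodes carry a confidence score;
    edges carry a support weight from one belief to another."""

    def __init__(self):
        self.confidence = {}  # belief -> confidence in [0, 1]
        self.support = {}     # (src, dst) -> directed edge weight

    def add_belief(self, belief, confidence):
        self.confidence[belief] = confidence

    def add_support(self, src, dst, weight):
        # src supports (positive weight) or undermines (negative) dst.
        self.support[(src, dst)] = weight

    def supporters(self, belief):
        """All beliefs with a directed edge into `belief`."""
        return [s for (s, d) in self.support if d == belief]
```

Because nothing here commits a belief to a binary justified/unjustified status or enforces deductive closure, the structure matches the paper's stated contrast with logical and argumentation-based frameworks.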
This paper introduces function-coherent gambles, a generalization that accommodates non-linear utility. We prove a representation theorem that characterizes acceptable gambles through continuous linear functionals. We demonstrate how these alternatives to constant-rate exponential discounting can be integrated within the function-coherent framework.
arXiv Detail & Related papers (2025-02-22T14:44:54Z) - Strategyproof and Proportionally Fair Facility Location [77.16035689756859]
We focus on a simple, one-dimensional collective decision problem (often referred to as the facility location problem).
We analyze a hierarchy of proportionality-based fairness axioms of varying strength.
For each axiom, we characterize the family of mechanisms that satisfy the axiom and strategyproofness.
arXiv Detail & Related papers (2021-11-02T12:41:32Z) - Towards a Theoretical Understanding of the Robustness of Variational Autoencoders [82.68133908421792]
We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations.
We develop a novel criterion for robustness in probabilistic models: $r$-robustness.
We show that VAEs trained using disentangling methods score well under our robustness metrics.
arXiv Detail & Related papers (2020-07-14T21:22:29Z) - A general framework for defining and optimizing robustness [74.67016173858497]
We propose a rigorous and flexible framework for defining different types of robustness properties for classifiers.
Our concept is based on postulates that robustness of a classifier should be considered as a property that is independent of accuracy.
We develop a very general robustness framework that is applicable to any type of classification model.
arXiv Detail & Related papers (2020-06-19T13:24:20Z) - Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets.
arXiv Detail & Related papers (2020-01-23T02:37:10Z)
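The TEA objective above can be read as a two-term loss (an illustrative sketch, not the paper's exact formulation): the latent code should be predictable from the features and, at the same time, predictive of the targets.

```python
def tea_loss(z, z_hat, y, y_hat, lam=1.0):
    """Sketch of a target-embedding autoencoder objective.

    z     : latent code produced by encoding the targets
    z_hat : feature-side estimate of z (z must be *predictable*)
    y     : true targets
    y_hat : targets decoded from z (z must be *predictive*)
    lam   : trade-off between the two terms (assumed hyperparameter)
    """
    predictable = sum((a - b) ** 2 for a, b in zip(z, z_hat))
    predictive = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return predictive + lam * predictable
```

The loss vanishes exactly when the features recover the latent code and the latent code reconstructs the targets, which is the joint optimization the abstract describes.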