Utility-Learning Tension in Self-Modifying Agents
- URL: http://arxiv.org/abs/2510.04399v1
- Date: Sun, 05 Oct 2025 23:52:16 GMT
- Title: Utility-Learning Tension in Self-Modifying Agents
- Authors: Charles L. Wang, Keir Dorchen, Peter Jin
- Abstract summary: We show that utility-driven changes that improve immediate or expected performance can erode statistical preconditions for reliable learning and generalization. Our findings show that distribution-free guarantees are preserved iff the policy-reachable model family is uniformly capacity-bounded. Under standard assumptions common in practice, these axes reduce to the same capacity criterion, yielding a single boundary for safe self-modification.
- Score: 0.12744523252873352
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As systems trend toward superintelligence, a natural modeling premise is that agents can self-improve along every facet of their own design. We formalize this with a five-axis decomposition and a decision layer, separating incentives from learning behavior and analyzing axes in isolation. Our central result identifies and introduces a sharp utility-learning tension: the structural conflict in self-modifying systems whereby utility-driven changes that improve immediate or expected performance can also erode the statistical preconditions for reliable learning and generalization. Our findings show that distribution-free guarantees are preserved iff the policy-reachable model family is uniformly capacity-bounded; when capacity can grow without limit, utility-rational self-changes can render learnable tasks unlearnable. Under standard assumptions common in practice, these axes reduce to the same capacity criterion, yielding a single boundary for safe self-modification. Numerical experiments across several axes validate the theory by comparing destructive utility policies against our proposed two-gate policies that preserve learnability.
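To make the capacity criterion concrete, here is a minimal sketch, not the paper's implementation, of a two-gate acceptance rule: a proposed self-modification passes only if it improves expected utility and keeps a capacity proxy for the reachable model family below a fixed budget. The `Modification` type, the parameter-count proxy, and the budget value are all hypothetical stand-ins for whatever uniform capacity bound the paper actually uses.

```python
from dataclasses import dataclass

@dataclass
class Modification:
    """A proposed self-modification, scored by a utility estimate
    and a capacity proxy for the resulting model family."""
    expected_utility: float  # estimated performance after the change
    capacity_proxy: float    # e.g. parameter count or a VC-dimension bound

def two_gate_accept(current_utility: float,
                    proposal: Modification,
                    capacity_budget: float) -> bool:
    """Hypothetical two-gate rule: accept a self-modification only if
    (1) it improves expected utility, and
    (2) the reachable model family stays uniformly capacity-bounded."""
    utility_gate = proposal.expected_utility > current_utility
    learnability_gate = proposal.capacity_proxy <= capacity_budget
    return utility_gate and learnability_gate

# A change that raises utility but blows past the capacity budget is
# rejected, preserving the preconditions for distribution-free learning.
print(two_gate_accept(0.80, Modification(0.85, 1e9), capacity_budget=1e8))  # False
print(two_gate_accept(0.80, Modification(0.85, 5e7), capacity_budget=1e8))  # True
```

The point of the second gate is exactly the abstract's boundary: a purely utility-rational agent would accept the first proposal, and that is the kind of change the theory says can render learnable tasks unlearnable.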
Related papers
- VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction [55.04308051033549]
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhancing Large Language Model (LLM) reasoning. We introduce Verifier-Independent Curriculum Reinforcement Learning (VI-CuRL), a framework that leverages the model's intrinsic confidence to construct a curriculum independent of external verifiers.
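One plausible reading of a confidence-guided curriculum, sketched below under my own assumptions rather than from the paper's method, is to order training prompts by the model's own confidence so that early updates use problems it already answers consistently; the `confidence_curriculum` helper and its inputs are hypothetical.

```python
import numpy as np

def confidence_curriculum(prompts: list[str], confidences: list[float]) -> list[str]:
    """Order training prompts from highest to lowest model confidence.
    `confidences` could be, e.g., the mean probability the model assigns
    to its own sampled answers; no external verifier is consulted."""
    order = np.argsort(confidences)[::-1]
    return [prompts[i] for i in order]

print(confidence_curriculum(["p1", "p2", "p3"], [0.2, 0.9, 0.5]))  # ['p2', 'p3', 'p1']
```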
arXiv Detail & Related papers (2026-02-13T03:40:52Z) - Coherence Mechanisms for Provable Self-Improvement [38.3455527898461]
We propose a principled framework for self-improvement based on the concept of *coherence*. We formalize this concept using projection-based mechanisms that update a baseline model to be coherent while remaining as close as possible to its original behavior. Our analysis is comprehensive, covering both *direct* and *two-step* projection methods, and robustly extends these guarantees to non-realizable settings.
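As a rough formalization of the direct projection step, assuming coherence is encoded as a constraint set $\mathcal{C}$ and closeness is measured by a divergence $d$ (notation mine, not necessarily the paper's):

$$ f_{\text{new}} \;=\; \arg\min_{g \in \mathcal{C}} \; d\big(g,\, f_{\text{base}}\big), $$

i.e., the baseline model $f_{\text{base}}$ is projected onto the coherent set while moving as little as possible from its original behavior.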
arXiv Detail & Related papers (2025-11-11T16:45:14Z) - Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It [0.0]
We present an analytical framework that makes it possible to clearly identify where behaviors that genuinely improve performance diverge from those that merely target metrics. We introduce two indices that respectively quantify behavioral incentives and collective performance loss. We provide both a practical algorithm for allocating limited audit resources and a performance guarantee.
arXiv Detail & Related papers (2025-09-02T14:55:01Z) - Can Large Reasoning Models Self-Train? [58.953117118687096]
Scaling the performance of large language models increasingly depends on methods that reduce reliance on human supervision. We propose an online self-training reinforcement learning algorithm that leverages the model's self-consistency to infer correctness signals and train without any ground-truth supervision.
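A minimal sketch of the self-consistency idea, using my own hypothetical helper rather than the paper's algorithm: sampled answers are majority-voted, and agreement with the majority serves as a verifier-free reward signal.

```python
from collections import Counter

def self_consistency_rewards(answers: list[str]) -> list[float]:
    """Assign a pseudo-reward to each sampled answer: 1.0 if it matches
    the majority (self-consistent) answer, else 0.0. This stands in for
    ground-truth correctness when no verifier is available."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

# Three of four samples agree, so those three receive reward 1.0.
print(self_consistency_rewards(["42", "42", "41", "42"]))  # [1.0, 1.0, 0.0, 1.0]
```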
arXiv Detail & Related papers (2025-05-27T17:16:00Z) - SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning [9.88109749688605]
Model-based offline reinforcement learning trains policies using pre-collected datasets and learned environment models. This paper offers a comprehensive analysis that disentangles the problem into two fundamental components: model bias and policy shift. We introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL), a practical framework that efficiently trains classifiers to approximate SAR for policy optimization.
arXiv Detail & Related papers (2024-08-23T04:25:09Z) - Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts [61.929388479847525]
This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables.
The key idea is the design of switching policies that can take conformal quantiles as input.
We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics.
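A minimal sketch of such a switching rule, assuming a pre-computed calibration set of nonconformity scores: the quantile computation follows standard split conformal prediction, while the policy objects and threshold test are hypothetical simplifications of whatever the paper actually switches between.

```python
import numpy as np

def conformal_quantile(calibration_scores: np.ndarray, alpha: float) -> float:
    """Finite-sample-adjusted (1 - alpha) quantile of calibration
    nonconformity scores, as in split conformal prediction."""
    n = len(calibration_scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return float(np.quantile(calibration_scores, min(level, 1.0)))

def switching_policy(score: float, threshold: float, nominal, fallback):
    """Switch to the fallback policy when the current observation's
    nonconformity score signals a distribution shift."""
    return fallback if score > threshold else nominal

# Example with placeholder policies represented as labels.
cal = np.random.default_rng(0).normal(size=200)
q = conformal_quantile(cal, alpha=0.1)
print(switching_policy(3.0, q, nominal="base", fallback="safe"))  # "safe"
```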
arXiv Detail & Related papers (2023-11-02T17:59:30Z) - Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset without further interaction with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Learning Robust Models Using The Principle of Independent Causal Mechanisms [26.79262903241044]
We propose a new gradient-based learning framework whose objective function is derived from the ICM principle.
We show theoretically and experimentally that neural networks trained in this framework focus on relations remaining invariant across environments.
arXiv Detail & Related papers (2020-10-14T15:38:01Z) - Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
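For reference, empowerment is commonly defined as the channel capacity from actions to subsequent states; in notation of my own choosing, not necessarily the paper's:

$$ \mathfrak{E}(s) \;=\; \max_{p(a)} I\big(A;\, S' \mid S = s\big). $$

Modeling the dynamics as a linear Gaussian channel $s' = W a + \eta$ with $\eta \sim \mathcal{N}(0, \Sigma_\eta)$ and a Gaussian action distribution with covariance $\Sigma_a$ yields the closed-form mutual information $\tfrac{1}{2} \log\det\!\big(I + \Sigma_\eta^{-1} W \Sigma_a W^{\top}\big)$, which is what makes the trainable-channel view tractable.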
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.