Related papers: Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization

URL: http://arxiv.org/abs/2510.02840v1
Date: Fri, 03 Oct 2025 09:25:12 GMT
Title: Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Authors: Antoine Maier, Aude Maier, Tom David
Abstract summary: We argue that approximation, estimation, and optimization errors guarantee systematic deviations from the intended objective. A principled limit on the optimization of General-Purpose AI systems is necessary.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A common but rarely examined assumption in machine learning is that training yields models that actually satisfy their specified objective function. We call this the Objective Satisfaction Assumption (OSA). Although deviations from OSA are acknowledged, their implications are overlooked. We argue, in a learning-paradigm-agnostic framework, that OSA fails in realistic conditions: approximation, estimation, and optimization errors guarantee systematic deviations from the intended objective, regardless of the quality of its specification. Beyond these technical limitations, perfectly capturing and translating the developer's intent, such as alignment with human preferences, into a formal objective is practically impossible, making misspecification inevitable. Building on recent mathematical results, we argue that, absent a mathematical characterization of these gaps, they are indistinguishable from those that collapse into Goodhart's law failure modes under strong optimization pressure. Because the Goodhart breaking point cannot be located ex ante, a principled limit on the optimization of General-Purpose AI systems is necessary. Absent such a limit, continued optimization is liable to push systems into predictable and irreversible loss of control.

Related papers

Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive [0.0]
AI systems are increasingly deployed in high-stakes contexts under the assumption they can be governed by norms. This paper demonstrates that the assumption is formally invalid for optimization-based systems.
arXiv Detail & Related papers (2026-02-26T17:16:17Z)
Goal-Oriented Influence-Maximizing Data Acquisition for Learning and Optimization [28.53710231018475]
We propose an active acquisition algorithm that avoids explicit posterior inference while remaining uncertainty-aware through inverse curvature. GOIMDA selects inputs by maximizing their expected influence on a user-specified goal functional. We show theoretically that, for generalized linear models, GOIMDA approximates predictive-entropy minimization up to a correction term accounting for goal alignment and prediction bias.
arXiv Detail & Related papers (2026-02-23T07:57:11Z)
From Data to Uncertainty Sets: a Machine Learning Approach [5.877778007271621]
We leverage robust optimization to protect a constraint against the uncertainty of a machine learning model's output. We derive strong guarantees on the probability of violation. On synthetic computational experiments, our method requires uncertainty sets with radii up to one order of magnitude smaller than those of other approaches.
arXiv Detail & Related papers (2025-03-04T01:30:28Z)
Uncertainty-Penalized Direct Preference Optimization [52.387088396044206]
We develop a pessimistic framework for DPO by introducing preference uncertainty penalization schemes. The penalization serves as a correction to the loss which attenuates the loss gradient for uncertain samples. We show improved overall performance compared to vanilla DPO, as well as better completions on prompts from high-uncertainty chosen/rejected responses.
arXiv Detail & Related papers (2024-10-26T14:24:37Z)
Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch [19.03141646688652]
We use the theory of mind, i.e., the human user's beliefs about the AI agent, as a basis to develop a formal explanatory framework. We propose a new interactive algorithm that uses the specified reward to infer potential user expectations.
arXiv Detail & Related papers (2024-04-12T19:43:37Z)
Decision-Focused Learning with Directional Gradients [1.2363103948638432]
We propose a novel family of decision-aware surrogate losses, called Perturbation Gradient (PG) losses, for the predict-then-optimize framework. Unlike the original decision loss, which is typically piecewise constant and discontinuous, our new PG losses are Lipschitz continuous, difference-of-concave functions. We provide numerical evidence confirming our PG losses substantively outperform existing proposals when the underlying model is misspecified.
arXiv Detail & Related papers (2024-02-05T18:14:28Z)
A Learning-Based Optimal Uncertainty Quantification Method and Its Application to Ballistic Impact Problems [1.713291434132985]
This paper concerns the optimal (supremum and infimum) uncertainty bounds for systems where the input (or prior) measure is only partially/imperfectly known. We demonstrate the learning based framework on the uncertainty optimization problem. We show that the approach can be used to construct maps for the performance certificate and safety in engineering practice.
arXiv Detail & Related papers (2022-12-28T14:30:53Z)
Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning [50.44564503645015]
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs.
arXiv Detail & Related papers (2021-07-02T20:36:05Z)
Uncertainty-aware Remaining Useful Life predictor [57.74855412811814]
Remaining Useful Life (RUL) estimation is the problem of inferring how long a certain industrial asset can be expected to operate. In this work, we consider Deep Gaussian Processes (DGPs) as possible solutions to the aforementioned limitations. The performance of the algorithms is evaluated on the N-CMAPSS dataset from NASA for aircraft engines.
arXiv Detail & Related papers (2021-04-08T08:50:44Z)
Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process. We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
Offline Contextual Bandits with Overparameterized Models [52.788628474552276]
We ask whether the same phenomenon occurs for offline contextual bandits. We show that this discrepancy is due to the action-stability of their objectives. In experiments with large neural networks, this gap between action-stable value-based objectives and unstable policy-based objectives leads to significant performance differences.
arXiv Detail & Related papers (2020-06-27T13:52:07Z)
Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures. Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.