Related papers: The Alignment Bottleneck

The Alignment Bottleneck

URL: http://arxiv.org/abs/2509.15932v1
Date: Fri, 19 Sep 2025 12:38:30 GMT
Title: The Alignment Bottleneck
Authors: Wenjun Cao
Abstract summary: We model the loop as a two-stage cascade $U \to H \to Y$ given $S$, with cognitive capacity $C_{\text{cog}|S}$ and average total capacity $\bar{C}_{\text{tot}|S}$. The main result, a capacity-coupled Alignment Performance Interval, pairs a data-size-independent Fano lower bound proved on a separable codebook mixture with a PAC-Bayes upper bound whose KL term is controlled by the same channel via $m \, \bar{C}_{\text{tot}|S}$.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models improve with scale, yet feedback-based alignment still exhibits systematic deviations from intended behavior. Motivated by bounded rationality in economics and cognitive science, we view judgment as resource-limited and feedback as a constrained channel. On this basis, we model the loop as a two-stage cascade $U \to H \to Y$ given $S$, with cognitive capacity $C_{\text{cog}|S}$ and average total capacity $\bar{C}_{\text{tot}|S}$. Our main result is a capacity-coupled Alignment Performance Interval. It pairs a data-size-independent Fano lower bound proved on a separable codebook mixture with a PAC-Bayes upper bound whose KL term is controlled by the same channel via $m \, \bar{C}_{\text{tot}|S}$. The PAC-Bayes bound becomes an upper bound on the same true risk when the canonical observable loss is used and the dataset is drawn from the same mixture. Under these matched conditions, both limits are governed by a single capacity. Consequences include that, with value complexity and capacity fixed, adding labels alone cannot cross the bound; attaining lower risk on more complex targets requires capacity that grows with $\log M$; and once useful signal saturates capacity, further optimization tends to fit channel regularities, consistent with reports of sycophancy and reward hacking. The analysis views alignment as interface engineering: measure and allocate limited capacity, manage task complexity, and decide where information is spent.
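To make the two sides of the interval concrete, here is a minimal numerical sketch. It assumes textbook forms rather than the paper's exact statements: the weakened Fano inequality $P_e \ge 1 - (I + \log 2)/\log M$ for $M$ equally likely values whose mutual information is capped by a capacity, and a McAllester/Maurer-style PAC-Bayes bound in which the KL term is replaced by $m \, \bar{C}_{\text{tot}|S}$. All function names are hypothetical.

```python
import math


def fano_lower_bound(capacity_nats: float, num_values: int) -> float:
    """Weakened Fano bound (assumed form) on the error of identifying one of
    `num_values` equally likely targets through a channel carrying at most
    `capacity_nats` of mutual information (natural-log units)."""
    bound = 1.0 - (capacity_nats + math.log(2)) / math.log(num_values)
    return max(0.0, bound)  # a probability bound is never negative


def min_capacity_for_error(target_error: float, num_values: int) -> float:
    """Capacity needed before the Fano bound permits error <= target_error.
    Rearranged from the bound above: I >= (1 - eps) * log M - log 2."""
    return max(0.0, (1.0 - target_error) * math.log(num_values) - math.log(2))


def pac_bayes_upper_bound(emp_risk: float, m: int,
                          avg_capacity_nats: float, delta: float) -> float:
    """PAC-Bayes upper bound in Maurer's form of McAllester's bound, with the
    KL term replaced by m * C_tot (an assumed capacity control)."""
    kl = m * avg_capacity_nats
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(m) / delta)) / (2.0 * m))
    return emp_risk + slack
```

Under these assumptions the Fano floor does not depend on $m$ at all, and the PAC-Bayes slack tends to $\sqrt{\bar{C}_{\text{tot}|S}/2}$ rather than to zero as $m$ grows, which mirrors the claim that adding labels alone cannot cross the bound. The rearranged Fano condition likewise shows the required capacity growing with $\log M$.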

Related papers

JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty [0.0]
JANUS (Joint Ancestral Network for Uncertainty and Synthesis) is a framework that unifies these capabilities using a DAG of Bayesian Decision Trees. Its key innovation is Reverse-Topological Back-filling, an algorithm that propagates constraints backwards through the causal graph. JANUS achieves state-of-the-art fidelity (Detection Score 0.497), eliminates mode collapse on imbalanced data, and provides exact handling of complex inter-column constraints.
arXiv Detail & Related papers (2026-03-04T05:36:11Z)
Leave-One-Out Prediction for General Hypothesis Classes [9.855978207725549]
We introduce Median of Level-Set Aggregation (MLSA), a general aggregation procedure based on empirical-risk level sets around the ERM. For arbitrary fixed datasets and losses satisfying a mild monotonicity condition, we establish a multiplicative oracle inequality for the LOO error of the form $\mathrm{LOO}_S(\hat{h}) \le C \cdot \frac{1}{n} \min_{h \in H} L_S(h) + \frac{\mathrm{Comp}(S, \ldots)}{\ldots}$ …
arXiv Detail & Related papers (2026-03-02T16:27:44Z)
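For readers unfamiliar with the quantity being bounded, the sketch below computes the leave-one-out error $\mathrm{LOO}_S(\hat{h})$ by refitting a learner on each of the $n$ datasets with one example held out. It uses plain ERM over a hypothetical grid of 1-D threshold classifiers with 0-1 loss, not MLSA; the data and helper names are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]


def loo_error(data: Sequence[Example],
              fit: Callable[[List[Example]], float],
              loss: Callable[[float, float, int], float]) -> float:
    """Average loss on each held-out point of the model fit without it."""
    n = len(data)
    total = 0.0
    for i in range(n):
        train = list(data[:i]) + list(data[i + 1:])  # drop example i
        h = fit(train)
        x, y = data[i]
        total += loss(h, x, y)
    return total / n


THRESHOLDS = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical finite class H


def zero_one(t: float, x: float, y: int) -> float:
    """0-1 loss of the threshold classifier h_t(x) = 1[x >= t]."""
    return float(int(x >= t) != y)


def erm_fit(train: List[Example]) -> float:
    """ERM: the first threshold in THRESHOLDS with minimal empirical loss."""
    return min(THRESHOLDS, key=lambda t: sum(zero_one(t, x, y) for x, y in train))


data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]  # hypothetical sample
print(loo_error(data, erm_fit, zero_one))  # 0.25
```

Here ERM fits the full sample perfectly, yet holding out the point at 0.4 lets a lower threshold win, so the LOO error is 0.25. Refitting $n$ times is what makes LOO expensive in general; the paper's result bounds this quantity for its own aggregation rule, which the sketch does not reproduce.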
Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks, where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z)
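SGP builds on the push-sum protocol, which averages values over a directed graph using only messages to out-neighbors. The following is a minimal sketch of plain push-sum averaging with no gradient steps, so it illustrates the communication pattern rather than SGP itself; the graph and values are made up, and the graph is assumed to be strongly connected.

```python
from typing import Dict, List, Sequence


def push_sum(values: Sequence[float],
             out_neighbors: Dict[int, List[int]],
             num_iters: int) -> List[float]:
    """Each node i keeps a value x_i and a weight w_i, splits both evenly
    among itself and its out-neighbors every round, and reports x_i / w_i.
    The column-stochastic mixing preserves sum(x) and sum(w), so on a
    strongly connected graph every ratio converges to the average."""
    n = len(values)
    x = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(num_iters):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = [i] + out_neighbors.get(i, [])  # self-loop keeps a share
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += x[i] * share
                new_w[j] += w[i] * share
        x, w = new_x, new_w
    return [xi / wi for xi, wi in zip(x, w)]


# Hypothetical directed, asymmetric graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
graph = {0: [1, 2], 1: [2], 2: [0]}
print(push_sum([3.0, 6.0, 9.0], graph, num_iters=200))  # each entry close to 6.0
```

The weights $w_i$ are what correct for imbalance: without them, nodes with many in-neighbors would accumulate more mass and the raw $x_i$ would not converge to the average.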
Intelligent Control of Collisional Architectures for Deterministic Multipartite State Engineering [0.0]
We introduce an intelligent, constraint-aware control framework for deterministic generation of symmetric Dicke states $|D_n(m)\rangle$ in repeated excitation-within-interaction architectures. The protocol employs partial-SWAP collisions between two disjoint qubit registers, mediated by $m$ ancillary "shuttle" qubits, and poses Dicke-state preparation as a …-loop design problem.
arXiv Detail & Related papers (2026-02-09T11:15:32Z)
Phase Transition for Budgeted Multi-Agent Synergy [41.486076708302456]
Multi-agent systems can improve reliability, yet under a fixed inference budget they often help, saturate, or even collapse. We develop a minimal and calibratable theory that predicts these regimes from three binding constraints of modern agent stacks.
arXiv Detail & Related papers (2026-01-24T05:32:50Z)
Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation. Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP. The result is a label-free, API-compatible gate for test-time filtering.
arXiv Detail & Related papers (2025-09-26T23:40:47Z)
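The gates above accept or reject generations by thresholding a nonconformity score. As a hedged illustration of the split-conformal step such gates build on (the standard recipe, not the paper's unsupervised, bootstrapped variant), the threshold is the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score, which yields at least $1-\alpha$ coverage for exchangeable data. The function names and scores are hypothetical.

```python
import math
from typing import List, Sequence


def split_conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    If that rank exceeds n, no finite threshold guarantees coverage."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def gate(test_scores: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical gate: accept a candidate iff its score is within the threshold."""
    return [s <= threshold for s in test_scores]


cal = [0.12, 0.40, 0.05, 0.33, 0.27, 0.51, 0.19, 0.08, 0.44, 0.30]  # hypothetical scores
tau = split_conformal_threshold(cal, alpha=0.2)
print(tau, gate([0.10, 0.50, 0.43], tau))  # 0.44 [True, False, True]
```

The $(n+1)$ correction is what makes the guarantee finite-sample; with only ten calibration scores, asking for $\alpha = 0.05$ already returns an infinite threshold.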
SIM-CoT: Supervised Implicit Chain-of-Thought [108.30049193668083]
Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative to explicit CoT reasoning in Large Language Models. We identify a core latent instability issue when scaling the computational budget of implicit CoT. We propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space.
arXiv Detail & Related papers (2025-09-24T17:01:32Z)
A Fundamental Bound for Robust Quantum Gate Control [0.0]
We derive a universal performance limit for coherent quantum control in the presence of modeled and unmodeled uncertainties. We prove that the worst-case (and hence the average) gate fidelity obeys the lower bound $F \ge F_{\mathrm{lb}}\bigl(t_f \, \Omega_{\mathrm{eff}}\bigr)$.
arXiv Detail & Related papers (2025-07-01T22:26:04Z)
Scheduling with Uncertain Holding Costs and its Application to Content Moderation [4.2130745016804205]
In content moderation for social media platforms, the cost of delaying the review of a piece of content is proportional to its view trajectory, which fluctuates and is a priori unknown. We consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. By viewing each job as a Markovian ski-rental problem, we develop an index-based algorithm that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves.
arXiv Detail & Related papers (2025-05-27T15:26:24Z)
Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present the MA-OSMA algorithm, which transforms the discrete submodular problem into a continuous optimization problem. We also introduce a projection-free MA-OSEA algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution. Our algorithms significantly improve on the $\left(\frac{1}{1+c}\right)$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z)
Convergence Rate Analysis of LION [54.28350823319057]
We show that LION converges at a rate of $\mathcal{O}(\sqrt{d}\,K^{-\ldots})$, measured in terms of gradient Karush-Kuhn-Tucker (KKT) stationarity. We also show that LION can achieve lower loss and higher performance than standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z)
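For context on what is being analyzed, the LION update takes the sign of an interpolation between the momentum and the current gradient, then updates the momentum with a second coefficient. A minimal pure-Python sketch of one step with decoupled weight decay follows; the default hyperparameters are common choices assumed here, not values from this paper.

```python
from typing import List, Sequence, Tuple


def sign(v: float) -> int:
    """Return 1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(params: Sequence[float], grads: Sequence[float],
              momentum: Sequence[float], lr: float = 1e-4,
              beta1: float = 0.9, beta2: float = 0.99,  # assumed common defaults
              weight_decay: float = 0.0) -> Tuple[List[float], List[float]]:
    """One LION step on flat parameter lists; returns (params, momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        c = beta1 * m + (1 - beta1) * g            # interpolation that sets the direction
        p = p - lr * (sign(c) + weight_decay * p)  # sign update plus decoupled decay
        m = beta2 * m + (1 - beta2) * g            # momentum is tracked with beta2
        new_params.append(p)
        new_momentum.append(m)
    return new_params, new_momentum


params, mom = lion_step([1.0, -2.0, 0.5], [0.3, -0.1, 0.0], [0.0, 0.0, 0.0], lr=0.1)
print(params, mom)
```

Because the update is sign-based, every coordinate with a nonzero interpolated direction moves by exactly `lr` (before weight decay), whatever the gradient's scale; this uniform step size is one reason the dimension $d$ appears in the rate.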
Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost. We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z)
Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities [12.104551746465932]
We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. Our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$. In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an $\epsilon$-FOSP.
arXiv Detail & Related papers (2023-05-27T20:08:35Z)
Provably Efficient Model-Free Constrained RL with Linear Function Approximation [4.060731229044571]
We develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems. Our results are achieved via novel adaptations of the standard LSVI-UCB algorithms.
arXiv Detail & Related papers (2022-06-23T17:54:31Z)
Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning [82.31436758872715]
We develop an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions. We establish a connection between value functions in discounted and finite-horizon Markov decision processes.
arXiv Detail & Related papers (2021-11-01T00:21:24Z)
Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation. We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.