Control when confidence is costly
- URL: http://arxiv.org/abs/2406.14427v2
- Date: Tue, 29 Oct 2024 18:52:56 GMT
- Title: Control when confidence is costly
- Authors: Itzel Olivos-Castillo, Paul Schrater, Xaq Pitkow
- Abstract summary: We develop a version of control that accounts for computational costs of inference.
We study Linear Quadratic Gaussian (LQG) control with an added internal cost on the relative precision of the posterior probability over the world state.
We discover that the rational strategy that solves the joint inference and control problem goes through phase transitions depending on the task demands.
- Score: 4.683612295430957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a version of stochastic control that accounts for computational costs of inference. Past studies identified efficient coding without control, or efficient control that neglects the cost of synthesizing information. Here we combine these concepts into a framework where agents rationally approximate inference for efficient control. Specifically, we study Linear Quadratic Gaussian (LQG) control with an added internal cost on the relative precision of the posterior probability over the world state. This creates a trade-off: an agent can obtain more utility overall by sacrificing some task performance, if doing so saves enough bits during inference. We discover that the rational strategy that solves the joint inference and control problem goes through phase transitions depending on the task demands, switching from a costly but optimal inference to a family of suboptimal inferences related by rotation transformations, each of which misestimates the stability of the world. In all cases, the agent moves more to think less. This work provides a foundation for a new type of rational computation that could be used by both brains and machines for efficient but computationally constrained control.
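For readers unfamiliar with the LQG baseline the paper extends, the following is a minimal scalar sketch of standard LQG control: a Kalman filter performs the inference whose precision the paper prices, and an LQR gain acts on the posterior mean. The internal precision cost introduced by the paper is not implemented here, only noted in a comment; all constants are illustrative assumptions.

```python
import numpy as np

# Scalar LQG sketch: dynamics x' = a*x + b*u + w, noisy observation y = x + v.
# The paper adds an internal cost on posterior precision; this is only the
# classical baseline (Kalman inference + LQR control) that such a cost would modify.
a, b = 0.9, 1.0          # dynamics
q, r = 1.0, 0.1          # state and control costs
w_var, v_var = 0.1, 0.5  # process and observation noise variances

# Solve the scalar discrete-time Riccati equation for the LQR gain by iteration.
p = q
for _ in range(1000):
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
gain = a * b * p / (r + b * b * p)   # control law: u = -gain * posterior mean

rng = np.random.default_rng(0)
x, mu, sigma = 1.0, 0.0, 1.0         # true state, posterior mean, posterior variance
for t in range(50):
    u = -gain * mu                   # certainty-equivalent control
    x = a * x + b * u + rng.normal(0.0, np.sqrt(w_var))
    y = x + rng.normal(0.0, np.sqrt(v_var))
    # Kalman predict + update (the inference step whose precision the paper prices)
    mu_pred = a * mu + b * u
    sigma_pred = a * a * sigma + w_var
    k = sigma_pred / (sigma_pred + v_var)
    mu = mu_pred + k * (y - mu_pred)
    sigma = (1.0 - k) * sigma_pred

print(round(sigma, 4))  # posterior variance settles near its steady state
```

In the paper's framework, an extra cost on the posterior's relative precision would make maintaining a small `sigma` expensive, shifting the rational strategy away from this exact filter.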
Related papers
- Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization [52.74762030521324]
We propose a novel algorithm to learn reward functions from observed actions.
We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm.
arXiv Detail & Related papers (2026-01-19T04:12:51Z)
- AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs [7.850805629833066]
We argue that AI development should be fundamentally reoriented toward capability-per-resource rather than capability alone.
We present a theoretical framework demonstrating that resource-allocation decisions guided by gradient influence patterns can dramatically improve efficiency throughout the AI lifecycle.
arXiv Detail & Related papers (2025-11-02T20:59:51Z)
- MORSE: Multi-Objective Reinforcement Learning via Strategy Evolution for Supply Chain Optimization [0.0]
In supply chain management, decision-making often involves balancing multiple objectives, such as cost reduction, service level improvement, and environmental sustainability.
Traditional multi-objective optimization methods, such as linear programming and evolutionary algorithms, struggle to adapt in real time to the dynamic nature of supply chains.
We propose an approach that combines Reinforcement Learning (RL) and Multi-Objective Evolutionary Algorithms (MOEAs) to address these challenges for dynamic multi-objective optimization under uncertainty.
arXiv Detail & Related papers (2025-09-08T09:51:24Z)
- Cost-aware Stopping for Bayesian Optimization [53.34052774820105]
We propose a cost-aware stopping rule for Bayesian optimization that adapts to varying evaluation costs and is free of tuning.
We prove a theoretical guarantee bounding the expected cumulative evaluation cost incurred by our stopping rule when paired with state-of-the-art acquisition functions.
arXiv Detail & Related papers (2025-07-16T17:54:14Z)
- Towards Principled Unsupervised Multi-Agent Reinforcement Learning [49.533774397707056]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings.
We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
arXiv Detail & Related papers (2025-02-12T12:51:36Z)
- Communication Efficient Decentralization for Smoothed Online Convex Optimization [9.449153668916098]
We study the multi-agent Smoothed Online Convex Optimization (SOCO) problem, where $N$ agents interact through a communication graph.
In each round, each agent $i$ receives a strongly convex hitting cost function $f_t^i$ in an online fashion.
Our results hold even when the communication graph changes arbitrarily and adaptively over time.
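The per-round structure of SOCO can be illustrated with a minimal single-agent sketch: each round pays a hitting cost plus a movement (switching) cost for changing the decision. The quadratic costs, target values, and the greedy decision rule below are illustrative assumptions, not the paper's multi-agent algorithm.

```python
import numpy as np

# Smoothed online convex optimization sketch (single agent, hypothetical costs):
# each round pays a hitting cost f_t(x_t) plus a movement cost |x_t - x_{t-1}|.
# A simple baseline rule minimizes the round's combined cost greedily.
targets = [0.0, 4.0, 4.0, 1.0, 1.0]    # minimizers of the quadratic hitting costs

def hitting(x, theta):
    return 0.5 * (x - theta) ** 2       # strongly convex hitting cost

x_prev, total = 0.0, 0.0
for theta in targets:
    # Greedy step: argmin_x 0.5*(x - theta)^2 + |x - x_prev| has a closed form
    # (soft-thresholding of the gap toward the target).
    gap = theta - x_prev
    x = x_prev + np.sign(gap) * max(abs(gap) - 1.0, 0.0)
    total += hitting(x, theta) + abs(x - x_prev)
    x_prev = x

print(round(total, 2))  # → 6.0
```

The soft-threshold shows the defining tension of SOCO: the movement cost makes the learner stop short of each round's minimizer, trading hitting cost against switching cost.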
arXiv Detail & Related papers (2024-11-13T05:59:04Z)
- Provably Efficient Exploration in Inverse Constrained Reinforcement Learning [12.178081346315523]
Inverse Constrained Reinforcement Learning is a common solver for recovering feasible constraints in complex environments.
We propose a strategic exploration framework for sampling with guaranteed efficiency to bridge this gap.
We introduce two exploratory algorithms to achieve efficient constraint inference.
arXiv Detail & Related papers (2024-09-24T10:48:13Z)
- Bridging Domains with Approximately Shared Features [26.096779584142986]
Multi-source domain adaptation aims to reduce performance degradation when applying machine learning models to unseen domains.
Some advocate for learning invariant features from source domains, while others favor more diverse features.
We propose a statistical framework that distinguishes the utilities of features based on the variance of their correlation to label $y$ across domains.
arXiv Detail & Related papers (2024-03-11T04:25:41Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Stochastic Optimal Control Matching [53.156277491861985]
Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control.
The control is learned via a least squares problem by trying to fit a matching vector field.
Experimentally, our algorithm achieves lower error than all the existing IDO techniques for optimal control.
arXiv Detail & Related papers (2023-12-04T16:49:43Z)
- Quantization for decentralized learning under subspace constraints [61.59416703323886]
We consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints.
We propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates.
The analysis shows that, under some general conditions on the quantization noise, the strategy is stable both in terms of mean-square error and average bit rate.
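The differential randomized quantization described above can be sketched with unbiased dithered quantization of the difference between an agent's current estimate and the last transmitted value. The function name, step size, and vectors below are hypothetical; this is the generic idea, not the paper's exact scheme.

```python
import numpy as np

# Differential randomized quantization sketch: an agent transmits a dithered,
# uniformly quantized version of the *difference* between its current estimate
# and the last transmitted value, so the quantizer stays unbiased in expectation.
def dithered_quantize(x, step, rng):
    # Subtractive-dither style: add uniform noise, round to the grid, remove dither.
    u = rng.uniform(-0.5, 0.5, size=x.shape)
    return step * np.round(x / step + u) - step * u   # unbiased: E[output] = x

rng = np.random.default_rng(1)
step = 0.25
estimate = np.array([1.30, -0.70, 2.10])
last_sent = np.zeros_like(estimate)

for _ in range(5):
    diff = estimate - last_sent           # only the innovation is compressed
    last_sent = last_sent + dithered_quantize(diff, step, rng)

err = np.max(np.abs(last_sent - estimate))
print(err < step)  # reconstruction error stays within the quantization step
```

Quantizing differences rather than raw estimates keeps the transmitted magnitudes small as the iterates converge, which is what makes a low average bit rate compatible with mean-square stability.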
arXiv Detail & Related papers (2022-09-16T09:38:38Z)
- Should All Proposals be Treated Equally in Object Detection? [110.27485090952385]
The complexity-precision trade-off of an object detector is a critical problem for resource constrained vision tasks.
It is hypothesized that improved detection efficiency requires a paradigm shift, towards the unequal processing of proposals.
This results in better utilization of available computational budget, enabling higher accuracy for the same FLOPS.
arXiv Detail & Related papers (2022-07-07T18:26:32Z)
- Planning with Dynamically Estimated Action Costs [2.8326418377665346]
Information about action costs is critical for real-world AI planning applications.
Recent approaches use black-box external action cost estimators, often learned from data, that are applied during the planning phase.
We suggest a generalization of deterministic planning with action costs that allows selecting between multiple estimators for action cost.
arXiv Detail & Related papers (2022-06-08T21:10:37Z)
- Efficient Online Linear Control with Stochastic Convex Costs and Unknown Dynamics [0.0]
We present a computationally efficient algorithm that attains an optimal $\sqrt{T}$ regret rate against the best stabilizing linear controller.
In contrast to previous work, our algorithm is based on the Optimism in the Face of Uncertainty paradigm.
arXiv Detail & Related papers (2022-03-02T15:19:20Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- Utilizing Redundancy in Cost Functions for Resilience in Distributed Optimization and Learning [1.8414221462731502]
This paper considers the problem of resilient distributed optimization and machine learning in a server-based architecture.
The system comprises a server and multiple agents, where each agent has a local cost function.
We consider the case when some of the agents may be asynchronous and/or Byzantine faulty.
arXiv Detail & Related papers (2021-10-21T02:41:19Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
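The quantity being estimated in this entry is the optimal $Q$-value function of a discounted MDP. As a hedged illustration (not the paper's variance-reduced algorithm), the toy MDP below computes that fixed point by exact Bellman iteration; sample-based variance-reduced Q-learning converges to this same function. All numbers are illustrative.

```python
import numpy as np

# Minimal sketch of the estimation target: the optimal Q-values of a tiny
# deterministic MDP (2 states x 2 actions), found by exact Q-value iteration.
# Variance-reduced Q-learning estimates this same fixed point from samples.
gamma = 0.9
transitions = np.array([[0, 1], [1, 0]])   # transitions[s, a] -> next state
rewards = np.array([[1.0, 0.0], [0.0, 2.0]])

Q = np.zeros((2, 2))
for _ in range(500):
    V = Q.max(axis=1)                      # greedy value of each state
    Q = rewards + gamma * V[transitions]   # Bellman optimality update

print(np.round(Q, 2))
```

With these toy numbers the iteration converges to $Q^*(0,0)=10$ and $Q^*(1,1)=11$: staying in state 0 for reward 1 per step beats cycling through state 1, except that state 1 itself should jump back for reward 2.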
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Contingency-Aware Influence Maximization: A Reinforcement Learning Approach [52.109536198330126]
The influence maximization (IM) problem aims at finding a subset of seed nodes in a social network that maximizes the spread of influence.
In this study, we focus on a sub-class of IM problems, where whether the nodes are willing to be the seeds when being invited is uncertain, called contingency-aware IM.
Despite the initial success, a major practical obstacle in promoting the solutions to more communities is the tremendous runtime of the greedy algorithms.
arXiv Detail & Related papers (2021-06-13T16:42:22Z)
- Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)
- Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
- Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions [80.49176924360499]
We establish a framework for directing a society of simple, specialized, self-interested agents to solve sequential decision problems.
We derive a class of decentralized reinforcement learning algorithms.
We demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
arXiv Detail & Related papers (2020-07-05T16:41:09Z)
- An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction [84.49035467829819]
We show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
- Controlling Computation versus Quality for Neural Sequence Models [42.525463454120256]
Conditional computation makes neural sequence models (Transformers) more efficient and computation-aware during inference.
We evaluate our approach on two tasks: (i) WMT English-French translation and (ii) unsupervised representation learning (BERT).
arXiv Detail & Related papers (2020-02-17T17:54:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.