Related papers: Optimal control of the future via prospective learning with control

Optimal control of the future via prospective learning with control

URL: http://arxiv.org/abs/2511.08717v2
Date: Wed, 19 Nov 2025 17:25:38 GMT
Title: Optimal control of the future via prospective learning with control
Authors: Yuxin Bai, Aranyak Acharyya, Ashwin De Silva, Zeyu Shen, James Hassett, Joshua T. Vogelstein,
Abstract summary: Optimal control of the future is the next frontier for AI.<n>Current approaches to this problem are typically rooted in either reinforcement learning (RL) or supervised learning.<n>Here, we extend supervised learning to address learning to control in non-stationary, reset-free environments.
Score: 7.601191355718567
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in either reinforcement learning (RL). While powerful, this learning framework is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility to more realistic settings. Here, we extend supervised learning to address learning to control in non-stationary, reset-free environments. Using this framework, called ''Prospective Learning with Control (PL+C)'', we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control, foraging -- which is a canonical task for any mobile agent -- be it natural or artificial. We illustrate that modern RL algorithms fail to learn in these non-stationary reset-free environments, and even with modifications, they are orders of magnitude less efficient than our prospective foraging agents.

Related papers

Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning [2.5352713493505785]
Reinforcement learning -- algorithms that teach artificial agents to interact with environments by maximising reward signals -- has achieved significant success in recent years.<n>One promising research direction involves introducing goals to allow multimodal policies, commonly through hierarchical or curriculum reinforcement learning.<n>We present a novel probabilistic curriculum learning algorithm to suggest goals for reinforcement learning agents in continuous control and navigation tasks.
arXiv Detail & Related papers (2025-04-02T08:15:16Z)
No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks. This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics. We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
Active Reinforcement Learning for Robust Building Control [0.0]
Reinforcement learning (RL) is a powerful tool for optimal control that has found great success in Atari games, the game of Go, robotic control, and building optimization. Unsupervised environment design (UED) has been proposed as a solution to this problem, in which the agent trains in environments that have been specially selected to help it learn. We show that ActivePLR is able to outperform state-of-the-art UED algorithms in minimizing energy usage while maximizing occupant comfort in the setting of building control.
arXiv Detail & Related papers (2023-12-16T02:18:45Z)
A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic. The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns. We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach [0.0]
We propose a new method for obtaining unboundeds based on Reinforcement Learning (RL) Our agents learn from scratch in a highly observable partially RL task and outperform existing overall, in instances unseen during training.
arXiv Detail & Related papers (2022-10-07T20:28:25Z)
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment. We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent. We show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning [86.06110576808824]
Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments. Recent advancements in machine learning algorithms and libraries combined with a carefully tuned robot controller lead to learning quadruped in only 20 minutes in the real world.
arXiv Detail & Related papers (2022-08-16T17:37:36Z)
Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems. In practice, real-world robotic RL typically requires time consuming data collection and frequent human intervention to reset the environment. In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z)
When does return-conditioned supervised learning work for offline reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning. We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z)
Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward. We introduce a new RL formulation for text generation from the soft Q-learning perspective. We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces. We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space. NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences. We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments [1.713291434132985]
Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing and robotics. Real-world complications of many tasks arising in these domains makes them difficult to solve with the basic assumptions underlying classical RL algorithms. This paper provides a survey of RL methods developed for handling dynamically varying environment models. A representative collection of these algorithms is discussed in detail in this work along with their categorization and their relative merits and demerits.
arXiv Detail & Related papers (2020-05-19T09:42:42Z)
Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning [109.77163932886413]
We show how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning. This adaptation uses less than 0.2% of the data necessary to learn the task from scratch. We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning.
arXiv Detail & Related papers (2020-04-21T17:57:04Z)
Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer [18.50206483493784]
We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions. During training, our gated fusion approach enables the prior to guide the initial stages of exploration. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation.
arXiv Detail & Related papers (2020-03-11T05:12:26Z)
Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning [25.099754758455415]
Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set of environment interactions is available. Standard off-policy algorithms fail in the batch setting for continuous control.
arXiv Detail & Related papers (2020-02-19T19:21:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.