Maximum diffusion reinforcement learning
- URL: http://arxiv.org/abs/2309.15293v5
- Date: Fri, 24 May 2024 18:49:00 GMT
- Title: Maximum diffusion reinforcement learning
- Authors: Thomas A. Berrueta, Allison Pinosky, Todd D. Murphey
- Abstract summary: Correlations create fundamental challenges for machine learning.
In reinforcement learning, where data are directly collected from an agent's sequential experiences, violations of the assumption that data are independent and identically distributed are often unavoidable.
By decorrelating agent experiences, our approach provably enables single-shot learning in continuous deployments.
- Score: 7.334017970483869
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Robots and animals both experience the world through their bodies and senses. Their embodiment constrains their experiences, ensuring they unfold continuously in space and time. As a result, the experiences of embodied agents are intrinsically correlated. Correlations create fundamental challenges for machine learning, as most techniques rely on the assumption that data are independent and identically distributed. In reinforcement learning, where data are directly collected from an agent's sequential experiences, violations of this assumption are often unavoidable. Here, we derive a method that overcomes this issue by exploiting the statistical mechanics of ergodic processes, which we term maximum diffusion reinforcement learning. By decorrelating agent experiences, our approach provably enables single-shot learning in continuous deployments over the course of individual task attempts. Moreover, we prove our approach generalizes well-known maximum entropy techniques, and robustly exceeds state-of-the-art performance across popular benchmarks. Our results at the nexus of physics, learning, and control form a foundation for transparent and reliable decision-making in embodied reinforcement learning agents.
Related papers
- Self-Consolidation for Self-Evolving Agents [51.94826934403236]
Large language model (LLM) agents operate as static systems, lacking the ability to evolve through lifelong interaction. We propose a novel self-evolving framework for LLM agents that introduces a complementary evolution mechanism.
arXiv Detail & Related papers (2026-02-02T11:16:07Z)
- Large Language Model Agents Are Not Always Faithful Self-Evolvers [84.08646612111092]
Self-evolving large language model (LLM) agents continually improve by accumulating and reusing past experience. We present the first systematic investigation of experience faithfulness, the causal dependence of an agent's decisions on the experience it is given.
arXiv Detail & Related papers (2026-01-30T01:05:15Z)
- Retrieval-augmented Prompt Learning for Pre-trained Foundation Models [101.13972024610733]
We present RetroPrompt, which aims to achieve a balance between memorization and generalization. Unlike traditional prompting methods, RetroPrompt incorporates a retrieval mechanism throughout the input, training, and inference stages. We conduct comprehensive experiments on a variety of datasets across natural language processing and computer vision tasks to demonstrate the superior performance of our proposed approach.
arXiv Detail & Related papers (2025-12-23T08:15:34Z)
- Human-Inspired Learning for Large Language Models via Obvious Record and Maximum-Entropy Method Discovery [0.11844977816228043]
This paper proposes a human-inspired learning framework that integrates two complementary mechanisms. The first, Obvious Record, explicitly stores cause--result (or question--solution) relationships as symbolic memory. The second, Maximum-Entropy Method Discovery, prioritizes and preserves methods with high semantic dissimilarity.
arXiv Detail & Related papers (2025-12-14T09:12:09Z)
- Agent Learning via Early Experience [93.83579011718858]
A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. Most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. We study two strategies of using such data: (1) Implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) Self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making.
arXiv Detail & Related papers (2025-10-09T17:59:17Z)
- Global Convergence of Continual Learning on Non-IID Data [51.99584235667152]
We provide a general and comprehensive theoretical analysis for continual learning of regression models.
We establish the almost sure convergence results of continual learning under a general data condition for the first time.
arXiv Detail & Related papers (2025-03-24T10:06:07Z)
- In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data.
Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z)
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
How to select the best set of human demonstrations that is most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z)
- Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
We show that directly minimizing the loss of the ensemble appears to rarely be applied in practice.
arXiv Detail & Related papers (2023-01-26T18:58:07Z)
- Evaluating Membership Inference Through Adversarial Robustness [6.983991370116041]
We propose an enhanced methodology for membership inference attacks based on adversarial robustness.
We evaluate our proposed method on three datasets: Fashion-MNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-05-14T06:48:47Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Lifelong Learning from Event-based Data [22.65311698505554]
We investigate methods for learning from data produced by event cameras.
We propose a model that is composed of both feature extraction and continuous learning.
arXiv Detail & Related papers (2021-11-11T17:59:41Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning [8.188575923130662]
We argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, are frequently overlooked in the deep learning era.
Our results show that this not only benefits each individual paradigm, but highlights the natural synergies in a common framework.
arXiv Detail & Related papers (2020-09-03T16:56:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.