Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence
- URL: http://arxiv.org/abs/2409.07341v1
- Date: Wed, 11 Sep 2024 15:22:43 GMT
- Title: Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence
- Authors: Luo Ji, Runji Lin,
- Abstract summary: Online Decision MetaMorphFormer (ODM) aims to achieve self-awareness, environment recognition, and action planning.
ODM can be applied to any arbitrary agent with a multi-joint body, located in different environments, and trained with different types of tasks using large-scale pre-trained datasets.
- Score: 2.890656584329591
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Interactive artificial intelligence in the motion control field is an interesting topic, especially when universal knowledge is adaptive to multiple tasks and universal environments. Despite there being increasing efforts in the field of Reinforcement Learning (RL) with the aid of transformers, most of them might be limited by the offline training pipeline, which prohibits exploration and generalization abilities. To address this limitation, we propose the framework of Online Decision MetaMorphFormer (ODM) which aims to achieve self-awareness, environment recognition, and action planning through a unified model architecture. Motivated by cognitive and behavioral psychology, an ODM agent is able to learn from others, recognize the world, and practice itself based on its own experience. ODM can also be applied to any arbitrary agent with a multi-joint body, located in different environments, and trained with different types of tasks using large-scale pre-trained datasets. Through the use of pre-trained datasets, ODM can quickly warm up and learn the necessary knowledge to perform the desired task, while the target environment continues to reinforce the universal policy. Extensive online experiments as well as few-shot and zero-shot environmental tests are used to verify ODM's performance and generalization ability. The results of our study contribute to the study of general artificial intelligence in embodied and cognitive fields. Code, results, and video examples can be found on the website \url{https://rlodm.github.io/odm/}.
Related papers
- Ontology-Enhanced Decision-Making for Autonomous Agents in Dynamic and Partially Observable Environments [0.0]
This thesis introduces an ontology-enhanced decision-making model (OntoDeM) for autonomous agents.
OntoDeM enriches agents' domain knowledge, allowing them to interpret unforeseen events, generate or adapt goals, and make better decisions.
Compared to traditional and advanced learning algorithms, OntoDeM shows superior performance in improving agents' observations and decision-making in dynamic, partially observable environments.
arXiv Detail & Related papers (2024-05-27T22:52:23Z) - Zero-shot cross-modal transfer of Reinforcement Learning policies
through a Global Workspace [48.24821328103934]
We train a 'Global Workspace' to exploit information collected about the environment via two input modalities.
In two distinct environments and tasks, our results reveal the model's ability to perform zero-shot cross-modal transfer between input modalities.
arXiv Detail & Related papers (2024-03-07T15:35:29Z) - Adaptive action supervision in reinforcement learning from real-world
multi-agent demonstrations [10.174009792409928]
We propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios.
In the experiments, using chase-and-escape and football tasks with the different dynamics between the unknown source and target environments, we show that our approach achieved a balance between the generalization and the generalization ability compared with the baselines.
arXiv Detail & Related papers (2023-05-22T13:33:37Z) - ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK)
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z) - On Realization of Intelligent Decision-Making in the Real World: A
Foundation Decision Model Perspective [54.38373782121503]
A Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks.
We present a case study demonstrating our FDM implementation, DigitalBrain (DB1) with 1.3 billion parameters, achieving human-level performance in 870 tasks.
arXiv Detail & Related papers (2022-12-24T06:16:45Z) - Denoised MDPs: Learning World Models Better Than the World Itself [94.74665254213588]
This work categorizes information out in the wild into four types based on controllability and relation with reward, and formulates useful information as that which is both controllable and reward-relevant.
Experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone.
arXiv Detail & Related papers (2022-06-30T17:59:49Z) - Fully Online Meta-Learning Without Task Boundaries [80.09124768759564]
We study how meta-learning can be applied to tackle online problems of this nature.
We propose a Fully Online Meta-Learning (FOML) algorithm, which does not require any ground truth knowledge about the task boundaries.
Our experiments show that FOML was able to learn new tasks faster than the state-of-the-art online learning methods.
arXiv Detail & Related papers (2022-02-01T07:51:24Z) - Multitask Adaptation by Retrospective Exploration with Learned World
Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.