Neural Reward Machines
- URL: http://arxiv.org/abs/2408.08677v1
- Date: Fri, 16 Aug 2024 11:44:27 GMT
- Title: Neural Reward Machines
- Authors: Elena Umili, Francesco Argenziano, Roberto Capobianco
- Abstract summary: Non-Markovian Reinforcement Learning (RL) tasks are hard to solve because agents must consider the entire history of state-action pairs to act rationally in the environment.
We define Neural Reward Machines (NRM), an automata-based neurosymbolic framework that can be used for both reasoning and learning in non-symbolic RL domains.
We show that NRMs can exploit high-level symbolic knowledge in non-symbolic environments without any knowledge of the SG function, outperforming Deep RL methods which cannot incorporate prior knowledge.
- Score: 2.0755366440393743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-Markovian Reinforcement Learning (RL) tasks are hard to solve because agents must consider the entire history of state-action pairs to act rationally in the environment. Most works specify the temporally extended task with a symbolic formalism such as Linear Temporal Logic or automata. These approaches only work in finite and discrete state environments, or in continuous problems for which a mapping between the raw state and a symbolic interpretation, known as a symbol grounding (SG) function, is available. Here, we define Neural Reward Machines (NRM), an automata-based neurosymbolic framework that can be used for both reasoning and learning in non-symbolic non-Markovian RL domains and that is based on a probabilistic relaxation of Moore Machines. We combine RL with semisupervised symbol grounding (SSSG) and show that NRMs can exploit high-level symbolic knowledge in non-symbolic environments without any knowledge of the SG function, outperforming Deep RL methods that cannot incorporate prior knowledge. Moreover, we advance the research in SSSG by proposing an algorithm for analysing the groundability of temporal specifications that is more efficient than baseline techniques by a factor of $10^3$.
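The abstract's core construction, a probabilistic relaxation of a Moore machine whose symbol grounding is a neural network, can be sketched roughly as follows. This is a minimal illustration under assumed names and shapes (the `grounder` network, `T`, and `R` are ours, not the authors' implementation): a belief over automaton states is propagated by soft transitions weighted by the predicted symbol probabilities, and the expected Moore output acts as the reward.

```python
import torch
import torch.nn as nn

class NeuralRewardMachine(nn.Module):
    """Sketch of a probabilistic Moore machine with neural symbol grounding."""

    def __init__(self, n_states, n_symbols, obs_dim, transitions, state_rewards):
        super().__init__()
        # Hypothetical symbol-grounding network: raw observation -> P(symbol).
        self.grounder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_symbols), nn.Softmax(dim=-1),
        )
        # One-hot transition tensor: T[q, s, q'] = 1 iff delta(q, s) = q'.
        self.register_buffer("T", transitions)      # (n_states, n_symbols, n_states)
        # Moore output: one reward per automaton state.
        self.register_buffer("R", state_rewards)    # (n_states,)

    def step(self, belief, obs):
        """Propagate the state belief through one raw observation."""
        p_sym = self.grounder(obs)                  # (n_symbols,)
        # Soft transition: weight each symbol's deterministic move by P(symbol).
        next_belief = torch.einsum("q,s,qsn->n", belief, p_sym, self.T)
        reward = (next_belief * self.R).sum()       # expected Moore output
        return next_belief, reward
```

Starting from a one-hot belief on the initial automaton state, the expected reward can be fed to a standard deep RL agent, and since every operation above is differentiable, the grounder can in principle be trained end to end without supervision on the SG function.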
Related papers
- BlendRL: A Framework for Merging Symbolic and Neural Policy Learning [23.854830898003726]
BlendRL is a neuro-symbolic RL framework that integrates both paradigms within RL agents that use mixtures of both logic and neural policies.
We empirically demonstrate that BlendRL agents outperform both neural and symbolic baselines in standard Atari environments.
We analyze the interaction between neural and symbolic policies, illustrating how their hybrid use helps agents overcome each other's limitations.
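As a rough illustration of what mixing logic and neural policies can mean (our assumption, not BlendRL's actual architecture), the action distribution can be a convex combination of the two policies' outputs:

```python
def blended_action_probs(p_neural, p_logic, w):
    """Convex mixture of two action distributions; w in [0, 1] favours logic."""
    return [w * pl + (1.0 - w) * pn for pn, pl in zip(p_neural, p_logic)]

# Example with 3 actions: the logic policy strongly prefers action 0.
probs = blended_action_probs([0.2, 0.5, 0.3], [0.9, 0.05, 0.05], w=0.5)
```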
arXiv Detail & Related papers (2024-10-15T15:24:20Z)
- Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents [55.63497537202751]
Article explores the convergence of connectionist and symbolic artificial intelligence (AI)
Traditionally, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic.
Recent advancements in large language models (LLMs) highlight the potential of connectionist architectures in handling human language as a form of symbols.
arXiv Detail & Related papers (2024-07-11T14:00:53Z)
- IID Relaxation by Logical Expressivity: A Research Agenda for Fitting Logics to Neurosymbolic Requirements [50.57072342894621]
We discuss the benefits of exploiting known data dependencies and distribution constraints for Neurosymbolic use cases.
This opens a new research agenda with general questions about Neurosymbolic background knowledge and the expressivity required of its logic.
arXiv Detail & Related papers (2024-04-30T12:09:53Z)
- The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning [54.56905063752427]
Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems.
Existing pipelines that train the neural and symbolic components sequentially require extensive labelling.
A new architecture, NeSyGPT, fine-tunes a vision-language foundation model to extract symbolic features from raw data.
arXiv Detail & Related papers (2024-02-02T20:33:14Z)
- Reinforcement Learning with Temporal-Logic-Based Causal Diagrams [25.538860320318943]
We study a class of reinforcement learning (RL) tasks where the objective of the agent is to accomplish temporally extended goals.
While these machines model the reward function, they often overlook the causal knowledge about the environment.
We propose the Temporal-Logic-based Causal Diagram (TL-CD) in RL, which captures the temporal causal relationships between different properties of the environment.
arXiv Detail & Related papers (2023-06-23T18:42:27Z)
- Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines [23.15484341058261]
We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines.
We formulate the problem of policy learning in Reward Machines with noisy symbolic abstractions.
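For readers unfamiliar with the formalism, a minimal deterministic Reward Machine looks roughly like the sketch below (illustrative names, not the paper's code): an automaton whose transitions fire on symbols and emit rewards. Under a noisy abstraction, the agent would observe only a distribution over these symbols rather than the true one.

```python
class RewardMachine:
    """A minimal reward machine: symbol-triggered transitions with rewards."""

    def __init__(self, delta, rewards, initial_state):
        self.delta = delta          # (state, symbol) -> next state
        self.rewards = rewards      # (state, symbol) -> reward
        self.state = initial_state

    def step(self, symbol):
        reward = self.rewards.get((self.state, symbol), 0.0)
        self.state = self.delta.get((self.state, symbol), self.state)
        return reward

# Example: reach "key" and then "door" for a reward of 1.
rm = RewardMachine(
    delta={("u0", "key"): "u1", ("u1", "door"): "u2"},
    rewards={("u1", "door"): 1.0},
    initial_state="u0",
)
```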
arXiv Detail & Related papers (2022-11-20T08:13:48Z)
- Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach [0.0]
We propose a new method, based on Reinforcement Learning (RL), for obtaining exploration policies.
Our agents learn from scratch in a highly partially observable RL task and outperform existing approaches overall, in instances unseen during training.
arXiv Detail & Related papers (2022-10-07T20:28:25Z)
- Automated Machine Learning, Bounded Rationality, and Rational Metareasoning [62.997667081978825]
We will look at automated machine learning (AutoML) and related problems from the perspective of bounded rationality.
Taking actions under bounded resources requires an agent to reflect on how to use these resources in an optimal way.
arXiv Detail & Related papers (2021-09-10T09:10:20Z)
- Multi-Agent Reinforcement Learning with Temporal Logic Specifications [65.79056365594654]
We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment.
We develop the first multi-agent reinforcement learning technique for temporal logic specifications.
We provide correctness and convergence guarantees for our main algorithm.
arXiv Detail & Related papers (2021-02-01T01:13:03Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
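The generic idea behind such logic-guided RL, running an automaton derived from the LTL formula in lockstep with the environment and rewarding progress towards acceptance, can be sketched as below. This is a deliberate simplification with assumed names (`env.step`, `label_fn`, `automaton.delta`, `automaton.accepting`); the paper itself relies on a more careful automaton construction and reward scheme to obtain its probability guarantees.

```python
def product_step(env, automaton, env_state, aut_state, action, label_fn):
    """One step of the product of an environment and a specification automaton."""
    next_env_state, done = env.step(env_state, action)   # assumed env interface
    symbol = label_fn(next_env_state)                    # symbolic label of state
    next_aut_state = automaton.delta[(aut_state, symbol)]
    # Reward the learner whenever the specification automaton accepts.
    reward = 1.0 if next_aut_state in automaton.accepting else 0.0
    return (next_env_state, next_aut_state), reward, done
```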
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.