Related papers: Unveiling Options with Neural Decomposition

URL: http://arxiv.org/abs/2410.11262v1
Date: Tue, 15 Oct 2024 04:36:44 GMT
Title: Unveiling Options with Neural Decomposition
Authors: Mahdi Alikhasi, Levi H. S. Lelis
Abstract summary: In reinforcement learning, agents often learn policies for specific tasks without the ability to generalize this knowledge to related tasks. This paper introduces an algorithm that attempts to address this limitation by decomposing neural networks encoding policies for Markov Decision Processes into reusable sub-policies. We turn each of these sub-policies into options by wrapping them with while-loops of varied number of iterations.
Score: 11.975013522386538
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In reinforcement learning, agents often learn policies for specific tasks without the ability to generalize this knowledge to related tasks. This paper introduces an algorithm that attempts to address this limitation by decomposing neural networks encoding policies for Markov Decision Processes into reusable sub-policies, which are used to synthesize temporally extended actions, or options. We consider neural networks with piecewise linear activation functions, so that they can be mapped to an equivalent tree that is similar to oblique decision trees. Since each node in such a tree serves as a function of the input of the tree, each sub-tree is a sub-policy of the main policy. We turn each of these sub-policies into options by wrapping it with while-loops of varied number of iterations. Given the large number of options, we propose a selection mechanism based on minimizing the Levin loss for a uniform policy on these options. Empirical results in two grid-world domains where exploration can be difficult confirm that our method can identify useful options, thereby accelerating the learning process on similar but different tasks.
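The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network weights, the function names (`tree_logits`, `make_option`), and the `step` environment function are all hypothetical, and the Levin-loss selection step is omitted. The sketch assumes a single hidden layer of ReLU units, so each hidden unit acts as one oblique split `w.x + b > 0` and each leaf holds a linear function of the input. Pinning the outcome of the first splits selects a sub-tree, which is then repeated a fixed number of times to form an option.

```python
from itertools import product

# Hypothetical toy network: 2 inputs -> 2 ReLU units -> 2 action logits.
W1 = [[1.0, -1.0], [0.5, 1.0]]   # hidden weights, one row per neuron
B1 = [0.0, -1.0]
W2 = [[1.0, -2.0], [-1.0, 1.0]]  # output weights, one row per action
B2 = [0.0, 0.5]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def network_logits(x):
    """Ordinary forward pass through the ReLU network."""
    hidden = [max(0.0, dot(w, x) + b) for w, b in zip(W1, B1)]
    return [dot(w, hidden) + b for w, b in zip(W2, B2)]

def leaf_logits(x, pattern):
    """Linear function stored at the leaf for a fixed on/off pattern."""
    hidden = [(dot(w, x) + b) if on else 0.0
              for w, b, on in zip(W1, B1, pattern)]
    return [dot(w, hidden) + b for w, b in zip(W2, B2)]

def tree_logits(x, fixed=()):
    """Walk the tree: node i tests the oblique split W1[i].x + B1[i] > 0.
    `fixed` pins the outcome of the first len(fixed) tests, which
    selects a sub-tree (a sub-policy) instead of the full tree."""
    pattern = list(fixed)
    for i in range(len(fixed), len(W1)):
        pattern.append(dot(W1[i], x) + B1[i] > 0)
    return leaf_logits(x, pattern)

def act(logits):
    """Greedy action: index of the largest logit."""
    return max(range(len(logits)), key=lambda a: logits[a])

def make_option(fixed, repeats):
    """Wrap a sub-tree in a loop that runs for `repeats` steps.
    `step(state, action)` is an assumed environment transition function."""
    def option(state, step):
        for _ in range(repeats):
            state = step(state, act(tree_logits(state, fixed)))
        return state
    return option

def check_equivalence():
    """The full tree and the network agree on a small grid of inputs."""
    grid = product([-2, -1, 0, 1, 2], repeat=2)
    return all(tree_logits(list(x)) == network_logits(list(x)) for x in grid)
```

Calling `make_option((True,), 3)` builds an option from the sub-tree in which the first split is taken as true and runs it for three environment steps. Varying `fixed` and `repeats` yields the large pool of candidate options that the abstract says must then be filtered.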

Related papers

Reinforcement Learning for Node Selection in Branch-and-Bound [52.2648997215667]
Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data. We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes.
arXiv Detail & Related papers (2023-09-29T19:55:56Z)
TreeDQN: Learning to minimize Branch-and-Bound tree [78.52895577861327]
Branch-and-Bound is a convenient approach to solving optimization tasks in the form of Mixed Integer Linear Programs. The efficiency of the solver depends on the branching heuristic used to select a variable for splitting. We propose a reinforcement learning method that can efficiently learn the branching heuristic.
arXiv Detail & Related papers (2023-06-09T14:01:26Z)
LEURN: Learning Explainable Univariate Rules with Neural Networks [0.0]
LEURN is a neural network architecture that learns univariate decision rules. LEURN achieves comparable performance to state-of-the-art methods across 30 datasets for classification and regression problems.
arXiv Detail & Related papers (2023-03-27T06:34:42Z)
Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them. We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model. Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined heuristics. Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects. We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions. Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
E2E-FS: An End-to-End Feature Selection Method for Neural Networks [0.3222802562733786]
We present a novel selection algorithm, called End-to-End Feature Selection (E2E-FS). Our algorithm, similar to the lasso approach, is solved with gradient descent techniques. Despite its hard restrictions, experimental results show that this algorithm can be used with any learning model.
arXiv Detail & Related papers (2020-12-14T16:19:25Z)
Learning Binary Decision Trees by Argmin Differentiation [34.9154848754842]
We learn binary decision trees that partition data for some downstream task. We do so by relaxing a mixed-integer program for the discrete parameters. We derive customized algorithms to efficiently compute the forward and backward passes.
arXiv Detail & Related papers (2020-10-09T15:11:28Z)
Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies [76.83991682238666]
Branch and Bound (B&B) is the exact tree search method typically used to solve Mixed-Integer Linear Programming problems (MILPs). We propose a novel imitation learning framework, and introduce new input features and architectures to represent branching.
arXiv Detail & Related papers (2020-02-12T17:43:23Z)
Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems [2.6389022766562236]
We consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation.
arXiv Detail & Related papers (2020-02-11T02:38:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.