Bilinear value networks
- URL: http://arxiv.org/abs/2204.13695v3
- Date: Mon, 26 Jun 2023 21:15:39 GMT
- Title: Bilinear value networks
- Authors: Zhang-Wei Hong, Ge Yang, Pulkit Agrawal
- Abstract summary: We show that our bilinear decomposition scheme substantially improves data efficiency and has superior transfer to out-of-distribution goals.
Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.
- Score: 16.479582509493756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dominant framework for off-policy multi-goal reinforcement learning
involves estimating a goal-conditioned Q-value function. When learning to achieve
multiple goals, data efficiency is intimately connected with the generalization
of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a,
g) using monolithic neural networks. To improve the generalization of the
Q-function, we propose a bilinear decomposition that represents the Q-value via
a low-rank approximation in the form of a dot product between two vector
fields. The first vector field, f(s, a), captures the environment's local
dynamics at the state s, whereas the second component, φ(s, g), captures
the global relationship between the current state and the goal. We show that
our bilinear decomposition scheme substantially improves data efficiency, and
has superior transfer to out-of-distribution goals compared to prior methods.
Empirical evidence is provided on the simulated Fetch robot task-suite and
dexterous manipulation with a Shadow hand.
Related papers
- Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is preserving the interpretability of the reduced targets and features by aggregating with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z) - Scalable Property Valuation Models via Graph-based Deep Learning [5.172964916120902]
We develop two novel graph neural network models that effectively identify sequences of neighboring houses with similar features.
We show that employing tailored graph neural networks significantly improves the accuracy of house price prediction.
arXiv Detail & Related papers (2024-05-10T15:54:55Z) - The limitation of neural nets for approximation and optimization [0.0]
We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems.
Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems.
arXiv Detail & Related papers (2023-11-21T00:21:15Z) - Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z) - Functional Indirection Neural Estimator for Better Out-of-distribution
Generalization [27.291114360472243]
FINE (Functional Indirection Neural Estimator) learns to compose functions that map data input to output on-the-fly.
We train FINE and competing models on IQ tasks using images from the MNIST, Omniglot and CIFAR100 datasets.
FINE not only achieves the best performance on all tasks but is also able to adapt to small-scale data scenarios.
arXiv Detail & Related papers (2022-10-23T14:43:02Z) - Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z) - Graph-based Algorithm Unfolding for Energy-aware Power Allocation in
Wireless Networks [27.600081147252155]
We develop a novel graph-based algorithm unfolding framework to maximize energy efficiency in wireless communication networks.
We show that the proposed architecture is permutation equivariant, a desirable property for models of wireless network data.
Results demonstrate its generalizability across different network topologies.
arXiv Detail & Related papers (2022-01-27T20:23:24Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently
finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data, where the predictive model is a deep neural network.
Our method requires far fewer communication rounds than existing approaches while retaining comparable theoretical guarantees.
Our experiments on several benchmark datasets demonstrate the effectiveness of the method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Optimal Gradient Quantization Condition for Communication-Efficient
Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.