Learning by Competition of Self-Interested Reinforcement Learning Agents
- URL: http://arxiv.org/abs/2010.09770v3
- Date: Wed, 22 Dec 2021 16:36:35 GMT
- Title: Learning by Competition of Self-Interested Reinforcement Learning Agents
- Authors: Stephen Chung
- Abstract summary: An artificial neural network can be trained by uniformly broadcasting a reward signal to units that implement a REINFORCE learning rule.
We propose replacing the reward signal to hidden units with the change in the $L^2$ norm of the unit's outgoing weight.
Our experiments show that a network trained with Weight Maximization can learn significantly faster than REINFORCE and slightly slower than backpropagation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An artificial neural network can be trained by uniformly broadcasting a
reward signal to units that implement a REINFORCE learning rule. Though this
presents a biologically plausible alternative to backpropagation in training a
network, the high variance associated with it renders it impractical to train
deep networks. The high variance arises from the inefficient structural credit
assignment since a single reward signal is used to evaluate the collective
action of all units. To facilitate structural credit assignment, we propose
replacing the reward signal to hidden units with the change in the $L^2$ norm
of the unit's outgoing weight. As such, each hidden unit in the network is
trying to maximize the norm of its outgoing weight instead of the global
reward, and thus we call this learning method Weight Maximization. We prove
that Weight Maximization is approximately following the gradient of rewards in
expectation. In contrast to backpropagation, Weight Maximization can be used to
train both continuous-valued and discrete-valued units. Moreover, Weight
Maximization solves several major issues of backpropagation relating to
biological plausibility. Our experiments show that a network trained with
Weight Maximization can learn significantly faster than REINFORCE and slightly
slower than backpropagation. Weight Maximization illustrates an example of
cooperative behavior automatically arising from a population of self-interested
agents in a competitive game without any central coordination.
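The abstract describes each hidden unit as a REINFORCE learner whose reward is the change in the $L^2$ norm of its outgoing weight, while only the output unit is rewarded by the environment. The toy sketch below illustrates this idea on a made-up parity task with Bernoulli-logistic units; the task, network size, learning rates, and update order are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal Weight Maximization sketch (assumptions: toy parity task, one
# Bernoulli-logistic output unit, plain REINFORCE updates, no baselines).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 16
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))  # input -> hidden weights
w2 = rng.normal(scale=0.1, size=n_hidden)          # outgoing weights of the hidden units
lr_hidden, lr_out = 0.1, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

avg_reward = 0.0
for step in range(20000):
    x = rng.integers(0, 2, size=n_in).astype(float)

    # Forward pass: every unit samples a binary action from its Bernoulli policy.
    p_h = sigmoid(W1 @ x)
    h = (rng.random(n_hidden) < p_h).astype(float)
    p_y = sigmoid(w2 @ h)
    y = float(rng.random() < p_y)

    # Global reward: +1 if the output matches the parity of the input, else -1.
    r = 1.0 if y == x.sum() % 2 else -1.0
    avg_reward += 0.001 * (r - avg_reward)

    # Output unit: ordinary REINFORCE on the global reward.
    w2_old = w2.copy()
    w2 += lr_out * r * (y - p_y) * h

    # Hidden units: REINFORCE, but the "reward" of unit i is the change in the
    # L2 norm of its outgoing weight (a scalar here, so the norm is |w2[i]|).
    r_hidden = np.abs(w2) - np.abs(w2_old)
    W1 += lr_hidden * r_hidden[:, None] * (h - p_h)[:, None] * x[None, :]

    if step % 5000 == 0:
        print(f"step {step}: running avg reward {avg_reward:.3f}")
```

In this sketch each hidden unit only ever consults its own outgoing weights, so the global reward needs to reach only the output unit, which is the sense in which the hidden units act as self-interested agents.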
Related papers
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z) - Unbiased Weight Maximization [0.0]
We propose a new learning rule for a network of Bernoulli-logistic units that is unbiased and scales well with the number of units in the network in terms of learning speed.
To our knowledge, it is the first learning rule for such a network with both of these properties.
arXiv Detail & Related papers (2023-07-25T05:45:52Z) - Structural Credit Assignment with Coordinated Exploration [0.0]
Methods aimed at improving structural credit assignment can generally be classified into two categories.
We propose the use of Boltzmann machines or a recurrent network for coordinated exploration.
Experimental results demonstrate that coordinated exploration significantly outperforms independent exploration in terms of training speed.
arXiv Detail & Related papers (2023-07-25T04:55:45Z) - IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning aims to incrementally absorb new concepts without interfering with previously learned knowledge.
Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net).
IF2Net allows a single network to inherently learn unlimited mapping rules without telling task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z) - Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards [101.7246658985579]
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data.
We propose embracing the heterogeneity of diverse rewards by following a multi-policy strategy.
We demonstrate the effectiveness of our approach for text-to-text (summarization, Q&A, helpful assistant, review), text-image (image captioning, text-to-image generation, visual grounding, VQA), and control (locomotion) tasks.
arXiv Detail & Related papers (2023-06-07T14:58:15Z) - Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find both methods to yield win-win: substantially shrinking the robust generalization gap and alleviating the robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z) - Hindsight Network Credit Assignment: Efficient Credit Assignment in Networks of Discrete Stochastic Units [2.28438857884398]
We present Hindsight Network Credit Assignment (HNCA), a novel learning algorithm for networks of discrete units.
HNCA works by assigning credit to each unit based on the degree to which its output influences its immediate children in the network.
We show how HNCA can be extended to optimize a more general function of the outputs of a network of units, where the function is known to the agent.
arXiv Detail & Related papers (2021-10-14T20:18:38Z) - MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents [0.0]
An alternative way of training an artificial neural network is to treat each unit in the network as a reinforcement learning agent.
We propose a novel algorithm called MAP propagation to significantly reduce the variance of this approach.
Our work thus allows for the broader application of the teams of agents in deep reinforcement learning.
arXiv Detail & Related papers (2020-10-15T17:17:39Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - Adversarial Training with Stochastic Weight Average [4.633908654744751]
Adversarial training of deep neural networks often suffers from a serious overfitting problem.
In traditional machine learning, one way to relieve overfitting from the lack of data is to use ensemble methods.
In this paper, we propose adversarial training with stochastic weight averaging (SWA).
While performing adversarial training, we average the weights visited along the training trajectory.
arXiv Detail & Related papers (2020-09-21T04:47:20Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
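The last entry describes constraining the fine-tuned hypothesis to a small sphere centred on the pre-trained weights. One common way to enforce such a constraint is a projection step after each optimiser update, sketched below in PyTorch; the per-parameter Euclidean ball, the radius, and the toy model are illustrative assumptions rather than that paper's exact formulation.

```python
# Hypothetical projection-based sketch of a distance constraint around
# pre-trained weights; radius, model, and optimiser are placeholder choices.
import torch
import torch.nn as nn

def project_to_ball(model, pretrained_state, radius):
    """Project each parameter back onto {w : ||w - w_pretrained||_2 <= radius}."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            delta = p - pretrained_state[name]
            norm = delta.norm()
            if norm > radius:
                p.copy_(pretrained_state[name] + delta * (radius / norm))

model = nn.Linear(8, 2)  # stand-in for a pre-trained network
pretrained_state = {n: p.detach().clone() for n, p in model.named_parameters()}
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimiser.step()
project_to_ball(model, pretrained_state, radius=1.0)  # keep weights close to initialisation
```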