Compositional Learning of Visually-Grounded Concepts Using Reinforcement Learning
- URL: http://arxiv.org/abs/2309.04504v2
- Date: Fri, 3 May 2024 07:21:37 GMT
- Title: Compositional Learning of Visually-Grounded Concepts Using Reinforcement Learning
- Authors: Zijun Lin, Haidi Azaman, M Ganesh Kumar, Cheston Tan
- Abstract summary: Children can rapidly generalize compositionally-constructed rules to unseen test sets.
Deep reinforcement learning (RL) agents need to be trained over millions of episodes.
We show that when RL agents are naively trained to navigate to target color-shape combinations, they implicitly learn to decompose the combinations.
- Score: 5.9143643136818085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Children can rapidly generalize compositionally-constructed rules to unseen test sets. On the other hand, deep reinforcement learning (RL) agents need to be trained over millions of episodes, and their ability to generalize to unseen combinations remains unclear. Hence, we investigate the compositional abilities of RL agents, using the task of navigating to specified color-shape targets in synthetic 3D environments. First, we show that when RL agents are naively trained to navigate to target color-shape combinations, they implicitly learn to decompose the combinations, allowing them to (re-)compose these and succeed at held-out test combinations ("compositional learning"). Second, when agents are pretrained to learn invariant shape and color concepts ("concept learning"), the number of episodes subsequently needed for compositional learning decreased by 20 times. Furthermore, only agents trained on both concept and compositional learning could solve a more complex, out-of-distribution environment in zero-shot fashion. Finally, we verified that only text encoders pretrained on image-text datasets (e.g. CLIP) reduced the number of training episodes needed for our agents to demonstrate compositional learning, and also generalized to 5 unseen colors in zero-shot fashion. Overall, our results are the first to demonstrate that RL agents can be trained to implicitly learn concepts and compositionality, to solve more complex environments in zero-shot fashion.
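The evaluation protocol above hinges on a compositional train/test split: every color and every shape is seen during training, but some specific color-shape pairings are held out for testing. A minimal sketch of such a split follows; the colors, shapes, and the "diagonal" hold-out rule are illustrative assumptions, not the paper's exact configuration.

```python
from itertools import product

# Hypothetical primitives; the paper's environments use their own sets.
colors = ["red", "green", "blue", "yellow"]
shapes = ["cube", "sphere", "cone", "prism"]

all_combos = set(product(colors, shapes))

# Hold out the "diagonal" pairings as the unseen test combinations.
held_out = {(c, s) for c, s in zip(colors, shapes)}
train_combos = all_combos - held_out

# Key property of a compositional split: every primitive still appears
# in training, just never in its held-out pairing.
train_colors = {c for c, _ in train_combos}
train_shapes = {s for _, s in train_combos}
assert train_colors == set(colors) and train_shapes == set(shapes)
```

An agent that merely memorizes combinations fails on `held_out`; one that decomposes instructions into color and shape concepts can recompose them at test time.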
Related papers
- RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning [125.65034908728828]
Training large language models (LLMs) as interactive agents presents unique challenges.
While reinforcement learning has enabled progress in static tasks, multi-turn agent RL training remains underexplored.
We propose StarPO, a general framework for trajectory-level agent RL, and introduce RAGEN, a modular system for training and evaluating LLM agents.
arXiv Detail & Related papers (2025-04-24T17:57:08Z) - Human-like compositional learning of visually-grounded concepts using synthetic environments [6.461018127662044]
We investigate how humans learn to compose concept classes and ground visual cues through trial and error.
We design a 3D synthetic environment in which an agent learns, via reinforcement, to navigate to a target specified by a natural language instruction.
We show that reinforcement learning agents can ground determiner concepts to visual targets but struggle with more complex prepositional concepts.
arXiv Detail & Related papers (2025-04-09T06:33:28Z) - Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? [55.90575874130038]
Humans exhibit remarkable compositional reasoning by integrating knowledge from various sources.
We introduce a synthetic learning task, FTCT (Fragmented at Training, Chained at Testing), to validate the potential of Transformers in replicating this skill.
We find that few-shot Chain-of-Thought prompting enables Transformers to perform compositional reasoning on FTCT.
arXiv Detail & Related papers (2025-01-27T08:34:38Z) - R3L: Relative Representations for Reinforcement Learning [17.76990521486307]
It is known that variations in input domains (e.g., different panorama colors due to seasonal changes) can disrupt agent performance.
Recent advancements in the field of representation learning have demonstrated the possibility of combining components to create new models.
We adapt this framework to the Visual Reinforcement Learning setting, allowing agent components to be combined into new agents that can effectively handle novel visual-task pairs.
arXiv Detail & Related papers (2024-04-19T14:42:42Z) - Learning of Generalizable and Interpretable Knowledge in Grid-Based Reinforcement Learning Environments [5.217870815854702]
We propose using program synthesis to imitate reinforcement learning policies.
We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments.
arXiv Detail & Related papers (2023-09-07T11:46:57Z) - Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
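The retrieval-then-fuse idea behind RECO can be sketched as follows; this is a toy illustration, with random vectors standing in for CLIP embeddings and a residual mean-pool standing in for the paper's single-layer fusion transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical frozen memory bank of cross-modal embeddings, plus one
# CLIP-style query embedding (both random here, for illustration only).
memory = l2_normalize(rng.normal(size=(1000, 64)))
query = l2_normalize(rng.normal(size=(64,)))

# Retrieve the k most similar memory entries by cosine similarity
# (dot product, since everything is unit-normalized).
k = 5
sims = memory @ query
top_k = memory[np.argsort(sims)[-k:]]

# Stand-in for the fusion step: refine the query with the retrieved
# neighbors via a residual mean-pool, then re-normalize.
refined = l2_normalize(query + top_k.mean(axis=0))
```

At inference time, `refined` replaces the raw embedding for the downstream similarity comparison; the base CLIP model itself stays frozen.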
arXiv Detail & Related papers (2023-06-12T15:52:02Z) - Collaboration of Pre-trained Models Makes Better Few-shot Learner [49.89134194181042]
Few-shot classification requires deep neural networks to learn generalized representations only from limited training images.
Recently, CLIP-based methods have shown promising few-shot performance, benefiting from contrastive language-image pre-training.
We propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.
arXiv Detail & Related papers (2022-09-25T16:23:12Z) - Reference-Limited Compositional Zero-Shot Learning [19.10692212692771]
Compositional zero-shot learning (CZSL) refers to recognizing unseen compositions of known visual primitives.
We propose a novel Meta Compositional Graph Learner (MetaCGL) that can efficiently learn the compositionality from insufficient referential information.
arXiv Detail & Related papers (2022-08-22T03:58:02Z) - Modular Lifelong Reinforcement Learning via Neural Composition [31.561979764372886]
Humans commonly solve complex problems by decomposing them into easier subproblems and then combining the subproblem solutions.
This type of compositional reasoning permits reuse of the subproblem solutions when tackling future tasks that share part of the underlying compositional structure.
In a continual or lifelong reinforcement learning (RL) setting, this ability to decompose knowledge into reusable components would enable agents to quickly learn new RL tasks.
arXiv Detail & Related papers (2022-07-01T13:48:29Z) - Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z) - Meta-Learning to Compositionally Generalize [34.656819307701156]
We implement a meta-learning augmented version of supervised learning.
We construct pairs of tasks for meta-learning by sub-sampling existing training data.
Experimental results on the COGS and SCAN datasets show that our similarity-driven meta-learning can improve generalization performance.
arXiv Detail & Related papers (2021-06-08T11:21:48Z) - Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly [114.81028176850404]
Training generative adversarial networks (GANs) with limited data generally results in deteriorated performance and collapsed models.
We decompose the data-hungry GAN training into two sequential sub-problems.
Such a coordinated framework enables us to focus on lower-complexity and more data-efficient sub-problems.
arXiv Detail & Related papers (2021-02-28T05:20:29Z) - Decoupling Representation Learning from Reinforcement Learning [89.82834016009461]
We introduce an unsupervised learning task called Augmented Temporal Contrast (ATC).
ATC trains a convolutional encoder to associate pairs of observations separated by a short time difference.
In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL.
arXiv Detail & Related papers (2020-09-14T19:11:13Z) - Compositional Generalization by Learning Analytical Expressions [87.15737632096378]
A memory-augmented neural model is paired with analytical expressions to achieve compositional generalization.
Experiments on the well-known SCAN benchmark demonstrate that our model achieves strong compositional generalization.
arXiv Detail & Related papers (2020-06-18T15:50:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.