Visually Grounded Continual Language Learning with Selective Specialization
- URL: http://arxiv.org/abs/2310.15571v1
- Date: Tue, 24 Oct 2023 07:35:23 GMT
- Authors: Kyra Ahrens, Lennart Bengtson, Jae Hee Lee, Stefan Wermter
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A desirable trait of an artificial agent acting in the visual world is to
continually learn a sequence of language-informed tasks while striking a
balance between sufficiently specializing in each task and building a
generalized knowledge for transfer. Selective specialization, i.e., a careful
selection of model components to specialize in each task, is a strategy to
provide control over this trade-off. However, the design of selection
strategies requires insight into the role of each model component in learning
rather specialized or generalizable representations, which remains a gap in
current research. Thus, our aim with this work is to provide an extensive
analysis of selection strategies for visually grounded continual language
learning. Due to the lack of suitable benchmarks for this purpose, we introduce
two novel diagnostic datasets that provide enough control and flexibility for a
thorough model analysis. We assess various heuristics for module specialization
strategies as well as quantifiable measures for two different types of model
architectures. Finally, we design conceptually simple approaches based on our
analysis that outperform common continual learning baselines. Our results
demonstrate the need for further efforts towards better aligning continual
learning algorithms with the learning behaviors of individual model parts.
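The abstract describes selective specialization as choosing which model components receive task-specific copies and which stay shared. Below is a minimal, hypothetical Python sketch of that idea. It is not the authors' implementation. The classes `Module` and `SelectivelySpecializedModel`, the component names `vision_encoder`, `fusion`, and `head`, and the use of scalar affine maps as "components" are all illustrative assumptions. Components listed in `specialized` get one private copy per task, and all others are shared. The paper's actual selection heuristics and quantifiable measures are not reproduced here.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass


@dataclass
class Module:
    """Toy stand-in (assumption) for a model component: y = w * x + b."""
    w: float = 1.0
    b: float = 0.0

    def __call__(self, x: float) -> float:
        return self.w * x + self.b


class SelectivelySpecializedModel:
    """Hypothetical sketch: components named in `specialized` get one private
    copy per task; every other component is a single instance shared by all tasks."""

    def __init__(self, components: dict[str, Module], specialized: set[str]):
        unknown = set(specialized) - set(components)
        if unknown:
            raise ValueError(f"unknown components: {sorted(unknown)}")
        self.order = list(components)  # forward-pass order (dict insertion order)
        self.shared = {n: m for n, m in components.items() if n not in specialized}
        self.templates = {n: m for n, m in components.items() if n in specialized}
        self.per_task: dict[str, dict[str, Module]] = {}

    def add_task(self, task: str) -> None:
        # Each new task starts its specialized components from fresh copies of the templates.
        self.per_task[task] = {n: copy.deepcopy(m) for n, m in self.templates.items()}

    def component(self, name: str, task: str) -> Module:
        # Specialized components are looked up per task; shared ones are global.
        if name in self.templates:
            return self.per_task[task][name]
        return self.shared[name]

    def forward(self, x: float, task: str) -> float:
        # Apply the components in order, resolving each one for the given task.
        for name in self.order:
            x = self.component(name, task)(x)
        return x


model = SelectivelySpecializedModel(
    {"vision_encoder": Module(), "fusion": Module(), "head": Module()},
    specialized={"head"},
)
model.add_task("task_a")
model.add_task("task_b")

# Stand-in for training on task A: edit its components directly.
model.component("fusion", "task_a").w = 2.0  # shared: task B sees this too
model.component("head", "task_a").b = 5.0    # specialized: private to task A

print(model.forward(1.0, "task_a"))  # 7.0
print(model.forward(1.0, "task_b"))  # 2.0
```

In this toy setup, the edit to the shared `fusion` component reaches both tasks, while the edit to the specialized `head` stays private to `task_a`. Enlarging `specialized` reduces such cross-task interference but leaves fewer shared components through which knowledge can transfer, which is the trade-off the abstract refers to.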