Improving Systematic Generalization Through Modularity and Augmentation
- URL: http://arxiv.org/abs/2202.10745v1
- Date: Tue, 22 Feb 2022 09:04:35 GMT
- Title: Improving Systematic Generalization Through Modularity and Augmentation
- Authors: Laura Ruis and Brenden Lake
- Abstract summary: We investigate how two well-known modeling principles -- modularity and data augmentation -- affect systematic generalization of neural networks.
We show that even in the controlled setting of a synthetic benchmark, achieving systematic generalization remains very difficult.
- Score: 1.2183405753834562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Systematic generalization is the ability to combine known parts into novel
meaning; an important aspect of efficient human learning, but a weakness of
neural network learning. In this work, we investigate how two well-known
modeling principles -- modularity and data augmentation -- affect systematic
generalization of neural networks in grounded language learning. We analyze how
large the vocabulary needs to be to achieve systematic generalization and how
similar the augmented data needs to be to the problem at hand. Our findings
show that even in the controlled setting of a synthetic benchmark, achieving
systematic generalization remains very difficult. After training on an
augmented dataset with almost forty times more adverbs than the original
problem, a non-modular baseline is not able to systematically generalize to a
novel combination of a known verb and adverb. When separating the task into
cognitive processes like perception and navigation, a modular neural network is
able to utilize the augmented data and generalize more systematically,
achieving 70% and 40% exact match increase over state-of-the-art on two gSCAN
tests that have not previously been improved. We hope that this work gives
insight into the drivers of systematic generalization, and what we still need
to improve for neural networks to learn more like humans do.
Related papers
- Early learning of the optimal constant solution in neural networks and humans [4.016584525313835]
We show that learning of a target function is preceded by an early phase in which networks learn the optimal constant solution (OCS).
We show that learning of the OCS can emerge even in the absence of bias terms and is equivalently driven by generic correlations in the input data.
Our work suggests the OCS as a universal learning principle in supervised, error-corrective learning.
arXiv Detail & Related papers (2024-06-25T11:12:52Z)
- Modular Neural Network Approaches for Surgical Image Recognition [0.0]
We introduce and evaluate different architectures of modular learning for Dorsal Capsulo-Scapholunate Septum (DCSS) instability classification.
Our experiments show that modular learning improves performance compared to non-modular systems.
In the second part, we present our approach for data labeling and segmentation with self-training applied on shoulder arthroscopy images.
arXiv Detail & Related papers (2023-07-17T22:28:16Z)
- Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks [107.8565143456161]
We investigate the information processing strategies adopted by simple artificial neural networks performing a variety of cognitive tasks.
Results show that synergy increases as neural networks learn multiple diverse tasks.
Randomly turning off neurons during training through dropout increases network redundancy, corresponding to an increase in robustness.
arXiv Detail & Related papers (2022-10-06T15:36:27Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Being Friends Instead of Adversaries: Deep Networks Learn from Data Simplified by Other Networks [23.886422706697882]
A different idea has been recently proposed, named Friendly Training, which consists in altering the input data by adding an automatically estimated perturbation.
We revisit and extend this idea inspired by the effectiveness of neural generators in the context of Adversarial Machine Learning.
We propose an auxiliary multi-layer network that is responsible for altering the input data to make it easier for the classifier to handle.
arXiv Detail & Related papers (2021-12-18T16:59:35Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation [65.79387438988554]
Lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder the practical application of neural networks.
We propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG).
We find that both the richer "dark knowledge" from the teacher network and the gradient filter we proposed can reduce the difficulty of learning the mapping.
arXiv Detail & Related papers (2021-07-06T14:08:54Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Systematic Generalization on gSCAN with Language Conditioned Embedding [19.39687991647301]
Systematic Generalization refers to a learning algorithm's ability to extrapolate learned behavior to unseen situations.
We propose a novel method that learns objects' contextualized embeddings with dynamic message passing conditioned on the input natural language.
arXiv Detail & Related papers (2020-09-11T17:35:05Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel, incrementally trained recurrent architecture that explicitly targets multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.