Generalization in Multimodal Language Learning from Simulation
- URL: http://arxiv.org/abs/2108.02319v1
- Date: Tue, 3 Aug 2021 12:55:18 GMT
- Title: Generalization in Multimodal Language Learning from Simulation
- Authors: Aaron Eisermann, Jae Hee Lee, Cornelius Weber, Stefan Wermter
- Abstract summary: We investigate the influence of the underlying training data distribution on generalization in a minimal LSTM-based network trained in a supervised, time-continuous setting.
We find that compositional generalization fails in simple setups but improves with the number of objects and actions, and particularly with greater color overlap between objects.
- Score: 20.751952728808153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks can be powerful function approximators, which are able to
model high-dimensional feature distributions from a subset of examples drawn
from the target distribution. Naturally, they perform well at generalizing
within the limits of their target function, but they often fail to generalize
outside of the explicitly learned feature space. It is therefore an open
research topic whether and how neural network-based architectures can be
deployed for systematic reasoning. Many studies have shown evidence for poor
generalization, but they often work with abstract data or are limited to
single-channel input. Humans, however, learn and interact through a combination
of multiple sensory modalities, and rarely rely on just one. To investigate
compositional generalization in a multimodal setting, we generate an extensible
dataset with multimodal input sequences from simulation. We investigate the
influence of the underlying training data distribution on compositional
generalization in a minimal LSTM-based network trained in a supervised,
time-continuous setting. We find that compositional generalization fails in
simple setups but improves with the number of objects and actions, and
particularly with greater color overlap between objects. Furthermore, multimodality
strongly improves compositional generalization in settings where a pure vision
model struggles to generalize.
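The setup described in the abstract lends itself to a compact illustration. Below is a minimal sketch, assuming PyTorch, of a small LSTM that fuses per-timestep visual features with a tokenized language instruction and receives a supervised target at every timestep (a time-continuous training signal). All layer sizes, feature dimensions, and the toy data are illustrative assumptions, not the authors' exact architecture or dataset.

```python
# Minimal sketch (not the paper's exact model) of a multimodal LSTM trained
# with a supervised target at every timestep ("time-continuous" setting).
import torch
import torch.nn as nn


class MultimodalLSTM(nn.Module):
    def __init__(self, vision_dim=64, lang_vocab=30, lang_dim=16,
                 hidden_dim=128, num_classes=10):
        super().__init__()
        # Word embeddings for the language channel (assumed tokenized input).
        self.lang_embed = nn.Embedding(lang_vocab, lang_dim)
        # Single LSTM over the concatenated vision + language features.
        self.lstm = nn.LSTM(vision_dim + lang_dim, hidden_dim, batch_first=True)
        # Per-timestep classifier, so every step has a supervised target.
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, vision_seq, lang_seq):
        # vision_seq: (batch, time, vision_dim) precomputed image features
        # lang_seq:   (batch, time) token ids, e.g. the instruction repeated per step
        lang = self.lang_embed(lang_seq)              # (batch, time, lang_dim)
        x = torch.cat([vision_seq, lang], dim=-1)     # fuse the two modalities
        out, _ = self.lstm(x)                         # (batch, time, hidden_dim)
        return self.head(out)                         # logits at every timestep


# Toy training step with random data, only to show the per-timestep loss.
model = MultimodalLSTM()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

vision = torch.randn(8, 20, 64)          # 8 sequences, 20 timesteps
lang = torch.randint(0, 30, (8, 20))     # token ids per timestep
targets = torch.randint(0, 10, (8, 20))  # a label for every timestep

logits = model(vision, lang)
loss = loss_fn(logits.reshape(-1, 10), targets.reshape(-1))
loss.backward()
optim.step()
```

Concatenating the language embedding to the visual features at every timestep is one simple fusion choice; the paper's finding is that such a multimodal input stream can help precisely where a vision-only model fails to generalize compositionally.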
Related papers
- Sequential Compositional Generalization in Multimodal Models [23.52949473093583]
We conduct a comprehensive assessment of several unimodal and multimodal models.
Our findings reveal that bi-modal and tri-modal models exhibit a clear edge over their text-only counterparts.
arXiv Detail & Related papers (2024-04-18T09:04:15Z) - On the generalization capacity of neural networks during generic
multimodal reasoning [20.1430673356983]
We evaluate and compare large language models' capacity for multimodal generalization.
For multimodal distractor and systematic generalization, either cross-modal attention or deeper attention layers are the key architectural features required to integrate multimodal inputs.
arXiv Detail & Related papers (2024-01-26T17:42:59Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow constructing model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the Chomsky hierarchy of formal languages can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z) - On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Relational Network (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-09T16:24:01Z) - CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z) - Learning Prototype-oriented Set Representations for Meta-Learning [85.19407183975802]
Learning from set-structured data is a fundamental problem that has recently attracted increasing attention.
This paper provides a novel optimal transport based way to improve existing summary networks.
We further instantiate it to the cases of few-shot classification and implicit meta generative modeling.
arXiv Detail & Related papers (2021-10-18T09:49:05Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Does language help generalization in vision models? [0.0]
We show that a visual model trained on a very large supervised image dataset (ImageNet-21k) can be as efficient for generalization as its multimodal counterpart (CLIP).
When compared to other standard visual or language models, the latent representations of BiT-M were found to be just as "linguistic" as those of CLIP.
arXiv Detail & Related papers (2021-04-16T18:54:14Z) - Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z) - Identifying Critical Neurons in ANN Architectures using Mixed Integer
Programming [11.712073757744452]
We introduce a mixed integer program (MIP) for assigning importance scores to each neuron in deep neural network architectures.
We drive the solver to minimize the number of critical neurons (i.e., those with a high importance score) that need to be kept to maintain the overall accuracy of the trained neural network.
arXiv Detail & Related papers (2020-02-17T21:32:47Z)