OC-NMN: Object-centric Compositional Neural Module Network for
Generative Visual Analogical Reasoning
- URL: http://arxiv.org/abs/2310.18807v1
- Date: Sat, 28 Oct 2023 20:12:58 GMT
- Authors: Rim Assouel, Pau Rodriguez, Perouz Taslakian, David Vazquez, Yoshua
Bengio
- Abstract summary: We show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination.
Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key aspect of human intelligence is the ability to imagine -- composing
learned concepts in novel ways -- to make sense of new scenarios. Machine
learning systems have not yet attained this capacity. In this work, in the context
of visual reasoning, we show how modularity can be leveraged to derive a
compositional data augmentation framework inspired by imagination. Our method,
denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes
visual generative reasoning tasks into a series of primitives applied to
objects without using a domain-specific language. We show that our modular
architectural choices can be used to generate new training tasks that lead to
better out-of-distribution generalization. We compare our model to existing and
new baselines on a proposed visual reasoning benchmark that consists of applying
arithmetic operations to MNIST digits.
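The benchmark described above (arithmetic operations applied to MNIST digits) makes the compositional-augmentation idea easy to illustrate. The snippet below is only a toy sketch, not the paper's implementation: the primitives here are hand-written functions on digit labels and their names are invented, whereas OC-NMN learns primitives as neural modules applied to objects.

```python
import itertools

# Hypothetical primitive set; OC-NMN's primitives are learned modules,
# not hand-written functions -- this only illustrates the compositional idea.
PRIMITIVES = {
    "add1": lambda d: (d + 1) % 10,
    "add2": lambda d: (d + 2) % 10,
    "sub1": lambda d: (d - 1) % 10,
    "identity": lambda d: d,
}

def apply_program(program, digit):
    """Apply a sequence of primitive names to a digit label."""
    for name in program:
        digit = PRIMITIVES[name](digit)
    return digit

def imagine_new_tasks(seen_programs, length=2):
    """Compositional augmentation: recombine primitives observed during
    training into programs (tasks) never seen as full sequences."""
    seen = set(map(tuple, seen_programs))
    all_programs = itertools.product(PRIMITIVES, repeat=length)
    return [p for p in all_programs if p not in seen]

train_programs = [("add1", "add1"), ("sub1", "identity")]
novel = imagine_new_tasks(train_programs)   # 14 recombined tasks
print(apply_program(("add1", "add2"), 9))   # -> 2
```

Training on such recombined tasks is what the abstract refers to as generating new training tasks for better out-of-distribution generalization.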
Related papers
- Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts [6.932008652560561]
We seek a learning architecture that infers a succinct program representation that explains the observed instance.
Our approach combines the code-generation ability of large language models with grounded neural representations.
arXiv Detail & Related papers (2024-04-11T14:09:41Z)
- Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability [5.15188009671301]
Brain-Inspired Modular Training is a method for making neural networks more modular and interpretable.
BIMT embeds neurons in a geometric space and augments the loss function with a cost proportional to the length of each neuron connection.
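BIMT's connection-length penalty can be sketched in a few lines. This is a hedged illustration only: the layer geometry, weighting, and function names below are assumptions for the example, not the paper's exact formulation.

```python
import numpy as np

def wiring_cost(weights, pos_in, pos_out, scale=1.0):
    """BIMT-style regularizer: penalize each connection by |w| times the
    Euclidean distance between the neurons it connects.
    weights: (n_out, n_in); pos_in: (n_in, 2); pos_out: (n_out, 2)."""
    # pairwise distances between output and input neuron coordinates
    dists = np.linalg.norm(pos_out[:, None, :] - pos_in[None, :, :], axis=-1)
    return scale * np.sum(np.abs(weights) * dists)

# toy geometry: two input neurons at x=0, one output neuron at x=1
pos_in = np.array([[0.0, 0.0], [0.0, 1.0]])
pos_out = np.array([[1.0, 0.0]])
w = np.array([[1.0, 1.0]])

# the total training objective would be: task_loss + wiring_cost(...)
print(wiring_cost(w, pos_in, pos_out))  # -> 1 + sqrt(2)
```

Minimizing this term alongside the task loss encourages strong connections to be short, which is what pulls functionally related neurons into spatial (and thus visible) modules.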
arXiv Detail & Related papers (2023-05-04T17:56:42Z)
- Recursive Neural Programs: Variational Learning of Image Grammars and Part-Whole Hierarchies [1.5990720051907859]
We introduce Recursive Neural Programs (RNPs), the first neural generative model to address the part-whole hierarchy learning problem.
Our results show that RNPs provide an intuitive and explainable way of composing objects and scenes.
arXiv Detail & Related papers (2022-06-16T22:02:06Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
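The learned routing idea can be loosely illustrated with soft module selection. The sketch below is a strong simplification under invented names: it mixes the outputs of two stand-in "functions" by signature similarity, whereas the actual architecture uses typed signatures, modulated linear layers, and iterative routing.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routed_layer(x, modules, signatures, temperature=1.0):
    """Toy soft routing: compare each input vector to per-module
    'signatures' and mix module outputs by the compatibility weights.
    x: (n_tokens, d); signatures: (n_modules, d)."""
    scores = x @ signatures.T / temperature           # (n_tokens, n_modules)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    outs = np.stack([m(x) for m in modules], axis=1)  # (n_tokens, n_modules, d)
    return (weights[..., None] * outs).sum(axis=1)

rng = np.random.default_rng(0)
d = 4
modules = [lambda t: t * 2.0, lambda t: -t]           # stand-in "functions"
signatures = rng.normal(size=(2, d))
x = rng.normal(size=(3, d))
y = routed_layer(x, modules, signatures)
print(y.shape)  # (3, 4)
```

Because the routing weights are differentiable, which function processes which input can be learned end-to-end together with the functions themselves.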
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Object-Centric Representation Learning for Video Question Answering [27.979053252431306]
Video question answering (Video QA) presents a powerful testbed for human-like intelligent behaviors.
The task demands new capabilities to integrate video processing and language understanding, and to bind abstract concepts to concrete visual artifacts.
We propose a new query-guided representation framework to turn a video into a relational graph of objects.
arXiv Detail & Related papers (2021-04-12T02:37:20Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
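The ModularQA decomposition can be illustrated with toy stand-ins. Everything below is hypothetical: the solvers are hand-written stubs (a lookup table in place of the neural factoid QA model, a tiny arithmetic function in place of the symbolic calculator), and the `#i` reference convention is an assumption made for the example.

```python
def factoid_qa(question):
    """Stub for a neural single-span QA model (toy lookup table)."""
    kb = {
        "When was the Eiffel Tower built?": 1889,
        "When was the Empire State Building built?": 1931,
    }
    return kb[question]

def calculator(a, op, b):
    """Stub for the symbolic calculator module."""
    return {"diff": a - b, "sum": a + b}[op]

def modular_answer(decomposition):
    """Run a decomposition: each step names a module and its arguments;
    '#i' refers back to the answer produced at step i."""
    answers = []
    for module, args in decomposition:
        resolved = [answers[int(a[1:])] if isinstance(a, str) and a.startswith("#")
                    else a for a in args]
        if module == "qa":
            answers.append(factoid_qa(resolved[0]))
        elif module == "calc":
            answers.append(calculator(*resolved))
    return answers[-1]

# "How many years after the Eiffel Tower was the Empire State Building built?"
program = [
    ("qa", ["When was the Eiffel Tower built?"]),
    ("qa", ["When was the Empire State Building built?"]),
    ("calc", ["#1", "diff", "#0"]),
]
print(modular_answer(program))  # -> 42
```

The interpretability claim follows from this structure: the sub-questions and intermediate answers form a human-readable trace of how the final answer was derived.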
arXiv Detail & Related papers (2020-09-01T23:45:42Z)
- Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
- Compositional Generalization by Learning Analytical Expressions [87.15737632096378]
A memory-augmented neural model is connected with analytical expressions to achieve compositional generalization.
Experiments on the well-known SCAN benchmark demonstrate that our model achieves strong compositional generalization.
arXiv Detail & Related papers (2020-06-18T15:50:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.