Recursive Neural Programs: Variational Learning of Image Grammars and
Part-Whole Hierarchies
- URL: http://arxiv.org/abs/2206.08462v1
- Date: Thu, 16 Jun 2022 22:02:06 GMT
- Title: Recursive Neural Programs: Variational Learning of Image Grammars and
Part-Whole Hierarchies
- Authors: Ares Fisher, Rajesh P.N. Rao
- Abstract summary: We introduce Recursive Neural Programs (RNPs) to address the part-whole hierarchy learning problem.
RNPs are the first neural generative model to address the part-whole hierarchy learning problem.
Our results show that RNPs provide an intuitive and explainable way of composing objects and scenes.
- Score: 1.5990720051907859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human vision involves parsing and representing objects and scenes using
structured representations based on part-whole hierarchies. Computer vision and
machine learning researchers have recently sought to emulate this capability
using capsule networks, reference frames and active predictive coding, but a
generative model formulation has been lacking. We introduce Recursive Neural
Programs (RNPs), which, to our knowledge, is the first neural generative model
to address the part-whole hierarchy learning problem. RNPs model images as
hierarchical trees of probabilistic sensory-motor programs that recursively
reuse learned sensory-motor primitives to model an image within different
reference frames, forming recursive image grammars. We express RNPs as
structured variational autoencoders (sVAEs) for inference and sampling, and
demonstrate parts-based parsing, sampling and one-shot transfer learning for
MNIST, Omniglot and Fashion-MNIST datasets, demonstrating the model's
expressive power. Our results show that RNPs provide an intuitive and
explainable way of composing objects and scenes, allowing rich compositionality
and intuitive interpretations of objects in terms of part-whole hierarchies.
Related papers
- Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes.
RHGNet introduces a top-down pathway that works in different ways in the training and inference processes.
Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z) - Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z) - Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z) - Neural Language of Thought Models [18.930227757853313]
We introduce the Neural Language of Thought Model (NLoTM), a novel approach for unsupervised learning of LoTH-inspired representation and generation.
NLoTM comprises two key components: (1) the Semantic Vector-Quantized Variational Autoencoder, which learns hierarchical, composable discrete representations aligned with objects and their properties, and (2) the Autoregressive LoT Prior, an autoregressive transformer that learns to generate semantic concept tokens compositionally.
We evaluate NLoTM on several 2D and 3D image datasets, demonstrating superior performance in downstream tasks, out-of-distribution generalization, and image generation
arXiv Detail & Related papers (2024-02-02T08:13:18Z) - OC-NMN: Object-centric Compositional Neural Module Network for
Generative Visual Analogical Reasoning [49.12350554270196]
We show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination.
Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language.
arXiv Detail & Related papers (2023-10-28T20:12:58Z) - On the Transition from Neural Representation to Symbolic Knowledge [2.2528422603742304]
We propose a Neural-Symbolic Transitional Dictionary Learning (TDL) framework that employs an EM algorithm to learn a transitional representation of data.
We implement the framework with a diffusion model by regarding the decomposition of input as a cooperative game.
We additionally use RL enabled by the Markovian of diffusion models to further tune the learned prototypes.
arXiv Detail & Related papers (2023-08-03T19:29:35Z) - Neural-Symbolic Recursive Machine for Systematic Generalization [113.22455566135757]
We introduce the Neural-Symbolic Recursive Machine (NSR), whose core is a Grounded Symbol System (GSS)
NSR integrates neural perception, syntactic parsing, and semantic reasoning.
We evaluate NSR's efficacy across four challenging benchmarks designed to probe systematic generalization capabilities.
arXiv Detail & Related papers (2022-10-04T13:27:38Z) - Active Predictive Coding Networks: A Neural Solution to the Problem of
Learning Reference Frames and Part-Whole Hierarchies [1.5990720051907859]
We introduce Active Predictive Coding Networks (APCNs)
APCNs are a new class of neural networks that solve a major problem posed by Hinton and others in the fields of artificial intelligence and brain modeling.
We demonstrate that APCNs can (a) learn to parse images into part-whole hierarchies, (b) learn compositional representations, and (c) transfer their knowledge to unseen classes of objects.
arXiv Detail & Related papers (2022-01-14T21:22:48Z) - MOGAN: Morphologic-structure-aware Generative Learning from a Single
Image [59.59698650663925]
Recently proposed generative models complete training based on only one image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z) - Comparative evaluation of CNN architectures for Image Caption Generation [1.2183405753834562]
We have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks.
We observe that model complexity of Convolutional Neural Network, as measured by number of parameters, and the accuracy of the model on Object Recognition task does not necessarily co-relate with its efficacy on feature extraction for Image Caption Generation task.
arXiv Detail & Related papers (2021-02-23T05:43:54Z) - Text-to-Image Generation with Attention Based Recurrent Neural Networks [1.2599533416395765]
We develop a tractable and stable caption-based image generation model.
Experimentations are performed on Microsoft datasets.
Results show that the proposed model performs better than contemporary approaches.
arXiv Detail & Related papers (2020-01-18T12:19:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.