Drawing out of Distribution with Neuro-Symbolic Generative Models
- URL: http://arxiv.org/abs/2206.01829v1
- Date: Fri, 3 Jun 2022 21:40:22 GMT
- Title: Drawing out of Distribution with Neuro-Symbolic Generative Models
- Authors: Yichao Liang, Joshua B. Tenenbaum, Tuan Anh Le, N. Siddharth
- Abstract summary: Drawing out of Distribution is a neuro-symbolic generative model of stroke-based drawing.
DooD operates directly on images and requires no supervision or expensive test-time inference.
We evaluate DooD on its ability to generalise across both data and tasks.
- Score: 49.79371715591122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning general-purpose representations from perceptual inputs is a hallmark
of human intelligence. For example, people can write out numbers or characters,
or even draw doodles, by characterizing these tasks as different instantiations
of the same generic underlying process -- compositional arrangements of
different forms of pen strokes. Crucially, learning to do one task, say
writing, implies reasonable competence at another, say drawing, on account of
this shared process. We present Drawing out of Distribution (DooD), a
neuro-symbolic generative model of stroke-based drawing that can learn such
general-purpose representations. In contrast to prior work, DooD operates
directly on images, requires no supervision or expensive test-time inference,
and performs unsupervised amortised inference with a symbolic stroke model that
better enables both interpretability and generalization. We evaluate DooD on
its ability to generalise across both data and tasks. We first perform
zero-shot transfer from one dataset (e.g. MNIST) to another (e.g. Quickdraw),
across five different datasets, and show that DooD clearly outperforms
different baselines. An analysis of the learnt representations further
highlights the benefits of adopting a symbolic stroke model. We then adopt a
subset of the Omniglot challenge tasks, and evaluate its ability to generate
new exemplars (both unconditionally and conditionally), and perform one-shot
classification, showing that DooD matches the state of the art. Taken together,
we demonstrate that DooD does indeed capture general-purpose representations
across both data and task, and takes a further step towards building general
and robust concept-learning systems.
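To make the core idea concrete, here is a minimal, hypothetical sketch of what a "compositional arrangement of pen strokes" can look like in code: a drawing represented as a small set of symbolic strokes (here, quadratic Bezier curves) that are rendered back to an image. This is not the authors' implementation of DooD; the stroke parameterisation, the renderer, and all names are illustrative assumptions.

```python
# Illustrative sketch only: a symbolic stroke representation and a compositional
# renderer in the spirit of stroke-based drawing models. Not DooD's actual code;
# the quadratic-Bezier parameterisation is an assumption for illustration.
from dataclasses import dataclass
import numpy as np

@dataclass
class Stroke:
    """A single pen stroke as a quadratic Bezier curve in [0, 1]^2."""
    p0: tuple  # start point (x, y)
    p1: tuple  # control point
    p2: tuple  # end point

def render_stroke(stroke: Stroke, canvas: np.ndarray, steps: int = 100) -> None:
    """Rasterise one stroke onto a binary canvas by sampling the curve."""
    h, w = canvas.shape
    p0, p1, p2 = map(np.asarray, (stroke.p0, stroke.p1, stroke.p2))
    for t in np.linspace(0.0, 1.0, steps):
        # Quadratic Bezier: (1-t)^2 p0 + 2t(1-t) p1 + t^2 p2
        pt = (1 - t) ** 2 * p0 + 2 * t * (1 - t) * p1 + t ** 2 * p2
        x, y = int(pt[0] * (w - 1)), int(pt[1] * (h - 1))
        canvas[y, x] = 1.0

def render_drawing(strokes: list, size: int = 28) -> np.ndarray:
    """A drawing is a compositional arrangement of strokes on one canvas."""
    canvas = np.zeros((size, size), dtype=np.float32)
    for s in strokes:
        render_stroke(s, canvas)
    return canvas

# Example: a crude "7" composed of two strokes.
seven = [
    Stroke((0.2, 0.2), (0.5, 0.2), (0.8, 0.2)),    # horizontal bar
    Stroke((0.8, 0.2), (0.55, 0.55), (0.3, 0.9)),  # diagonal down-stroke
]
image = render_drawing(seven)
print(image.shape, image.sum())
```

Because the latent description is a handful of interpretable stroke parameters rather than raw pixels, the same kind of representation can in principle be reused across datasets (MNIST digits, Quickdraw doodles, Omniglot characters), which is the sort of cross-data and cross-task transfer the abstract evaluates.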
Related papers
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single model tracker that can remain blind to any modality X during inference time.
Our training process is extremely simple, integrating a multi-label classification loss with a routing function.
Our generalist and blind tracker can achieve competitive performance compared to well-established modal-specific models.
arXiv Detail & Related papers (2024-05-28T03:00:58Z)
- One for All: Towards Training One Graph Model for All Classification Tasks [61.656962278497225]
A unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain.
We propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges.
OFA performs well across different tasks, making it the first general-purpose, cross-domain classification model on graphs.
arXiv Detail & Related papers (2023-09-29T21:15:26Z)
- SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep learning based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data yield more diverse features for different tasks, they place less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Self-Supervised Visual Representation Learning Using Lightweight Architectures [0.0]
In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine.
We critically examine the most notable pretext tasks to extract features from image data.
We study the performance of various self-supervised techniques keeping all other parameters uniform.
arXiv Detail & Related papers (2021-10-21T14:13:10Z)
- Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization [33.36330493757669]
We introduce a novel representation learning method to disentangle pose-dependent as well as view-dependent factors from 2D human poses.
The method trains a network using cross-view mutual information (CV-MIM) which maximizes mutual information of the same pose performed from different viewpoints.
CV-MIM outperforms other competing methods by a large margin in the single-shot cross-view setting.
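Cross-view mutual information maximization of this kind is typically realised with a contrastive (InfoNCE-style) lower bound, where embeddings of the same pose seen from two viewpoints are pulled together and other poses in the batch are pushed apart. The snippet below is a generic sketch of that family of objectives, not CV-MIM's exact formulation; the batch layout, temperature, and encoder outputs are assumptions.

```python
# Generic InfoNCE-style contrastive objective over two views of the same poses.
# Illustrative only; CV-MIM's exact objective and encoders differ.
import numpy as np

def info_nce(z_view_a: np.ndarray, z_view_b: np.ndarray, temperature: float = 0.1) -> float:
    """z_view_a[i] and z_view_b[i] encode the same pose from different viewpoints."""
    # L2-normalise embeddings so dot products are cosine similarities.
    a = z_view_a / np.linalg.norm(z_view_a, axis=1, keepdims=True)
    b = z_view_b / np.linalg.norm(z_view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i is column i; minimising this loss maximises
    # a lower bound on the mutual information between the two views.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 32))
z_b = z_a + 0.05 * rng.normal(size=(8, 32))  # same poses, slightly perturbed "view"
print(info_nce(z_a, z_b))
```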
arXiv Detail & Related papers (2020-12-02T18:55:35Z)
- Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning [21.89909688056478]
We propose a new two-level joint idea to augment the generative network with an inference network during training.
This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains.
We evaluate our approach on four benchmark datasets against several state-of-the-art methods and demonstrate its effectiveness.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)
- Learning Task-General Representations with Generative Neuro-Symbolic Modeling [22.336243882030026]
We develop a generative neuro-symbolic (GNS) model of handwritten character concepts.
The correlations between parts are modeled with neural network subroutines, allowing the model to learn directly from raw data.
In a subsequent evaluation, our GNS model uses probabilistic inference to learn rich conceptual representations from a single training image.
arXiv Detail & Related papers (2020-06-25T14:41:27Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
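As a rough illustration of how counterfactual pairs can supervise gradients, the sketch below adds an auxiliary term that encourages the input-gradient of the task loss to align with the direction from an example to its minimally-different counterfactual. It is a simplified stand-in for the paper's objective; the model, loss weighting, and data layout are assumptions.

```python
# Illustrative sketch of a gradient-supervision style auxiliary loss on
# counterfactual pairs. Simplified; not the paper's exact objective or weighting.
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, x_cf, y):
    """x and x_cf are minimally-different inputs with different labels."""
    x = x.clone().requires_grad_(True)
    task_loss = F.cross_entropy(model(x), y)
    # Gradient of the task loss with respect to the input.
    grad = torch.autograd.grad(task_loss, x, create_graph=True)[0]
    # Direction in input space that flips the label.
    direction = (x_cf - x).detach()
    cos = F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    return task_loss + 0.1 * (1.0 - cos).mean()  # 0.1 is an assumed weight

# Usage with a toy linear classifier on flat features.
model = torch.nn.Linear(16, 3)
x = torch.randn(4, 16)
x_cf = x + 0.1 * torch.randn(4, 16)  # stand-in for real counterfactual inputs
y = torch.randint(0, 3, (4,))
loss = gradient_supervision_loss(model, x, x_cf, y)
loss.backward()
```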
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.