Synbols: Probing Learning Algorithms with Synthetic Datasets
- URL: http://arxiv.org/abs/2009.06415v2
- Date: Wed, 4 Nov 2020 21:57:37 GMT
- Title: Synbols: Probing Learning Algorithms with Synthetic Datasets
- Authors: Alexandre Lacoste, Pau Rodríguez, Frédéric Branchaud-Charron, Parmida Atighehchian, Massimo Caccia, Issam Laradji, Alexandre Drouin, Matt Craddock, Laurent Charlin, David Vázquez
- Abstract summary: Synbols is a tool for rapidly generating new datasets with a rich composition of latent features rendered in low-resolution images.
Our tool's high-level interface provides a language for rapidly generating new distributions on the latent features.
To showcase the versatility of Synbols, we use it to dissect the limitations and flaws in standard learning algorithms in various learning setups.
- Score: 112.45883250213272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Progress in the field of machine learning has been fueled by the introduction
of benchmark datasets pushing the limits of existing algorithms. Enabling the
design of datasets to test specific properties and failure modes of learning
algorithms is thus a problem of high interest, as it has a direct impact on
innovation in the field. In this sense, we introduce Synbols -- Synthetic
Symbols -- a tool for rapidly generating new datasets with a rich composition
of latent features rendered in low-resolution images. Synbols leverages the
large number of symbols available in the Unicode standard and the wide range of
artistic fonts provided by the open font community. Our tool's high-level
interface provides a language for rapidly generating new distributions on the
latent features, including various types of textures and occlusions. To
showcase the versatility of Synbols, we use it to dissect the limitations and
flaws in standard learning algorithms in various learning setups including
supervised learning, active learning, out-of-distribution generalization,
unsupervised representation learning, and object counting.
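As a rough illustration of the latent-feature-to-image pipeline the abstract describes, the sketch below samples a symbol, font, colors, rotation, and an optional occluding patch, then renders a small image with Pillow. It is a hypothetical sketch only: the field names, defaults, and rendering choices are assumptions and do not reflect the actual Synbols API.

```python
# Hypothetical sketch of a Synbols-style latent-attribute -> image generator.
# It does NOT use the real Synbols API; all names and choices are illustrative.
import random
from dataclasses import dataclass
from PIL import Image, ImageDraw, ImageFont

@dataclass
class Latents:
    char: str        # Unicode symbol to render (the class label)
    font_path: str   # one of the open fonts available on the system
    rotation: float  # rotation in degrees
    fg: tuple        # foreground RGB color
    bg: tuple        # background RGB color
    occlude: bool    # whether to draw a random occluding patch

def sample_latents(alphabet, fonts, rng):
    """Sample one set of latent attributes from simple base distributions."""
    return Latents(
        char=rng.choice(alphabet),
        font_path=rng.choice(fonts),
        rotation=rng.uniform(-30, 30),
        fg=tuple(rng.randrange(256) for _ in range(3)),
        bg=tuple(rng.randrange(256) for _ in range(3)),
        occlude=rng.random() < 0.3,
    )

def render(latents, size=32):
    """Render the sampled latents as a low-resolution RGB image."""
    img = Image.new("RGB", (size, size), latents.bg)
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(latents.font_path, size=int(size * 0.7))
    draw.text((int(size * 0.15), int(size * 0.1)), latents.char,
              fill=latents.fg, font=font)
    if latents.occlude:
        # Drop a small square occluder at a random position.
        x = random.randrange(size // 2)
        draw.rectangle([x, x, x + size // 4, x + size // 4], fill=latents.fg)
    return img.rotate(latents.rotation, fillcolor=latents.bg)

# Usage (point `fonts` at any .ttf files installed locally):
# rng = random.Random(0)
# z = sample_latents(alphabet="abčδع", fonts=["/path/to/font.ttf"], rng=rng)
# render(z).save("symbol.png")
```

New distributions on the latent features can then be defined by swapping out the sampling functions, which is the kind of high-level control the abstract attributes to the tool.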
Related papers
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning [54.56905063752427]
Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems.
Existing pipelines that train the neural and symbolic components sequentially require extensive labelling.
A new architecture, NeSyGPT, fine-tunes a vision-language foundation model to extract symbolic features from raw data.
arXiv Detail & Related papers (2024-02-02T20:33:14Z)
- Harnessing the Power of Beta Scoring in Deep Active Learning for Multi-Label Text Classification [6.662167018900634]
Our study introduces a novel deep active learning strategy, capitalizing on the Beta family of proper scoring rules within the Expected Loss Reduction framework.
It computes the expected increase in scores using the Beta Scoring Rules, which are then transformed into sample vector representations.
Comprehensive evaluations across both synthetic and real datasets reveal our method's capability to often outperform established acquisition techniques in multi-label text classification.
arXiv Detail & Related papers (2024-01-15T00:06:24Z)
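The active-learning entry above scores candidates with proper scoring rules inside an Expected Loss Reduction framework. The snippet below is a much-simplified, illustrative sketch: it ranks unlabeled samples by the expected Brier score of their own predictive distribution (a proper-scoring-rule notion of uncertainty) rather than the paper's full Beta-family expected-loss-reduction computation; all names are assumptions.

```python
# Simplified proper-scoring-rule acquisition sketch (not the paper's method).
import numpy as np

def brier_score(probs, label):
    """Multiclass Brier score: sum_k (p_k - 1[y=k])^2 (lower is better)."""
    onehot = np.zeros_like(probs)
    onehot[label] = 1.0
    return float(np.sum((probs - onehot) ** 2))

def expected_brier(probs):
    """Expected Brier score of a prediction under its own distribution,
    E_{y~p}[S(p, y)]; it equals 1 - sum_k p_k^2 and is largest for
    maximally uncertain predictions."""
    return sum(p * brier_score(probs, k) for k, p in enumerate(probs))

def select_batch(unlabeled_probs, batch_size=8):
    """Pick the unlabeled samples that are most uncertain under the
    Brier scoring rule (illustrative acquisition function)."""
    scores = np.array([expected_brier(p) for p in unlabeled_probs])
    return np.argsort(-scores)[:batch_size]

# Toy usage: 100 unlabeled samples, 5 classes, random softmax outputs.
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(select_batch(probs, batch_size=5))
```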
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
- Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search [63.3745291252038]
We propose DiffSES, a novel symbolic learning approach that discovers discrete symbolic policies.
By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions.
Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods.
arXiv Detail & Related papers (2022-12-30T17:50:54Z)
- Universalizing Weak Supervision [18.832796698152492]
We propose a universal technique that enables weak supervision over any label type.
We apply this technique to important problems previously not tackled by WS frameworks including learning to rank, regression, and learning in hyperbolic space.
arXiv Detail & Related papers (2021-12-07T17:59:10Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because parsed code naturally admits graph structures, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
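The vulnerability-detection entry above classifies programs represented as graphs with a GNN. The sketch below shows the general pattern with a tiny message-passing network over an adjacency matrix and a mean-pooled readout; the node features, graph construction, and readout are assumptions, not the paper's disaggregated code-graph architecture.

```python
# Illustrative sketch of classifying a program graph with a small GNN.
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # Mean-aggregate neighbor features, then transform.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))

class VulnClassifier(nn.Module):
    def __init__(self, in_dim=64, layers=3):
        super().__init__()
        self.layers = nn.ModuleList(GraphLayer(in_dim) for _ in range(layers))
        self.head = nn.Linear(in_dim, 1)  # vulnerable vs. benign logit

    def forward(self, x, adj):
        for layer in self.layers:
            x = layer(x, adj)
        return self.head(x.mean(dim=0))  # mean-pool nodes -> graph logit

# Toy graph: 10 AST/data-flow nodes with random embeddings and random edges.
x = torch.randn(10, 64)
adj = (torch.rand(10, 10) < 0.2).float()
print(torch.sigmoid(VulnClassifier()(x, adj)))
```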
- Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts [0.9054540533394924]
Recent techniques try to learn a cross-modal mapping between the semantic space and the image space.
We propose a Multimodal Variational Auto-Encoder (M-VAE) which can learn the shared latent space of image features and the semantic space.
Our results show that our proposed model outperforms the current state-of-the-art approaches for generalized zero-shot learning.
arXiv Detail & Related papers (2021-06-26T20:08:37Z)
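The zero-shot entry above learns a latent space shared between image features and semantic embeddings. The sketch below shows the general pattern with one small VAE per modality and an alignment penalty on the latent means; the dimensions, losses, and alignment term are illustrative assumptions, not the paper's exact M-VAE.

```python
# Minimal sketch of a shared-latent-space multimodal VAE for zero-shot learning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityVAE(nn.Module):
    def __init__(self, in_dim, latent_dim=64, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

def elbo(x, recon, mu, logvar):
    """Reconstruction term plus KL divergence to a standard normal prior."""
    rec = F.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# One VAE per modality (image features, class semantic vectors); an extra term
# pulls the two posteriors together so unseen classes can later be classified
# from their semantic embedding alone.
img_vae, sem_vae = ModalityVAE(2048), ModalityVAE(300)
img_feat, sem_vec = torch.randn(16, 2048), torch.randn(16, 300)
img_out, sem_out = img_vae(img_feat), sem_vae(sem_vec)
align = F.mse_loss(img_out[1], sem_out[1])  # align the latent means
loss = elbo(img_feat, *img_out) + elbo(sem_vec, *sem_out) + align
loss.backward()
```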
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.