Generative Adversarial Data Programming
- URL: http://arxiv.org/abs/2005.00364v1
- Date: Thu, 30 Apr 2020 07:06:44 GMT
- Title: Generative Adversarial Data Programming
- Authors: Arghya Pal, Vineeth N Balasubramanian
- Abstract summary: We show how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time.
This framework is extended to different setups, including self-supervised labeled image generation, zero-shot text to labeled image generation, transfer learning, and multi-task learning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paucity of large curated hand-labeled training data forms a major
bottleneck in the deployment of machine learning models in computer vision and
other fields. Recent work (Data Programming) has shown how distant supervision
signals in the form of labeling functions can be used to obtain labels for
given data in near-constant time. In this work, we present Adversarial Data
Programming (ADP), which presents an adversarial methodology to generate data
as well as a curated aggregated label, given a set of weak labeling functions.
More interestingly, such labeling functions are often easily generalizable,
thus allowing our framework to be extended to different setups, including
self-supervised labeled image generation, zero-shot text to labeled image
generation, transfer learning, and multi-task learning.
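The labeling-function mechanism the abstract refers to can be sketched as follows. The heuristics, class labels, and majority-vote aggregation below are illustrative assumptions for exposition only; ADP itself learns to aggregate labels adversarially rather than by simple voting.

```python
# Sketch of the labeling-function idea behind data programming.
# Each labeling function (LF) is a weak heuristic that votes a class
# label or abstains (None); votes are then aggregated, here by
# majority vote for simplicity.
from collections import Counter

ABSTAIN = None

def lf_contains_digit(x: str):
    # Hypothetical LF: strings containing digits look like class 1.
    return 1 if any(c.isdigit() for c in x) else ABSTAIN

def lf_short_text(x: str):
    # Hypothetical LF: very short strings look like class 0.
    return 0 if len(x) < 5 else ABSTAIN

def lf_uppercase(x: str):
    # Hypothetical LF: all-uppercase strings look like class 1.
    return 1 if x.isupper() else ABSTAIN

LFS = [lf_contains_digit, lf_short_text, lf_uppercase]

def aggregate_label(x: str):
    """Majority vote over non-abstaining LFs; None if all abstain."""
    votes = [v for v in (lf(x) for lf in LFS) if v is not ABSTAIN]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]

print(aggregate_label("ID42"))  # -> 1 (two LFs vote 1, one votes 0)
```

Because each example only requires evaluating a fixed set of cheap heuristics, labeling runs in near-constant time per example, which is what makes the approach attractive when hand-labeling is the bottleneck.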
Related papers
- INSITE: labelling medical images using submodular functions and semi-supervised data programming
The large amounts of labeled data needed to train deep models create an implementation bottleneck in resource-constrained settings.
We apply informed subset selection to identify a small number of most representative or diverse images from a huge pool of unlabelled data.
The newly annotated images are then used as exemplars to develop several data programming-driven labeling functions.
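Informed subset selection of this kind is typically driven by greedy maximization of a submodular objective such as facility location. A minimal sketch, assuming cosine similarity over feature vectors (this is a generic illustration of the technique, not the exact INSITE procedure):

```python
# Greedy selection under a facility-location objective,
# f(S) = sum_i max_{j in S} sim[i, j], a common submodular criterion
# for picking representative examples from an unlabeled pool.
import numpy as np

def facility_location_gain(sim, selected, candidate):
    """Marginal gain of adding `candidate` to the selected set."""
    if not selected:
        return sim[:, candidate].sum()
    current = sim[:, selected].max(axis=1)
    return np.maximum(current, sim[:, candidate]).sum() - current.sum()

def greedy_select(features, k):
    """Pick k representative rows of `features` by greedy maximization."""
    # Cosine similarity between all pairs of examples.
    norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = norm @ norm.T
    selected = []
    for _ in range(k):
        cands = [c for c in range(len(features)) if c not in selected]
        gains = [facility_location_gain(sim, selected, c) for c in cands]
        selected.append(cands[int(np.argmax(gains))])
    return selected
```

Because facility location is monotone submodular, this greedy procedure carries the classic (1 - 1/e) approximation guarantee, which is why it is a standard choice for representative-subset selection.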
arXiv Detail & Related papers (2024-02-11T12:02:00Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- Multi-Task Self-Training for Learning General Representations
Multi-task self-training (MuST) harnesses the knowledge in independent specialized teacher models to train a single general student model.
MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large scale datasets.
arXiv Detail & Related papers (2021-08-25T17:20:50Z)
- TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration
Data programming has previously been accessible only to users who know how to program.
We build a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming.
arXiv Detail & Related papers (2021-06-24T04:49:42Z)
- Streaming Self-Training via Domain-Agnostic Unlabeled Images
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models.
Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates.
arXiv Detail & Related papers (2021-04-07T17:58:39Z)
- Visual Distant Supervision for Scene Graph Generation
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data, while meta-learning helps with adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
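A bare-bones version of such a self-training loop, using prediction confidence as a simplified stand-in for the meta-learned sample weights (the classifier, thresholds, and names below are illustrative, not from the paper):

```python
# Generic self-training with confidence-weighted pseudo-labels.
# High-confidence predictions on unlabeled data are added to the
# training set; their weights soften error propagation from noise.
import numpy as np

class NearestCentroid:
    """Tiny weighted classifier, included only to make the sketch runnable."""
    def fit(self, X, y, sample_weight=None):
        w = np.ones(len(y)) if sample_weight is None else sample_weight
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([
            np.average(X[y == c], axis=0, weights=w[y == c])
            for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Softmax over negative distances to class centroids.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        e = np.exp(-d)
        return e / e.sum(axis=1, keepdims=True)

def self_train(model, X_lab, y_lab, X_unl, rounds=3, threshold=0.6):
    X, y, w = X_lab, y_lab, np.ones(len(y_lab))
    for _ in range(rounds):
        model.fit(X, y, sample_weight=w)
        proba = model.predict_proba(X_unl)
        conf = proba.max(axis=1)
        keep = conf >= threshold          # trust only confident predictions
        if not keep.any():
            break
        X = np.vstack([X_lab, X_unl[keep]])
        y = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
        # Pseudo-labels are down-weighted by confidence, a crude proxy
        # for the adaptive re-weighting the paper learns.
        w = np.concatenate([np.ones(len(y_lab)), conf[keep]])
    return model
```

The paper replaces the fixed confidence heuristic with a meta-learned re-weighting of individual samples; the loop structure above stays the same.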
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition
The KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces the label semantics to guide learning semantic-specific features.
It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
- Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions
We propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users.
DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics.
We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples.
arXiv Detail & Related papers (2020-09-03T04:25:08Z)
- Adversarial Knowledge Transfer from Unlabeled Data
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.