Data Programming by Demonstration: A Framework for Interactively
Learning Labeling Functions
- URL: http://arxiv.org/abs/2009.01444v3
- Date: Tue, 15 Sep 2020 22:44:04 GMT
- Title: Data Programming by Demonstration: A Framework for Interactively
Learning Labeling Functions
- Authors: Sara Evensen and Chang Ge and Dongjin Choi and Çağatay Demiralp
- Abstract summary: We propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users.
DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics.
We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples.
- Score: 2.338938629983582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data programming is a programmatic weak supervision approach to efficiently
curate large-scale labeled training data. Writing data programs (labeling
functions) requires, however, both programming literacy and domain expertise.
Many subject matter experts have neither programming proficiency nor time to
effectively write data programs. Furthermore, regardless of one's expertise in
coding or machine learning, transferring domain expertise into labeling
functions by enumerating rules and thresholds is not only time consuming but
also inherently difficult. Here we propose a new framework, data programming by
demonstration (DPBD), to generate labeling rules using interactive
demonstrations of users. DPBD aims to relieve the burden of writing labeling
functions from users, enabling them to focus on higher-level semantics such as
identifying relevant signals for labeling tasks. We operationalize our
framework with Ruler, an interactive system that synthesizes labeling rules for
document classification by using span-level annotations of users on document
examples. We compare Ruler with conventional data programming through a user
study conducted with 10 data scientists creating labeling functions for
sentiment and spam classification tasks. We find that Ruler is easier to use
and learn and offers higher overall satisfaction, while providing
discriminative model performance comparable to that achieved by conventional
data programming.
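As a concrete illustration of the labeling functions the abstract describes, here is a minimal, hypothetical Python sketch in the style popularized by data programming systems such as Snorkel. The keyword rules (`lf_contains_free`, `lf_contains_thanks`, `lf_has_url`) and the majority-vote combiner are illustrative assumptions, not rules from the paper; real systems typically learn a generative label model over the noisy votes rather than taking a simple vote.

```python
# Hypothetical spam-vs-ham labeling functions. Each function votes a label
# or abstains; votes are then combined. All rules here are made up for
# illustration only.
SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_contains_free(text):
    """Vote SPAM if the message advertises something 'free'."""
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_contains_thanks(text):
    """Vote HAM if the message reads like a polite reply."""
    return HAM if "thanks" in text.lower() else ABSTAIN

def lf_has_url(text):
    """Vote SPAM if the message contains a link."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_thanks, lf_has_url]

def majority_vote(text):
    """Combine noisy labeling-function votes; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    spam, ham = votes.count(SPAM), votes.count(HAM)
    if spam == ham:
        return ABSTAIN
    return SPAM if spam > ham else HAM
```

The resulting noisy labels would then train a discriminative classifier; DPBD's contribution is that a user demonstrates such rules via span-level annotations instead of writing them by hand.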
Related papers
- Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by
Self-Supervised Representation Mixing and Embedding Initialization [57.38123229553157]
This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems.
We focus on achieving language adaptation using minimal labeled and unlabeled data.
Experimental results show that our framework is able to synthesize intelligible speech in unseen languages with only 4 utterances of labeled data and 15 minutes of unlabeled data.
arXiv Detail & Related papers (2024-01-23T21:55:34Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
The paper incorporates Self-Supervised Learning (SSL) into the model learning process and designs a novel self-supervised Relation of Relation (R2) classification task.
It proposes a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets.
It also exploits external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in vision-language models, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization [68.91386402390403]
We propose Unlabeled Data Augmented Instruction Tuning (UDIT) to take better advantage of the instructions during instruction learning.
We conduct extensive experiments to show UDIT's effectiveness in various scenarios of tasks and datasets.
arXiv Detail & Related papers (2022-10-17T15:25:24Z)
- I Know Therefore I Score: Label-Free Crafting of Scoring Functions using Constraints Based on Domain Expertise [6.26476800426345]
We introduce a label-free practical approach to learn a scoring function from multi-dimensional numerical data.
The approach incorporates insights and business rules from domain experts in the form of easily observable and specifiable constraints.
We convert such constraints into loss functions that are optimized simultaneously while learning the scoring function.
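The constraint-to-loss idea in the summary above can be sketched as follows. This is a hypothetical minimal example, not the paper's method: it assumes a linear scoring function and a single expert rule ("the score should increase with feature 0"), which is converted into a hinge penalty that could be optimized alongside any other training objective.

```python
# Hypothetical sketch: turn a domain-expert monotonicity constraint into a
# differentiable loss term. The linear model and feature choice are
# illustrative assumptions only.
import numpy as np

def score(w, x):
    """A simple linear scoring function."""
    return x @ w

def monotonicity_loss(w, X, feature=0, eps=1e-2):
    """Hinge penalty when increasing `feature` fails to increase the score."""
    X_up = X.copy()
    X_up[:, feature] += eps
    violation = score(w, X) - score(w, X_up)  # > 0 means the score decreased
    return np.maximum(violation, 0.0).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
w_good = np.array([1.0, 0.5, -0.2])   # positive weight on feature 0: no penalty
w_bad = np.array([-1.0, 0.5, -0.2])   # violates the constraint: penalized
```

In a full system, several such penalties would be summed with the main objective, so the scoring function is learned without any labels while still honoring the expert rules.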
arXiv Detail & Related papers (2022-03-18T17:51:20Z)
- TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration [1.4050836886292872]
Previously, data programming was accessible only to users who knew how to program.
We build a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming.
arXiv Detail & Related papers (2021-06-24T04:49:42Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Generative Adversarial Data Programming [32.2164057862111]
We show how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time.
This framework is extended to different setups, including self-supervised labeled image generation, zero-shot text to labeled image generation, transfer learning, and multi-task learning.
arXiv Detail & Related papers (2020-04-30T07:06:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.