Data Programming by Demonstration: A Framework for Interactively
Learning Labeling Functions
- URL: http://arxiv.org/abs/2009.01444v3
- Date: Tue, 15 Sep 2020 22:44:04 GMT
- Title: Data Programming by Demonstration: A Framework for Interactively
Learning Labeling Functions
- Authors: Sara Evensen and Chang Ge and Dongjin Choi and Çağatay Demiralp
- Abstract summary: We propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users.
DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics.
We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples.
- Score: 2.338938629983582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data programming is a programmatic weak supervision approach to efficiently
curate large-scale labeled training data. Writing data programs (labeling
functions) requires, however, both programming literacy and domain expertise.
Many subject matter experts have neither programming proficiency nor time to
effectively write data programs. Furthermore, regardless of one's expertise in
coding or machine learning, transferring domain expertise into labeling
functions by enumerating rules and thresholds is not only time consuming but
also inherently difficult. Here we propose a new framework, data programming by
demonstration (DPBD), to generate labeling rules using interactive
demonstrations of users. DPBD aims to relieve the burden of writing labeling
functions from users, enabling them to focus on higher-level semantics such as
identifying relevant signals for labeling tasks. We operationalize our
framework with Ruler, an interactive system that synthesizes labeling rules for
document classification by using span-level annotations of users on document
examples. We compare Ruler with conventional data programming through a user
study conducted with 10 data scientists creating labeling functions for
sentiment and spam classification tasks. We find that Ruler is easier to use
and learn and offers higher overall satisfaction, while providing
discriminative model performance comparable to that achieved by conventional
data programming.
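As a concrete illustration of the labeling functions the abstract describes, here is a minimal, hypothetical Python sketch in the style popularized by data programming systems such as Snorkel. The keyword rules (`lf_contains_free`, `lf_contains_thanks`, `lf_has_url`) and the majority-vote combiner are illustrative assumptions, not rules from the paper; real systems typically learn a generative label model over the noisy votes rather than taking a simple vote.

```python
# Hypothetical spam-vs-ham labeling functions. Each function votes a label
# or abstains; votes are then combined. All rules here are made up for
# illustration only.
SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_contains_free(text):
    """Vote SPAM if the message advertises something 'free'."""
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_contains_thanks(text):
    """Vote HAM if the message reads like a polite reply."""
    return HAM if "thanks" in text.lower() else ABSTAIN

def lf_has_url(text):
    """Vote SPAM if the message contains a link."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_thanks, lf_has_url]

def majority_vote(text):
    """Combine noisy labeling-function votes; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    spam, ham = votes.count(SPAM), votes.count(HAM)
    if spam == ham:
        return ABSTAIN
    return SPAM if spam > ham else HAM
```

The resulting noisy labels would then train a discriminative classifier; DPBD's contribution is that a user demonstrates such rules via span-level annotations instead of writing them by hand.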
Related papers
- Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by
Self-Supervised Representation Mixing and Embedding Initialization [57.38123229553157]
This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems.
We focus on achieving language adaptation using minimal labeled and unlabeled data.
Experimental results show that our framework is able to synthesize intelligible speech in unseen languages with only 4 utterances of labeled data and 15 minutes of unlabeled data.
arXiv Detail & Related papers (2024-01-23T21:55:34Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
The paper incorporates Self-Supervised Learning (SSL) into the model learning process and designs a novel self-supervised Relation of Relation (R2) classification task.
It proposes a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets.
It also exploits external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in vision-language models, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization [68.91386402390403]
We propose Unlabeled Data Augmented Instruction Tuning (UDIT) to take better advantage of the instructions during instruction learning.
We conduct extensive experiments to show UDIT's effectiveness in various scenarios of tasks and datasets.
arXiv Detail & Related papers (2022-10-17T15:25:24Z)
- I Know Therefore I Score: Label-Free Crafting of Scoring Functions using Constraints Based on Domain Expertise [6.26476800426345]
We introduce a label-free practical approach to learn a scoring function from multi-dimensional numerical data.
The approach incorporates insights and business rules from domain experts in the form of easily observable and specifiable constraints.
We convert such constraints into loss functions that are optimized simultaneously while learning the scoring function.
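The constraint-to-loss idea in the summary above can be sketched as follows. This is a hypothetical minimal example, not the paper's method: it assumes a linear scoring function and a single expert rule ("the score should increase with feature 0"), which is converted into a hinge penalty that could be optimized alongside any other training objective.

```python
# Hypothetical sketch: turn a domain-expert monotonicity constraint into a
# differentiable loss term. The linear model and feature choice are
# illustrative assumptions only.
import numpy as np

def score(w, x):
    """A simple linear scoring function."""
    return x @ w

def monotonicity_loss(w, X, feature=0, eps=1e-2):
    """Hinge penalty when increasing `feature` fails to increase the score."""
    X_up = X.copy()
    X_up[:, feature] += eps
    violation = score(w, X) - score(w, X_up)  # > 0 means the score decreased
    return np.maximum(violation, 0.0).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
w_good = np.array([1.0, 0.5, -0.2])   # positive weight on feature 0: no penalty
w_bad = np.array([-1.0, 0.5, -0.2])   # violates the constraint: penalized
```

In a full system, several such penalties would be summed with the main objective, so the scoring function is learned without any labels while still honoring the expert rules.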
arXiv Detail & Related papers (2022-03-18T17:51:20Z)
- TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration [1.4050836886292872]
Previously, data programming was accessible only to users who knew how to program.
We build a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming.
arXiv Detail & Related papers (2021-06-24T04:49:42Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Generative Adversarial Data Programming [32.2164057862111]
We show how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time.
This framework is extended to different setups, including self-supervised labeled image generation, zero-shot text to labeled image generation, transfer learning, and multi-task learning.
arXiv Detail & Related papers (2020-04-30T07:06:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.