TagRuler: Interactive Tool for Span-Level Data Programming by
Demonstration
- URL: http://arxiv.org/abs/2106.12767v1
- Date: Thu, 24 Jun 2021 04:49:42 GMT
- Title: TagRuler: Interactive Tool for Span-Level Data Programming by
Demonstration
- Authors: Dongjin Choi and Sara Evensen and Çağatay Demiralp and Estevam
Hruschka
- Abstract summary: Until recently, data programming was only accessible to users who knew how to program.
We built a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming.
- Score: 1.4050836886292872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite rapid developments in the field of machine learning research,
collecting high-quality labels for supervised learning remains a bottleneck for
many applications. This difficulty is exacerbated by the fact that
state-of-the-art models for NLP tasks are becoming deeper and more complex,
often increasing the amount of training data required even for fine-tuning.
Weak supervision methods, including data programming, address this problem and
reduce the cost of label collection by using noisy label sources for
supervision. However, until recently, data programming was only accessible to
users who knew how to program. To bridge this gap, the Data Programming by
Demonstration (DPBD) framework was proposed to facilitate the automatic creation of
labeling functions based on a few examples labeled by a domain expert. This
framework has proven successful for generating high-accuracy labeling models
for document classification. In this work, we extend the DPBD framework to
span-level annotation tasks, arguably one of the most time-consuming NLP
labeling tasks. We built a novel tool, TagRuler, that makes it easy for
annotators to build span-level labeling functions without programming and
encourages them to explore trade-offs between different labeling models and
active learning strategies. We empirically demonstrated that an annotator could
achieve a higher F1 score using the proposed tool than with manual labeling
across different span-level annotation tasks.
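
To make the data-programming setting concrete: a span-level labeling function takes raw text and votes noisy span labels, and many such functions are later aggregated by a labeling model. The sketch below is a minimal Snorkel-style illustration, not TagRuler's actual generated code; the function name, pattern, and label are invented for the example.

```python
# A minimal Snorkel-style sketch of a span-level labeling function (an
# illustration only; TagRuler builds such functions interactively and its
# actual output format may differ). The function votes noisy character
# spans, and a downstream label model would aggregate many such votes.
import re
from typing import List, Tuple

def lf_dollar_amount(text: str) -> List[Tuple[int, int, str]]:
    """Label character spans that look like dollar amounts as MONEY."""
    spans = []
    for m in re.finditer(r"\$\d+(?:,\d{3})*(?:\.\d+)?", text):
        spans.append((m.start(), m.end(), "MONEY"))
    return spans  # an empty list means this function abstains

print(lf_dollar_amount("The grant provided $12,500 over two years."))
# [(19, 26, 'MONEY')]
```
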
Related papers
- A Benchmark Generative Probabilistic Model for Weak Supervised Learning [2.0257616108612373]
Weakly supervised learning approaches have been developed to alleviate the annotation burden.
We show that probabilistic latent variable models (PLVMs) achieve state-of-the-art performance across four datasets.
arXiv Detail & Related papers (2023-03-31T07:06:24Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data points (a toy sketch follows this entry).
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
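
A hedged sketch of the AutoWS-style idea summarized above: derive simple keyword labeling functions from a few labeled examples per class. This is an illustration only, not the paper's actual algorithm or API, and all names are invented.

```python
# From a few labeled examples per class, automatically derive keyword
# labeling functions that assign noisy labels to unlabeled text.
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

def make_keyword_lfs(seed: List[Tuple[str, str]], k: int = 3) -> List[Callable[[str], str]]:
    """Build one labeling function per class from its k most frequent words."""
    words_by_class: Dict[str, Counter] = defaultdict(Counter)
    for text, label in seed:
        words_by_class[label].update(text.lower().split())

    lfs = []
    for label, counts in words_by_class.items():
        keywords = {w for w, _ in counts.most_common(k)}
        def lf(text: str, kw=keywords, lab=label) -> str:  # bind loop vars
            return lab if kw & set(text.lower().split()) else "ABSTAIN"
        lfs.append(lf)
    return lfs

lfs = make_keyword_lfs([("great plot and acting", "POS"),
                        ("dull plot, weak acting", "NEG")])
print([lf("a truly great film") for lf in lfs])  # ['POS', 'ABSTAIN']
```
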
- MEGAnno: Exploratory Labeling for NLP in Computational Notebooks [9.462926987075122]
We present MEGAnno, a novel annotation framework designed for NLP practitioners and researchers.
With MEGAnno, users can explore data through sophisticated search and interactive suggestion functions.
We demonstrate MEGAnno's flexible, exploratory, efficient, and seamless labeling experience through a sentiment analysis use case.
arXiv Detail & Related papers (2023-01-08T19:16:22Z)
- A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels [96.56299163691979]
This paper focuses on a new weakly-supervised salient object detection (SOD) task under hybrid labels.
To address the issues of label noise and quantity imbalance in this task, we design a new pipeline framework with three sophisticated training strategies.
Experiments on five SOD benchmarks show that our method achieves competitive performance against weakly-supervised/unsupervised methods.
arXiv Detail & Related papers (2022-09-07T06:45:39Z)
- MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images [49.664220687980006]
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models.
We present MONAI Label, a free and open-source framework that facilitates the development of AI-assisted annotation applications for medical imaging.
arXiv Detail & Related papers (2022-03-23T12:33:11Z)
- Generate, Annotate, and Learn: Generative Models Advance Self-Training and Knowledge Distillation [58.64720318755764]
Semi-Supervised Learning (SSL) has seen success in many application domains, but this success often hinges on the availability of task-specific unlabeled data.
Knowledge distillation (KD) has enabled compressing deep networks and ensembles, achieving the best results when distilling knowledge on fresh task-specific unlabeled examples.
We present a general framework called "generate, annotate, and learn (GAL)" that uses unconditional generative models to synthesize in-domain unlabeled data (a toy sketch follows this entry).
arXiv Detail & Related papers (2021-06-11T05:01:24Z)
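
The GAL summary above amounts to a generate-annotate-learn loop. The toy sketch below substitutes a trivial bigram generator and a keyword teacher for the paper's large generative and task models; every component here is a stand-in.

```python
# Toy GAL loop: synthesize in-domain unlabeled text with a (deliberately
# simple) generative model, have a teacher annotate it, and collect the
# synthetic labeled pairs a student would then train on.
import random

CORPUS = "the movie was great . the plot was dull . acting was great .".split()

def generate(n_tokens: int) -> str:
    """Sample text from a bigram model fit on the tiny in-domain corpus."""
    bigrams = {}
    for a, b in zip(CORPUS, CORPUS[1:]):
        bigrams.setdefault(a, []).append(b)
    tok = random.choice(CORPUS)
    out = [tok]
    for _ in range(n_tokens - 1):
        tok = random.choice(bigrams.get(tok, CORPUS))
        out.append(tok)
    return " ".join(out)

def teacher(text: str) -> str:
    return "POS" if "great" in text else "NEG"  # stand-in annotator

synthetic = []
for _ in range(5):
    text = generate(8)
    synthetic.append((text, teacher(text)))
print(synthetic)  # a student would now train on these synthetic pairs
```
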
- SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both the labeled and pseudo-labeled data to generate the final feature embeddings (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-11-20T08:26:10Z)
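
The SLADE summary above describes the classic teacher-student self-training loop. Below is a minimal, hedged sketch of that loop; scikit-learn's logistic regression stands in for the paper's deep metric-learning models, which is an assumption for illustration only.

```python
# Minimal teacher-student self-training: fit a teacher on labeled data,
# pseudo-label the unlabeled pool, then fit a student on the union.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab):
    teacher = LogisticRegression().fit(X_lab, y_lab)
    pseudo = teacher.predict(X_unlab)              # noisy pseudo labels
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    return LogisticRegression().fit(X_all, y_all)  # the student model

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(20, 4)), np.arange(20) % 2
student = self_train(X_lab, y_lab, rng.normal(size=(200, 4)))
```
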
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data, while meta-learning helps with adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels (a weighted-training sketch follows this entry).
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
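
As a concrete instance of the adaptive re-weighting idea above, the sketch below down-weights pseudo-labeled samples by teacher confidence. The paper learns the weights via meta-learning; confidence weighting here is a simpler stand-in, and scikit-learn replaces the neural sequence labelers.

```python
# Confidence-weighted self-training: pseudo-labels the teacher is unsure
# about contribute less to the student's loss via sample_weight.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_lab, y_lab = rng.normal(size=(20, 4)), np.arange(20) % 2
X_unlab = rng.normal(size=(100, 4))

teacher = LogisticRegression().fit(X_lab, y_lab)
proba = teacher.predict_proba(X_unlab)
pseudo = proba.argmax(axis=1)
weights = proba.max(axis=1)                      # confident samples count more

student = LogisticRegression().fit(
    np.vstack([X_lab, X_unlab]),
    np.concatenate([y_lab, pseudo]),
    sample_weight=np.concatenate([np.ones(20), weights]),
)
```
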
- Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions [2.338938629983582]
We propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users.
DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics.
We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification from users' span-level annotations on example documents (a toy sketch follows this entry).
arXiv Detail & Related papers (2020-09-03T04:25:08Z)
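
To illustrate what synthesizing a labeling rule from a single span demonstration might look like, here is a toy sketch: it generalizes the word preceding an annotated span into a reusable context rule. This is not Ruler's actual synthesis algorithm; the names and the generalization heuristic are invented.

```python
# Turn one user demonstration (a labeled span inside an example document)
# into a labeling rule that generalizes the span's surface context.
import re
from typing import Optional, Tuple

def rule_from_demo(text: str, span: Tuple[int, int], label: str):
    """Generalize the token before the annotated span into a context rule."""
    prefix = text[:span[0]].split()[-1]            # word preceding the span
    pattern = re.compile(re.escape(prefix) + r"\s+(\w+)")

    def rule(new_text: str) -> Optional[Tuple[str, str]]:
        m = pattern.search(new_text)
        return (m.group(1), label) if m else None  # abstain when no match

    return rule

rule = rule_from_demo("She works at Megacorp.", (13, 21), "ORG")
print(rule("He consults at Initech."))  # ('Initech', 'ORG')
```
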
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision settings.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
- Generative Adversarial Data Programming [32.2164057862111]
We show how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time.
This framework is extended to different setups, including self-supervised labeled image generation, zero-shot text to labeled image generation, transfer learning, and multi-task learning.
arXiv Detail & Related papers (2020-04-30T07:06:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.