OneLabeler: A Flexible System for Building Data Labeling Tools
- URL: http://arxiv.org/abs/2203.14227v1
- Date: Sun, 27 Mar 2022 07:22:36 GMT
- Title: OneLabeler: A Flexible System for Building Data Labeling Tools
- Authors: Yu Zhang, Yun Wang, Haidong Zhang, Bin Zhu, Siming Chen, Dongmei Zhang
- Abstract summary: OneLabeler supports configuration and composition of common software modules to build data labeling tools.
A user study with developers provides evidence that OneLabeler supports efficient building of diverse data labeling tools.
- Score: 48.15772261649084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Labeled datasets are essential for supervised machine learning. Various data
labeling tools have been built to collect labels in different usage scenarios.
However, developing labeling tools is time-consuming, costly, and demands software
development expertise. In this paper, we propose a conceptual framework for data
labeling and OneLabeler, a system built on this framework to support easy construction
of labeling tools for diverse usage scenarios. The framework consists of common modules
and states in labeling tools, summarized through a coding of existing tools. OneLabeler supports
configuration and composition of common software modules through visual
programming to build data labeling tools. A module can be a human, machine, or
mixed computation procedure in data labeling. We demonstrate the expressiveness
and utility of the system through ten example labeling tools built with
OneLabeler. A user study with developers provides evidence that OneLabeler
supports efficient building of diverse data labeling tools.
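The abstract above frames a labeling tool as a composition of modules (human, machine, or mixed computation procedures) that exchange shared states. The sketch below illustrates that idea in plain Python under assumptions of our own: the names LabelingState, machine_sampling, human_annotation, and run_workflow are hypothetical, and OneLabeler itself composes such modules through visual programming rather than handwritten code.

```python
# A minimal sketch of the "modules and states" idea, using hypothetical names
# (not OneLabeler's actual API).
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabelingState:
    """Shared state passed between modules: data objects and collected labels."""
    data_objects: List[str]
    labels: Dict[int, str] = field(default_factory=dict)


def machine_sampling(state: LabelingState, batch_size: int = 2) -> List[int]:
    """Machine module: select the next unlabeled data objects to present."""
    unlabeled = [i for i in range(len(state.data_objects)) if i not in state.labels]
    return unlabeled[:batch_size]


def human_annotation(state: LabelingState, indices: List[int]) -> None:
    """Human module: a real tool would render an interactive interface here."""
    for i in indices:
        state.labels[i] = input(f"Label for '{state.data_objects[i]}': ")


def run_workflow(state: LabelingState) -> Dict[int, str]:
    """Compose the two modules into an iterative labeling workflow."""
    while len(state.labels) < len(state.data_objects):
        human_annotation(state, machine_sampling(state))
    return state.labels


if __name__ == "__main__":
    print(run_workflow(LabelingState(["a photo of a cat", "a photo of a dog"])))
```

Swapping machine_sampling for, say, an active-learning query strategy, or replacing the human module with a model-assisted one, changes the resulting tool without changing the workflow structure, which is the flexibility the paper targets.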
Related papers
- Thinking Like an Annotator: Generation of Dataset Labeling Instructions [59.603239753484345]
We introduce a new task, Labeling Instruction Generation, to address the lack of publicly available labeling instructions.
We take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples.
This framework acts as a proxy to human annotators that can help to both generate a final labeling instruction set and evaluate its quality.
arXiv Detail & Related papers (2023-06-24T18:32:48Z)
- Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations [91.67511167969934]
Imprecise label learning (ILL) is a framework that unifies learning with various imprecise label configurations.
We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings.
arXiv Detail & Related papers (2023-05-22T04:50:28Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data points (see the labeling-function sketch after this list).
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- SciAnnotate: A Tool for Integrating Weak Labeling Sources for Sequence Labeling [55.71459234749639]
SciAnnotate is a web-based text annotation tool; the name stands for scientific annotation tool.
Our tool provides users with multiple user-friendly interfaces for creating weak labels.
In this study, we take multi-source weak label denoising as an example and utilize a Bertifying Conditional Hidden Markov Model to denoise the weak labels generated by our tool.
arXiv Detail & Related papers (2022-08-07T19:18:13Z)
- TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration [1.4050836886292872]
Data programming has so far been accessible only to users who know how to program.
We build a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming.
arXiv Detail & Related papers (2021-06-24T04:49:42Z)
- A Study on the Autoregressive and non-Autoregressive Multi-label Learning [77.11075863067131]
We propose a self-attention based variational encoder model to extract the label-label and label-feature dependencies jointly.
Our model can therefore be used to predict all labels in parallel while still including both label-label and label-feature dependencies.
arXiv Detail & Related papers (2020-12-03T05:41:44Z)
- Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions [2.338938629983582]
We propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users.
DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics.
We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples.
arXiv Detail & Related papers (2020-09-03T04:25:08Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Embeddings of Label Components for Sequence Labeling: A Case Study of Fine-grained Named Entity Recognition [41.60109880213463]
We propose to integrate label component information as embeddings into models.
We demonstrate that the proposed method improves performance, especially for instances with low-frequency labels.
arXiv Detail & Related papers (2020-06-02T03:47:19Z)
- Generative Adversarial Data Programming [32.2164057862111]
We show how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time.
This framework is extended to different setups, including self-supervised labeled image generation, zero-shot text to labeled image generation, transfer learning, and multi-task learning.
arXiv Detail & Related papers (2020-04-30T07:06:44Z)
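Several of the related papers above (AutoWS, TagRuler, Data Programming by Demonstration, Generative Adversarial Data Programming) build on labeling functions, the core primitive of data programming. The following is a minimal, illustrative sketch of that primitive: two made-up keyword rules emit noisy labels and a simple majority vote aggregates them. The function names and rules are not taken from any of the papers; practical frameworks learn to weight labeling functions by their estimated accuracies rather than voting uniformly.

```python
# Illustrative labeling functions and a naive majority-vote aggregator.
# The keyword rules and names below are made up for this sketch only.
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function may decline to vote on an example


def lf_contains_refund(text: str) -> Optional[str]:
    """Keyword rule: texts mentioning a refund look like complaints."""
    return "complaint" if "refund" in text.lower() else ABSTAIN


def lf_contains_thanks(text: str) -> Optional[str]:
    """Keyword rule: texts saying thanks look like praise."""
    return "praise" if "thank" in text.lower() else ABSTAIN


def apply_labeling_functions(
    texts: List[str], lfs: List[Callable[[str], Optional[str]]]
) -> List[Optional[str]]:
    """Assign a noisy label per text by majority vote over non-abstaining rules."""
    aggregated = []
    for text in texts:
        votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
        aggregated.append(Counter(votes).most_common(1)[0][0] if votes else ABSTAIN)
    return aggregated


if __name__ == "__main__":
    docs = ["I want a refund now", "Thank you for the quick reply", "No keywords here"]
    print(apply_labeling_functions(docs, [lf_contains_refund, lf_contains_thanks]))
    # -> ['complaint', 'praise', None]
```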
This list is automatically generated from the titles and abstracts of the papers on this site.