SciAnnotate: A Tool for Integrating Weak Labeling Sources for Sequence
Labeling
- URL: http://arxiv.org/abs/2208.10241v1
- Date: Sun, 7 Aug 2022 19:18:13 GMT
- Title: SciAnnotate: A Tool for Integrating Weak Labeling Sources for Sequence
Labeling
- Authors: Mengyang Liu, Haozheng Luo, Leonard Thong, Yinghao Li, Chao Zhang, Le
Song
- Abstract summary: SciAnnotate is a web-based text annotation tool; the name stands for scientific annotation tool.
Our tool provides users with multiple user-friendly interfaces for creating weak labels.
In this study, we take multi-source weak label denoising as an example and use a Bertifying Conditional Hidden Markov Model to denoise the weak labels generated by our tool.
- Score: 55.71459234749639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weak labeling is a popular weak supervision strategy for Named Entity
Recognition (NER) tasks, with the goal of reducing the necessity for
hand-crafted annotations. Although there are numerous remarkable annotation
tools for NER labeling, the subject of integrating weak labeling sources is
still unexplored. We introduce a web-based tool for text annotation called
SciAnnotate, which stands for scientific annotation tool. Compared to
frequently used text annotation tools, our annotation tool allows for the
development of weak labels in addition to providing a manual annotation
experience. Our tool provides users with multiple user-friendly interfaces for
creating weak labels. SciAnnotate additionally allows users to incorporate
their own language models and visualize the output of their model for
evaluation. In this study, we take multi-source weak label denoising as an
example: we use a Bertifying Conditional Hidden Markov Model to denoise the
weak labels generated by our tool. We also evaluate our annotation tool on
the dataset provided by Mysore, which contains 230 annotated materials
synthesis procedures. The results show a 53.7% reduction in annotation time
and a 1.6% increase in recall when using weak label denoising. An online demo
is available at https://sciannotate.azurewebsites.net/ (a demo account can be
found in the README). We do not host a model server with the demo; please
check the README in the supplementary material for model server usage.
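To make the multi-source weak labeling workflow described above more concrete, here is a minimal Python sketch of the general pattern: several weak sources tag the same token sequence, and their outputs are merged into a single label sequence. The source functions, the toy entity dictionary, and the majority-vote merge are all hypothetical simplifications introduced for illustration; SciAnnotate itself denoises multi-source labels with a Bertifying Conditional Hidden Markov Model rather than a simple vote.

```python
from collections import Counter

# Hypothetical sketch: two simple weak labeling sources (a dictionary matcher
# and a casing/digit rule) produce token-level BIO tags, which are then merged
# by majority vote. The paper denoises multi-source labels with a Bertifying
# Conditional Hidden Markov Model; the vote here is only a simplified stand-in
# to show the data flow from weak sources to a single label sequence.

MATERIALS = {"TiO2", "graphene"}  # toy dictionary source

def dictionary_source(tokens):
    return ["B-MATERIAL" if t in MATERIALS else "O" for t in tokens]

def casing_rule_source(tokens):
    # toy rule: tokens mixing letters and digits are tagged as materials
    return ["B-MATERIAL" if any(c.isdigit() for c in t) and any(c.isalpha() for c in t)
            else "O" for t in tokens]

def aggregate(tokens, sources):
    """Merge per-source tag sequences token by token (majority vote)."""
    votes_per_token = zip(*(src(tokens) for src in sources))
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_token]

tokens = "The TiO2 film was annealed at 500 C".split()
weak_labels = aggregate(tokens, [dictionary_source, casing_rule_source])
print(list(zip(tokens, weak_labels)))
```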
Related papers
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
arXiv Detail & Related papers (2022-06-10T16:01:58Z)
- Open Source HamNoSys Parser for Multilingual Sign Language Encoding [3.867363075280544]
This paper presents an automated tool to convert HamNoSys annotations into numerical labels.
Our proposed numerical multilabels greatly simplify the structure of HamNoSys annotation without significant loss of gloss meaning.
These numerical multilabels can potentially be used to feed machine learning models, which would accelerate the development of vision-based sign language recognition.
arXiv Detail & Related papers (2022-04-14T12:33:33Z)
- Omni-DETR: Omni-Supervised Object Detection with Transformers [165.4190908259015]
We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations.
Under this unified architecture, different types of weak labels can be leveraged to generate accurate pseudo labels.
We have found that weak annotations can help to improve detection performance and a mixture of them can achieve a better trade-off between annotation cost and accuracy.
arXiv Detail & Related papers (2022-03-30T06:36:09Z)
- Label Semantics for Few Shot Named Entity Recognition [68.01364012546402]
We study the problem of few shot learning for named entity recognition.
We leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors.
Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder.
arXiv Detail & Related papers (2022-03-16T23:21:05Z)
- Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
- Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z)
- Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis [69.48582264712854]
We propose a robust learning method for visual sentiment analysis.
Our method relies on an external memory to aggregate and filter noisy labels during training.
We establish a benchmark for visual sentiment analysis with label noise using publicly available datasets.
arXiv Detail & Related papers (2021-09-15T18:18:28Z)
- skweak: Weak Supervision Made Easy for NLP [13.37847225239485]
We present skweak, a Python-based software toolkit enabling NLP developers to apply weak supervision to a wide range of NLP tasks.
We use labelling functions derived from domain knowledge to automatically obtain annotations for a given dataset.
The resulting labels are then aggregated with a generative model that estimates the accuracy (and possible confusions) of each labelling function.
arXiv Detail & Related papers (2021-04-19T23:26:51Z)
- DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool [15.268017930901332]
The Data AnnotatoR Tool (DART) is an interactive application that reduces human effort in annotating large quantities of structured data.
By using a sequence-to-sequence model, our system iteratively analyzes the annotated labels in order to better sample unlabeled data.
In a simulation experiment on annotating large quantities of structured data, DART has been shown to reduce the total number of annotations needed by combining active learning with automatic suggestion of relevant labels.
arXiv Detail & Related papers (2020-10-08T17:36:34Z)
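Several of the entries above (notably DART and the assisted text annotation work) describe an active-learning loop in which a model scores the unlabeled pool and the least confident items are routed to a human annotator. The sketch below is a hypothetical illustration of that generic loop, not the implementation of any of the listed tools; the confidence model and annotation function are placeholder stubs.

```python
import random

# Hypothetical sketch of an active-learning annotation loop of the kind the
# DART and Assisted Text Annotation entries describe: a model scores the
# unlabeled pool, the least-confident items are sent to a human annotator,
# and the process repeats. The confidence model here is a random stub; in
# practice it would be a trained sequence-to-sequence or language model.

def model_confidence(example):
    return random.random()            # stand-in for a real model score

def annotate(example):
    return f"label_for({example})"    # stand-in for a human annotation

def active_learning_loop(unlabeled, rounds=3, batch_size=2):
    labeled = []
    pool = list(unlabeled)
    for _ in range(rounds):
        # pick the items the model is least confident about
        pool.sort(key=model_confidence)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend((x, annotate(x)) for x in batch)
        # a real system would retrain the model on `labeled` here
        if not pool:
            break
    return labeled

examples = [f"sentence {i}" for i in range(10)]
print(active_learning_loop(examples))
```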