DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool
- URL: http://arxiv.org/abs/2010.04141v2
- Date: Tue, 1 Dec 2020 12:58:57 GMT
- Title: DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool
- Authors: Ernie Chang, Jeriah Caplinger, Alex Marin, Xiaoyu Shen, Vera Demberg
- Abstract summary: The Data AnnotatoR Tool (DART) is an interactive application that reduces human effort in annotating large quantities of structured data.
By using a sequence-to-sequence model, our system iteratively analyzes the annotated labels in order to better sample unlabeled data.
In a simulation experiment on annotating large quantities of structured data, DART has been shown to reduce the total number of annotations needed, through active learning and automatic suggestion of relevant labels.
- Score: 15.268017930901332
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a lightweight annotation tool, the Data AnnotatoR Tool (DART), for
the general task of labeling structured data with textual descriptions. The
tool is implemented as an interactive application that reduces human effort in
annotating large quantities of structured data, e.g. in the format of a table
or tree structure. By using a backend sequence-to-sequence model, our system
iteratively analyzes the annotated labels in order to better sample unlabeled
data. In a simulation experiment performed on annotating large quantities of
structured data, DART has been shown to reduce the total number of annotations
needed, through active learning and automatic suggestion of relevant labels.
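
The sampling idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how DART scores unlabeled items, so everything here is an assumption for illustration: the names `record_fields`, `field_novelty_scorer`, `select_for_annotation`, and `annotation_loop` are hypothetical, records are assumed to be flat `key=value|key=value` strings, and a simple count of unseen fields stands in for the signal that a backend sequence-to-sequence model would provide.

```python
from typing import Callable, List, Sequence, Set, Tuple

Scorer = Callable[[str], float]


def record_fields(record: str) -> Set[str]:
    """Field names of a flat record written as 'key=value|key=value'."""
    return {pair.split("=", 1)[0] for pair in record.split("|") if pair}


def field_novelty_scorer(labeled: Sequence[Tuple[str, str]]) -> Scorer:
    """Toy stand-in for model uncertainty (an assumption, not DART's method):
    a record scores higher the more fields it has that no labeled record has."""
    seen: Set[str] = set()
    for record, _text in labeled:
        seen |= record_fields(record)
    return lambda record: float(len(record_fields(record) - seen))


def select_for_annotation(
    pool: Sequence[str], score: Scorer, batch_size: int
) -> List[str]:
    """Pick the batch_size records with the highest score (ties keep pool order)."""
    return sorted(pool, key=score, reverse=True)[:batch_size]


def annotation_loop(
    pool: Sequence[str],
    annotate: Callable[[str], str],
    make_scorer: Callable[[Sequence[Tuple[str, str]]], Scorer],
    batch_size: int,
    rounds: int,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Alternate between re-scoring the unlabeled pool from the labels
    collected so far and asking the annotator to label the top-scoring batch."""
    labeled: List[Tuple[str, str]] = []
    remaining = list(pool)
    for _ in range(rounds):
        if not remaining:
            break
        score = make_scorer(labeled)  # re-analyze the current annotations
        batch = select_for_annotation(remaining, score, batch_size)
        labeled.extend((record, annotate(record)) for record in batch)
        chosen = set(batch)
        remaining = [r for r in remaining if r not in chosen]
    return labeled, remaining
```

In this sketch, `annotate` represents the human annotator and `make_scorer` is where a trained model would plug in. Re-scoring after every batch is what lets earlier annotations steer which records are shown next.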
Related papers
- ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM [16.87320295911898]
Text documents with numerical values involved are widely used in various applications such as scientific research, economy, public health and journalism.
To fill this research gap, this work aims to automatically generate charts to accurately convey the underlying data and ideas to readers.
We propose ChartifyText, a novel fully-automated approach that leverages Large Language Models (LLMs) to convert complex data-involved texts to expressive charts.
arXiv Detail & Related papers (2024-10-18T09:43:30Z)
- Substituting Data Annotation with Balanced Updates and Collective Loss in Multi-label Text Classification [19.592985329023733]
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text.
We study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear in the number of labels.
Our method follows three steps, (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph by label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph.
arXiv Detail & Related papers (2023-09-24T04:12:52Z)
- Thinking Like an Annotator: Generation of Dataset Labeling Instructions [59.603239753484345]
We introduce a new task, Labeling Instruction Generation, to address missing publicly available labeling instructions.
We take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples.
This framework acts as a proxy to human annotators that can help to both generate a final labeling instruction set and evaluate its quality.
arXiv Detail & Related papers (2023-06-24T18:32:48Z)
- ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries [0.26097841018267615]
Automatic chart-to-text summarization is an effective tool for visually impaired people.
In this paper, we propose ChartSumm: a large-scale benchmark dataset consisting of a total of 84,363 charts.
arXiv Detail & Related papers (2023-04-26T15:25:24Z)
- SciAnnotate: A Tool for Integrating Weak Labeling Sources for Sequence Labeling [55.71459234749639]
SciAnnotate, short for scientific annotation tool, is a web-based tool for text annotation.
Our tool provides users with multiple user-friendly interfaces for creating weak labels.
Taking multi-source weak label denoising as an example, we use a Bertifying Conditional Hidden Markov Model to denoise the weak labels generated by our tool.
arXiv Detail & Related papers (2022-08-07T19:18:13Z)
- PointMatch: A Consistency Training Framework for Weakly Supervised Semantic Segmentation of 3D Point Clouds [117.77841399002666]
We propose a novel framework, PointMatch, that stands on both data and label, by applying consistency regularization to sufficiently probe information from data itself.
The proposed PointMatch achieves state-of-the-art performance under various weakly-supervised schemes on both the ScanNet-v2 and S3DIS datasets.
arXiv Detail & Related papers (2022-02-22T07:26:31Z)
- Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- DART: Open-Domain Structured Data Record to Text Generation [91.23798751437835]
We present DART, an open-domain structured DAta Record to Text generation dataset with over 82k instances (DARTs).
We propose a procedure of extracting semantic triples from tables that encode their structures by exploiting the semantic dependencies among table headers and the table title.
Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks.
arXiv Detail & Related papers (2020-07-06T16:35:30Z)
- Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but is also on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z)