POTATO: The Portable Text Annotation Tool
- URL: http://arxiv.org/abs/2212.08620v2
- Date: Thu, 23 Mar 2023 18:45:37 GMT
- Title: POTATO: The Portable Text Annotation Tool
- Authors: Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou,
Jackson Sargent, Apostolos Dedeloudis and David Jurgens
- Abstract summary: We present POTATO, a free, fully open-sourced annotation system.
It supports labeling many types of text and multimodal data.
It offers easy-to-configure features to maximize the productivity of both deployers and annotators.
- Score: 8.924906491840119
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present POTATO, the Portable text annotation tool, a free, fully
open-sourced annotation system that 1) supports labeling many types of text and
multimodal data; 2) offers easy-to-configure features to maximize the
productivity of both deployers and annotators (convenient templates for common
ML/NLP tasks, active learning, keypress shortcuts, keyword highlights,
tooltips); and 3) supports a high degree of customization (editable UI,
inserting pre-screening questions, attention and qualification tests).
Experiments over two annotation tasks suggest that POTATO improves labeling
speed through its specially-designed productivity features, especially for long
documents and complex tasks. POTATO is available at
https://github.com/davidjurgens/potato and will continue to be updated.
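The configuration-driven setup described in the abstract can be sketched concretely. The snippet below assembles a minimal single-question labeling configuration as a Python dict and serializes it to YAML; the field names (annotation_task_name, annotation_schemes, sequential_key_binding, and so on) are modeled on the example configs in the POTATO repository but should be treated as illustrative assumptions rather than the authoritative schema.

```python
# Minimal sketch of a POTATO-style YAML config, assembled in Python.
# All field names below are assumptions based on the repository's example
# configs; consult https://github.com/davidjurgens/potato for the exact schema.
import yaml  # pip install pyyaml

config = {
    "annotation_task_name": "sentiment-demo",   # display name of the task (assumed key)
    "data_files": ["data/items.csv"],           # input items to be labeled (assumed key)
    "item_properties": {                        # which columns hold the id and text (assumed key)
        "id_key": "id",
        "text_key": "text",
    },
    "annotation_schemes": [                     # one entry per labeling question (assumed key)
        {
            "annotation_type": "radio",         # single-choice labels
            "name": "sentiment",
            "description": "What is the sentiment of this text?",
            "labels": ["positive", "neutral", "negative"],
            "sequential_key_binding": True,     # keypress shortcuts 1/2/3 (assumed flag)
        }
    ],
    "output_annotation_dir": "annotation_output/",  # where labels are written (assumed key)
    "output_annotation_format": "tsv",
    "port": 8000,
}

with open("sentiment-demo.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```

The resulting YAML file would then be passed to POTATO's server start command (see the repository README for the exact invocation); productivity features such as keyword highlights, tooltips, and active learning are enabled through additional config sections documented there.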
Related papers
- Tag-Pag: A Dedicated Tool for Systematic Web Page Annotations [2.7961972519572447]
Tag-Pag is an application designed to simplify the categorization of web pages.
Unlike existing tools that focus on annotating sections of text, Tag-Pag systematizes page-level annotations.
arXiv Detail & Related papers (2025-02-22T08:52:01Z)
- OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition [79.852642726105]
We propose a unified paradigm for parsing visually-situated text across diverse scenarios.
Specifically, we devise a universal model, called Omni, which can simultaneously handle three typical visually-situated text parsing tasks.
In Omni, all tasks share the unified encoder-decoder architecture, the unified objective (point-conditioned text generation), and the unified input representation.
arXiv Detail & Related papers (2024-03-28T03:51:14Z)
- EEVEE: An Easy Annotation Tool for Natural Language Processing [32.111061774093]
We propose EEVEE, an annotation tool focused on simplicity, efficiency, and ease of use.
It can run directly in the browser (no setup required) and uses tab-separated files (as opposed to character offsets or task-specific formats) for annotation.
arXiv Detail & Related papers (2024-02-05T10:24:40Z)
- Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs [80.48606583629123]
PASTA is a method that allows large language models to read text with user-specified emphasis marks.
It can substantially enhance an LLM's ability to follow user instructions or integrate new knowledge from user inputs.
arXiv Detail & Related papers (2023-11-03T22:56:43Z)
- Antarlekhaka: A Comprehensive Tool for Multi-task Natural Language Annotation [0.0]
Antarlekhaka is a tool for manual annotation of a comprehensive set of tasks relevant to Natural Language Processing.
The tool is Unicode-compatible, language-agnostic, Web-deployable and supports distributed annotation by multiple simultaneous annotators.
It has been used for two real-life annotation tasks on two different languages, namely, Sanskrit and Bengali.
arXiv Detail & Related papers (2023-10-11T19:09:07Z)
- UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model [108.85584502396182]
We propose UReader, a first exploration of universal OCR-free visually-situated language understanding based on the Multimodal Large Language Model (MLLM).
By leveraging the shallow text recognition ability of the MLLM, we finetune only 1.2% of the parameters.
Our single model achieves state-of-the-art OCR-free performance on 8 out of 10 visually-situated language understanding tasks.
arXiv Detail & Related papers (2023-10-08T11:33:09Z)
- Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation [11.690442820401453]
We introduce Thresh, a unified, customizable and deployable platform for fine-grained evaluation.
Thresh provides a community hub that hosts a collection of fine-grained frameworks and corresponding annotations made and collected by the community.
For deployment, Thresh offers multiple options for any scale of annotation projects from small manual inspections to large crowdsourcing ones.
arXiv Detail & Related papers (2023-08-14T06:09:51Z)
- PartAL: Efficient Partial Active Learning in Multi-Task Visual Settings [57.08386016411536]
We show that it is more effective to select not only the images to be annotated but also a subset of tasks for which to provide annotations at each Active Learning (AL) iteration.
We demonstrate the effectiveness of our approach on several popular multi-task datasets.
arXiv Detail & Related papers (2022-11-21T15:08:35Z)
- Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex is able to identify the parts of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z)
- SciAnnotate: A Tool for Integrating Weak Labeling Sources for Sequence Labeling [55.71459234749639]
SciAnnotate is a web-based text annotation tool; its name stands for scientific annotation tool.
Our tool provides users with multiple user-friendly interfaces for creating weak labels.
In this study, taking multi-source weak label denoising as an example, we utilize a Bertifying Conditional Hidden Markov Model to denoise the weak labels generated by our tool.
arXiv Detail & Related papers (2022-08-07T19:18:13Z)
- Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP [24.981991538150584]
MaChAmp is a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings.
The benefits of MaChAmp are its flexible configuration options and its support for a variety of natural language processing tasks in a uniform toolkit.
arXiv Detail & Related papers (2020-05-29T16:54:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.