Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
- URL: http://arxiv.org/abs/2308.06953v3
- Date: Mon, 16 Oct 2023 14:51:08 GMT
- Title: Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
- Authors: David Heineman, Yao Dou, Wei Xu
- Abstract summary: We introduce Thresh, a unified, customizable and deployable platform for fine-grained evaluation.
Thresh provides a community hub that hosts a collection of fine-grained frameworks and corresponding annotations made and collected by the community.
For deployment, Thresh offers multiple options for any scale of annotation projects from small manual inspections to large crowdsourcing ones.
- Score: 11.690442820401453
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fine-grained, span-level human evaluation has emerged as a reliable and
robust method for evaluating text generation tasks such as summarization,
simplification, machine translation and news generation, and the derived
annotations have been useful for training automatic metrics and improving
language models. However, existing annotation tools implemented for these
evaluation frameworks lack the adaptability to be extended to different domains
or languages or to modify annotation settings according to user needs, and the
absence of a unified annotated data format inhibits research in multi-task
learning. In this paper, we introduce Thresh, a unified, customizable and
deployable platform for fine-grained evaluation. With a single YAML
configuration file, users can build and test an annotation interface for any
framework within minutes -- all in one web browser window. To facilitate
collaboration and sharing, Thresh provides a community hub that hosts a
collection of fine-grained frameworks and corresponding annotations made and
collected by the community, covering a wide range of NLP tasks. For deployment,
Thresh offers multiple options for any scale of annotation projects from small
manual inspections to large crowdsourcing ones. Additionally, we introduce a
Python library to streamline the entire process from typology design and
deployment to annotation processing. Thresh is publicly accessible at
https://thresh.tools.
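The abstract describes defining an annotation typology in a single YAML configuration file and collecting span-level annotations in a unified data format. The following is a minimal sketch of that idea under stated assumptions: the typology is written as a Python dict standing in for parsed YAML (to stay dependency-free), and every field name (`template_name`, `edits`, `type`, `options`, `category`, `spans`, `rating`) and both helper functions are hypothetical. This is not Thresh's actual schema or the API of its Python library.

```python
import json

# Hypothetical typology: a Python dict standing in for a parsed YAML config.
# Keys such as "template_name", "edits", "type" and "options" are assumptions
# made for illustration; they are not taken from Thresh's documentation.
TYPOLOGY = {
    "template_name": "toy_simplification",
    "edits": [
        {"name": "deletion", "type": "single_span", "options": ["good", "bad"]},
        {"name": "insertion", "type": "single_span", "options": ["good", "bad"]},
        {"name": "substitution", "type": "multi_span", "options": ["good", "bad"]},
    ],
}


def validate_annotation(ann, typology):
    """Return a list of problems with one span annotation (empty if valid)."""
    edits = {e["name"]: e for e in typology["edits"]}
    edit = edits.get(ann.get("category"))
    if edit is None:
        return [f"unknown category: {ann.get('category')!r}"]
    problems = []
    spans = ann.get("spans", [])
    # A single_span edit marks exactly one span; a multi_span edit marks one or more.
    if edit["type"] == "single_span" and len(spans) != 1:
        problems.append("single_span edit needs exactly one span")
    if edit["type"] == "multi_span" and not spans:
        problems.append("multi_span edit needs at least one span")
    # Spans are (start, end) character offsets with start < end.
    for start, end in spans:
        if not 0 <= start < end:
            problems.append(f"bad span offsets: ({start}, {end})")
    if ann.get("rating") not in edit["options"]:
        problems.append(f"rating {ann.get('rating')!r} not in {edit['options']}")
    return problems


def to_unified_record(doc_id, annotations, typology):
    """Validate every annotation, then bundle them into one JSON-serializable record."""
    for ann in annotations:
        problems = validate_annotation(ann, typology)
        if problems:
            raise ValueError(f"{doc_id}: {problems}")
    return {"id": doc_id, "typology": typology["template_name"], "edits": annotations}


if __name__ == "__main__":
    record = to_unified_record(
        "example-1",
        [{"category": "deletion", "spans": [(10, 24)], "rating": "bad"}],
        TYPOLOGY,
    )
    print(json.dumps(record))
```

Keeping the typology as data and validating annotations against it is what lets one interface definition serve many frameworks: adding a new edit category means adding an entry to the config, not changing the code.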
Related papers
- Efficient Annotator Reliability Assessment with EffiARA [1.5145272476388434]
EffiARA is a framework to support the whole annotation pipeline, from understanding the resources required for an annotation task to compiling the annotated dataset.
The framework's efficacy is supported by two previous studies: one improving classification performance through annotator-reliability-based soft label aggregation and sample weighting, and the other increasing the overall agreement among annotators.
This work introduces the EffiARA Python package and its accompanying webtool, which provides an accessible graphical user interface for the system.
(2025-04-01T09:48:09Z)
- Generative Compositor for Few-Shot Visual Information Extraction [60.663887314625164]
We propose a novel generative model, named Generative Compositor, to address the challenge of few-shot VIE.
Generative Compositor is a hybrid pointer-generator network that emulates the operations of a compositor by retrieving words from the source text.
The proposed method achieves highly competitive results in full-sample training, while notably outperforming the baseline in the 1-shot, 5-shot, and 10-shot settings.
(2025-03-21T04:56:24Z)
- AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs [54.58905728115257]
We propose the AutoGUI pipeline for automatically annotating UI elements with detailed functionality descriptions at scale.
Specifically, we leverage large language models (LLMs) to infer element functionality by comparing the UI content changes before and after simulated interactions with specific UI elements.
We construct the AutoGUI-704k dataset using the proposed pipeline, featuring multi-resolution, multi-device screenshots, diverse data domains, and detailed functionality annotations that have never been provided by previous datasets.
(2025-02-04T03:39:59Z)
- COMMENTATOR: A Code-mixed Multilingual Text Annotation Framework [1.114560772534785]
We introduce a code-mixed multilingual text annotation framework, COMMENTATOR, specifically designed for annotating code-mixed text.
The tool demonstrates its effectiveness in token-level and sentence-level language annotation tasks for Hinglish text.
(2024-08-06T11:56:26Z)
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend, an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
(2024-04-03T02:21:46Z)
- Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI [15.220987187105607]
Unitxt is an innovative library for customizable textual data preparation and evaluation tailored to generative language models.
Unitxt integrates with common libraries like Hugging Face and LM-eval-harness, enabling easy customization and sharing between practitioners.
Beyond being a tool, Unitxt is a community-driven platform, empowering users to build, share, and advance their pipelines.
(2024-01-25T08:57:33Z)
- Antarlekhaka: A Comprehensive Tool for Multi-task Natural Language Annotation [0.0]
Antarlekhaka is a tool for manual annotation of a comprehensive set of tasks relevant to Natural Language Processing.
The tool is Unicode-compatible, language-agnostic, Web-deployable and supports distributed annotation by multiple simultaneous annotators.
It has been used for two real-life annotation tasks on two different languages, namely, Sanskrit and Bengali.
(2023-10-11T19:09:07Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
(2023-06-06T03:37:41Z)
- Summary Workbench: Unifying Application and Evaluation of Text Summarization Models [24.40171915438056]
New models and evaluation measures can be easily integrated as Docker-based plugins.
Visual analyses combining multiple measures provide insights into the models' strengths and weaknesses.
(2022-10-18T04:47:25Z)
- Selective Annotation Makes Language Models Better Few-Shot Learners [97.07544941620367]
Large language models can perform in-context learning, where they learn a new task from a few task demonstrations.
This work examines the implications of in-context learning for the creation of datasets for new natural language tasks.
We propose an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate.
(2022-09-05T14:01:15Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
(2022-02-22T20:55:31Z)
- OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis [6.159771892460152]
We propose OPAD, a novel framework using reinforcement policy for active learning in content detection tasks for documents.
The framework learns the acquisition function to decide the samples to be selected while optimizing performance metrics.
We show superior performance of the proposed OPAD framework for active learning for various tasks related to document understanding.
(2021-10-01T07:40:56Z)
- A Data-Centric Framework for Composable NLP Workflows [109.51144493023533]
Empirical natural language processing systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components.
We establish a unified open-source framework to support fast development of such sophisticated NLP in a composable manner.
(2021-03-02T16:19:44Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
(2020-06-12T22:45:47Z)
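The EffiARA entry in the list above mentions annotator-reliability-based soft label aggregation and sample weighting. As a generic illustration only (not EffiARA's actual method or API), the sketch below assumes each annotator already has a non-negative reliability score, computed elsewhere, and turns one item's votes into a reliability-weighted label distribution. The functions `soft_label` and `sample_weight` are hypothetical names, and the mean-reliability sample weight is an assumed rule.

```python
from collections import defaultdict


def soft_label(votes, reliability, classes):
    """Reliability-weighted label distribution for one item.

    votes:       annotator id -> label that annotator chose (assumed to be in classes)
    reliability: annotator id -> non-negative reliability score (assumed given)
    classes:     every possible label, so unchosen labels get probability 0
    """
    weight = defaultdict(float)
    for annotator, label in votes.items():
        # Unknown annotators contribute no weight.
        weight[label] += reliability.get(annotator, 0.0)
    total = sum(weight.values())
    if total == 0:
        # No usable reliability mass: fall back to a uniform distribution.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: weight[c] / total for c in classes}


def sample_weight(votes, reliability):
    """Hypothetical rule: weight an item by the mean reliability of its annotators."""
    if not votes:
        return 0.0
    return sum(reliability.get(a, 0.0) for a in votes) / len(votes)


if __name__ == "__main__":
    votes = {"a1": "pos", "a2": "neg", "a3": "pos"}
    reliability = {"a1": 0.9, "a2": 0.6, "a3": 0.3}
    print(soft_label(votes, reliability, ["pos", "neg"]))
    print(sample_weight(votes, reliability))
```

In this example the two "pos" votes carry weight 0.9 + 0.3 = 1.2 against 0.6 for "neg", so the item's soft label is roughly 2/3 "pos" and 1/3 "neg" instead of a hard majority label; a model trained on such targets sees how much the annotators agreed.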
This list is automatically generated from the titles and abstracts of the papers on this site.