Related papers: Efficient Annotator Reliability Assessment with EffiARA

URL: http://arxiv.org/abs/2504.00589v2
Date: Thu, 03 Apr 2025 22:24:47 GMT
Title: Efficient Annotator Reliability Assessment with EffiARA
Authors: Owen Cook, Jake Vasilakes, Ian Roberts, Xingyi Song
Abstract summary: EffiARA is a framework to support the whole annotation pipeline, from understanding the resources required for an annotation task to compiling the annotated dataset. The framework's efficacy is supported by two previous studies: one improving classification performance through annotator-reliability-based soft label aggregation and sample weighting, and the other increasing the overall agreement among annotators. This work introduces the EffiARA Python package and its accompanying webtool, which provides an accessible graphical user interface for the system.
Score: 1.5145272476388434
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data annotation is an essential component of the machine learning pipeline; it is also a costly and time-consuming process. With the introduction of transformer-based models, annotation at the document level is increasingly popular; however, there is no standard framework for structuring such tasks. The EffiARA annotation framework is, to our knowledge, the first project to support the whole annotation pipeline, from understanding the resources required for an annotation task to compiling the annotated dataset and gaining insights into the reliability of individual annotators as well as the dataset as a whole. The framework's efficacy is supported by two previous studies: one improving classification performance through annotator-reliability-based soft label aggregation and sample weighting, and the other increasing the overall agreement among annotators by identifying and replacing an unreliable annotator. This work introduces the EffiARA Python package and its accompanying webtool, which provides an accessible graphical user interface for the system. We open-source the EffiARA Python package at https://github.com/MiniEggz/EffiARA and the webtool is publicly accessible at https://effiara.gate.ac.uk.
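
Illustrative sketch: the abstract mentions annotator-reliability-based soft label aggregation and sample weighting. The snippet below is a minimal illustration of that general idea, not the EffiARA package's API or the authors' exact method. It assumes each annotator already has a reliability score in [0, 1]; the function names, the reliability values, and the choice of mean reliability as the sample weight are all hypothetical.

```python
from collections import defaultdict


def aggregate_soft_labels(annotations, reliability, labels):
    """Build one soft label per sample from reliability-weighted votes.

    annotations: iterable of (sample_id, annotator, label) tuples.
    reliability: dict mapping annotator -> score in [0, 1] (assumed given).
    labels: list of all possible label names.
    """
    totals = defaultdict(lambda: {label: 0.0 for label in labels})
    for sample_id, annotator, label in annotations:
        # Each vote counts in proportion to the annotator's reliability.
        totals[sample_id][label] += reliability[annotator]

    soft = {}
    for sample_id, scores in totals.items():
        mass = sum(scores.values())
        if mass == 0:
            # No reliable votes at all: fall back to a uniform distribution.
            soft[sample_id] = {label: 1.0 / len(labels) for label in labels}
        else:
            soft[sample_id] = {label: s / mass for label, s in scores.items()}
    return soft


def sample_weights(annotations, reliability):
    """Weight each sample by the mean reliability of its annotators.

    This particular weighting is an assumption made for illustration.
    """
    seen = defaultdict(list)
    for sample_id, annotator, _label in annotations:
        seen[sample_id].append(reliability[annotator])
    return {sample_id: sum(r) / len(r) for sample_id, r in seen.items()}


# Hypothetical toy data: two documents, three annotators.
example_reliability = {"alice": 0.9, "bob": 0.6, "carol": 0.3}
example_annotations = [
    ("d1", "alice", "true"),
    ("d1", "bob", "false"),
    ("d2", "bob", "true"),
    ("d2", "carol", "true"),
]

if __name__ == "__main__":
    print(aggregate_soft_labels(example_annotations, example_reliability, ["true", "false"]))
    print(sample_weights(example_annotations, example_reliability))
```

In this toy data, the disagreement on d1 yields a soft label that leans toward the more reliable annotator's choice, while d2 gets a one-hot label but a lower sample weight because both of its annotators are less reliable.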

Related papers

Label Anything: An Interpretable, High-Fidelity and Prompt-Free Annotator [29.2532061585323]
Traditional manual labeling incurs a high cost to annotate the vast amount of data required for training robust models. We propose a Label Anything Model (LAM) serving as an interpretable, high-fidelity, and prompt-free data annotator. LAM can generate high-fidelity annotations (almost 100% in mIoU) for multiple real-world datasets.
arXiv Detail & Related papers (2025-02-05T08:14:52Z)
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [64.28420991770382]
Data-Juicer 2.0 is a data processing system backed by data processing operators spanning text, image, video, and audio modalities. It supports more critical tasks including data analysis, annotation, and foundation model post-training. It has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation [38.331860053615955]
This paper introduces a novel framework for unified incremental few-shot object detection (iFSOD) and instance segmentation (iFSIS) using the Transformer architecture. Our goal is to create an optimal solution for situations where only a few examples of novel object classes are available.
arXiv Detail & Related papers (2024-11-13T12:29:44Z)
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding [4.258365032282028]
We present a language-agnostic framework for structured document understanding (DU) by integrating a contrastive learning objective with graph attention networks (GATs). We propose a novel methodology that combines geometric edge features with visual features within an overall two-staged GAT-based framework. Our results highlight the model's proficiency in identifying key-value relationships within the FUNSD dataset for forms and also discovering the spatial relationships in table-structured layouts for RVLCDIP business invoices.
arXiv Detail & Related papers (2024-05-06T01:40:20Z)
Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on static images. We propose to understand human attributes using video frames that can fully use temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
Interfacing Foundation Models' Embeddings [131.0352288172788]
We present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity. In light of the interleaved embedding space, we introduce FIND-Bench, which introduces new training and evaluation annotations to the COCO dataset for interleaved segmentation and retrieval.
arXiv Detail & Related papers (2023-12-12T18:58:02Z)
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation [11.690442820401453]
We introduce Thresh, a unified, customizable and deployable platform for fine-grained evaluation. Thresh provides a community hub that hosts a collection of fine-grained frameworks and corresponding annotations made and collected by the community. For deployment, Thresh offers multiple options for any scale of annotation projects from small manual inspections to large crowdsourcing ones.
arXiv Detail & Related papers (2023-08-14T06:09:51Z)
Visual Recognition by Request [111.94887516317735]
We present a novel protocol of annotation and evaluation for visual recognition. It does not require the labeler/algorithm to annotate/recognize all targets (objects, parts, etc.) at once, but instead raises a number of recognition instructions and the algorithm recognizes targets by request. We evaluate the recognition system on two mixed-annotated datasets, CPP and ADE20K, and demonstrate its promising ability of learning from partially labeled data.
arXiv Detail & Related papers (2022-07-28T16:55:11Z)
Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition [80.74495836502919]
In this work, we focus on joint human fashion segmentation and attribute recognition. We introduce the object query for segmentation and the attribute query for attribute prediction. For attribute stream, we design a novel Multi-Layer Rendering module to explore more fine-grained features.
arXiv Detail & Related papers (2022-04-10T11:11:10Z)
Omni-DETR: Omni-Supervised Object Detection with Transformers [165.4190908259015]
We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations. Under this unified architecture, different types of weak labels can be leveraged to generate accurate pseudo labels. We have found that weak annotations can help to improve detection performance and a mixture of them can achieve a better trade-off between annotation cost and accuracy.
arXiv Detail & Related papers (2022-03-30T06:36:09Z)
Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects. Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency. We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction [40.28976617483996]
FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction. Based on the idea of using a small proxy network for fast data selection, we introduce a novel knowledge distillation mechanism. Experiments demonstrate the advantages of FAMIE in terms of competitive performance and time efficiency for sequence labeling with AL.
arXiv Detail & Related papers (2022-02-16T20:11:31Z)
Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations. We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories. Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task. Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator. To discover implicit entity preference of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.