Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Bridging Benchmark Dataset
- URL: http://arxiv.org/abs/2408.00880v1
- Date: Thu, 1 Aug 2024 19:11:08 GMT
- Title: Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Bridging Benchmark Dataset
- Authors: Sonja Schmer-Galunder, Ruta Wheelock, Scott Friedman, Alyssa Chvasta, Zaria Jalan, Emily Saltz,
- Abstract summary: We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation.
Our findings indicate that collaborative engagement with annotators can enhance annotation methods.
- Score: 1.825224193230824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the growing prevalence of large language models, it is increasingly common to annotate datasets for machine learning using pools of crowd raters. However, these raters often work in isolation as individual crowdworkers. In this work, we regard annotation not merely as inexpensive, scalable labor, but rather as a nuanced interpretative effort to discern the meaning of what is being said in a text. We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation, resulting in a 'Bridging Benchmark Dataset' of comments relevant to bridging divides, annotated from 11,973 textual posts in the Civil Comments dataset. The methodology differs from popular anonymous crowd-rating annotation processes due to its use of an in-depth, iterative engagement with seven US-based raters to (1) collaboratively refine the definitions of the to-be-annotated concepts and then (2) iteratively annotate complex social concepts, with check-in meetings and discussions. This approach addresses some shortcomings of current anonymous crowd-based annotation work, and we present empirical evidence of the performance of our annotation process in the form of inter-rater reliability. Our findings indicate that collaborative engagement with annotators can enhance annotation methods, as opposed to relying solely on isolated work conducted remotely. We provide an overview of the input texts, attributes, and annotation process, along with the empirical results and the resulting benchmark dataset, categorized according to the following attributes: Alienation, Compassion, Reasoning, Curiosity, Moral Outrage, and Respect.
Related papers
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z) - Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs)
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z) - Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings.
Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process.
It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z) - On Releasing Annotator-Level Labels and Information in Datasets [6.546195629698355]
We show that label aggregation may introduce representational biases of individual and group perspectives.
We propose recommendations for increased utility and transparency of datasets for downstream use cases.
arXiv Detail & Related papers (2021-10-12T02:35:45Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better -- than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Nutribullets Hybrid: Multi-document Health Summarization [36.95954983680022]
We present a method for generating comparative summaries that highlights similarities and contradictions in input documents.
Our framework leads to more faithful, relevant and aggregation-sensitive summarization -- while being equally fluent.
arXiv Detail & Related papers (2021-04-08T01:44:29Z) - Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
relational sentences in the original text are embedded (with SBERT) and clustered in order to merge together semantically similar relations.
Preliminary tests show that such clustering might successfully detect similar relations, and provide a valuable preprocessing for semi-supervised approaches.
arXiv Detail & Related papers (2020-11-27T10:43:04Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Unsupervised Summarization by Jointly Extracting Sentences and Keywords [12.387378783627762]
RepRank is an unsupervised graph-based ranking model for extractive multi-document summarization.
We show that salient sentences and keywords can be extracted in a joint and mutual reinforcement process using our learned representations.
Experiment results with multiple benchmark datasets show that RepRank achieved the best or comparable performance in ROUGE.
arXiv Detail & Related papers (2020-09-16T05:58:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.