Tracing How Annotators Think: Augmenting Preference Judgments with Reading Processes
- URL: http://arxiv.org/abs/2511.21912v1
- Date: Wed, 26 Nov 2025 21:07:02 GMT
- Title: Tracing How Annotators Think: Augmenting Preference Judgments with Reading Processes
- Authors: Karin de Langis, William Walker, Khanh Chi Le, Dongyeop Kang
- Abstract summary: PreferRead is a dataset of fine-grained annotator reading behaviors obtained from mouse tracking. We find that annotators re-read a response in roughly half of all trials, most often revisiting the option they ultimately choose, and rarely revisit the prompt. Reading processes provide a complementary cognitive dimension for understanding annotator reliability, decision-making and disagreement in complex, subjective NLP tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an annotation approach that captures not only labels but also the reading process underlying annotators' decisions, e.g., what parts of the text they focus on, re-read or skim. Using this framework, we conduct a case study on the preference annotation task, creating a dataset PreferRead that contains fine-grained annotator reading behaviors obtained from mouse tracking. PreferRead enables detailed analysis of how annotators navigate between a prompt and two candidate responses before selecting their preference. We find that annotators re-read a response in roughly half of all trials, most often revisiting the option they ultimately choose, and rarely revisit the prompt. Reading behaviors are also significantly related to annotation outcomes: re-reading is associated with higher inter-annotator agreement, whereas long reading paths and times are associated with lower agreement. These results demonstrate that reading processes provide a complementary cognitive dimension for understanding annotator reliability, decision-making and disagreement in complex, subjective NLP tasks. Our code and data are publicly available.
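The reading behaviors named in the abstract (re-reading a response, revisiting the prompt, reading path length) can be illustrated with a minimal sketch. It assumes each trial is logged as an ordered list of region labels; the labels `prompt`, `response_a`, `response_b`, the function `summarize_reading_path`, and the output field names are all hypothetical and are not taken from the PreferRead code or data format.

```python
from collections import Counter
from typing import Dict, List


def summarize_reading_path(visits: List[str]) -> Dict[str, object]:
    """Summarize one trial's sequence of visited regions (hypothetical format)."""
    # Collapse consecutive duplicates so that dwelling in a region counts as one visit.
    path = [r for i, r in enumerate(visits) if i == 0 or r != visits[i - 1]]
    counts = Counter(path)
    # A region counts as re-read if the annotator returned to it after leaving it.
    reread = sorted(r for r, c in counts.items() if c > 1)
    return {
        "path": path,
        "path_length": len(path),
        "reread_regions": reread,
        "reread_response": any(r != "prompt" for r in reread),
        "revisited_prompt": "prompt" in reread,
    }


# Illustrative trial: the annotator returns to response A after reading response B.
trial = ["prompt", "prompt", "response_a", "response_b", "response_b", "response_a"]
summary = summarize_reading_path(trial)
```

Under these assumptions, per-trial summaries like this could be aggregated across annotators and compared against agreement scores; how the paper actually derives its measures from mouse-tracking data is not specified here.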
Related papers
- Read As Human: Compressing Context via Parallelizable Close Reading and Skimming [34.83776292069694]
RAM (Read As HuMan) is a context compression framework that adopts an adaptive hybrid reading strategy. Inspired by human reading behavior, RAM partitions the context into segments and encodes them with the input query in parallel. Experiments demonstrate that RAM outperforms existing baselines on multiple question answering and summarization benchmarks.
arXiv Detail & Related papers (2026-02-02T09:10:56Z)
- RAC: Retrieval-Augmented Clarification for Faithful Conversational Search [7.0486278653981245]
We introduce RAC (Retrieval-Augmented Clarification), a framework for generating corpus-faithful clarification questions. After comparing several indexing strategies for retrieval, we fine-tune a large language model to make optimal use of research context. We then apply contrastive preference optimization to favor questions supported by retrieved passages over ungrounded alternatives.
arXiv Detail & Related papers (2026-01-16T19:16:38Z)
- Previously on the Stories: Recap Snippet Identification for Story Reading [51.641565531840186]
We propose the first benchmark on this useful task called Recap Snippet Identification with a hand-crafted evaluation dataset.
Our experiments show that the proposed task is challenging to PLMs, LLMs, and proposed methods as the task requires a deep understanding of the plot correlation between snippets.
arXiv Detail & Related papers (2024-02-11T18:27:14Z)
- Towards Reliable and Factual Response Generation: Detecting Unanswerable Questions in Information-Seeking Conversations [16.99952884041096]
Generative AI models face the challenge of hallucinations that can undermine users' trust in such systems.
We approach the problem of conversational information seeking as a two-step process, where relevant passages in a corpus are identified first and then summarized into a final system response.
Specifically, our proposed method employs a sentence-level classifier to detect if the answer is present, then aggregates these predictions on the passage level, and eventually across the top-ranked passages to arrive at a final answerability estimate.
arXiv Detail & Related papers (2024-01-21T10:15:36Z)
- Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation [28.89786334298637]
We develop a novel method to optimize LLMs using ranking metrics.
Rather than a traditional full ordering, we advocate for a partial ordering.
We test our system's improved response generation ability using benchmark datasets.
arXiv Detail & Related papers (2023-11-15T17:27:14Z)
- IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models [63.15355173909631]
This paper introduces an influence-driven selective annotation method. It aims to minimize annotation costs while improving the quality of in-context examples. Experiments confirm the superiority of the proposed method on various benchmarks.
arXiv Detail & Related papers (2023-10-16T22:53:54Z)
- Phrase Retrieval for Open-Domain Conversational Question Answering with Conversational Dependency Modeling via Contrastive Learning [54.55643652781891]
Open-Domain Conversational Question Answering (ODConvQA) aims at answering questions through a multi-turn conversation.
We propose a method to directly predict answers with a phrase retrieval scheme for a sequence of words.
arXiv Detail & Related papers (2023-06-07T09:46:38Z)
- Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings.
Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process.
It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z)
- Generating Usage-related Questions for Preference Elicitation in Conversational Recommender Systems [19.950705852361565]
We propose a novel approach to preference elicitation by asking implicit questions based on item usage. We develop a high-quality labeled training dataset using crowdsourcing. We show that our approaches are effective in generating elicitation questions, even with limited training data.
arXiv Detail & Related papers (2021-11-26T12:23:14Z)
- Hone as You Read: A Practical Type of Interactive Summarization [6.662800021628275]
We present HARE, a new task where reader feedback is used to optimize document summaries for personal interest.
This task is related to interactive summarization, where personalized summaries are produced following a long feedback stage.
We propose to gather minimally-invasive feedback during the reading process to adapt to user interests and augment the document in real-time.
arXiv Detail & Related papers (2021-05-06T19:36:40Z)
- Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews [23.7107052882747]
We argue the need and propose a solution for generating personalized aspect-based opinion summaries from online tourist reviews.
We let our readers decide and control several attributes of the summary such as the length and specific aspects of interest.
Specifically, we take an unsupervised approach to extract coherent aspects from tourist reviews posted on TripAdvisor.
arXiv Detail & Related papers (2020-06-08T15:03:38Z)
- Active Learning for Coreference Resolution using Discrete Annotation [76.36423696634584]
We improve upon pairwise annotation for active learning in coreference resolution.
We ask annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per human-annotation hour.
arXiv Detail & Related papers (2020-04-28T17:17:11Z)
- Retrospective Reader for Machine Reading Comprehension [90.6069071495214]
Machine reading comprehension (MRC) is an AI challenge that requires machines to determine the correct answers to questions based on a given passage.
When unanswerable questions are involved in the MRC task, an essential verification module called verifier is especially required in addition to the encoder.
This paper devotes itself to exploring better verifier design for the MRC task with unanswerable questions.
arXiv Detail & Related papers (2020-01-27T11:14:34Z)