Fugu-MT Paper Translation (Summary): Evaluating Reasoning-Based Scaffolds for Human-AI Co-Annotation: The ReasonAlign Annotation Protocol

Paper summary: Evaluating Reasoning-Based Scaffolds for Human-AI Co-Annotation: The ReasonAlign Annotation Protocol

arxiv url: http://arxiv.org/abs/2603.21094v1
Date: Sun, 22 Mar 2026 07:14:27 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.233832
Title: Evaluating Reasoning-Based Scaffolds for Human-AI Co-Annotation: The ReasonAlign Annotation Protocol
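Title (reference translation): ReasonAlign Annotation Protocol
Authors: Smitha Muthya Sudheendra, Jaideep Srivastava
Abstract summary: ReasonAlign is a reasoning-based annotation scaffold that exposes model-generated explanations while withholding predicted labels. We frame this as a controlled study of how reasoning affects human annotation behavior, rather than a full evaluation of annotation accuracy. Our results suggest that exposure to reasoning is associated with increased agreement alongside minimal revision.
Reference score (independently computed attention score): 2.5819252531158683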
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Human annotation is central to NLP evaluation, yet subjective tasks often exhibit substantial variability across annotators. While large language models (LLMs) can provide structured reasoning to support annotation, their influence on human annotation behavior remains unclear. We introduce ReasonAlign, a reasoning-based annotation scaffold that exposes LLM-generated explanations while withholding predicted labels. We frame this as a controlled study of how reasoning affects human annotation behavior, rather than a full evaluation of annotation accuracy. Using a two-pass protocol inspired by Delphi-style revision, annotators first label instances independently and then revise their decisions after viewing model-generated reasoning. We evaluate the approach on sentiment classification and opinion detection tasks, analyzing changes in inter-annotator agreement and revision behavior. To quantify these effects, we introduce the Annotator Effort Proxy (AEP), a metric capturing the proportion of labels revised after exposure to reasoning. Our results show that exposure to reasoning is associated with increased agreement alongside minimal revision, suggesting that reasoning primarily helps resolve ambiguous cases without inducing widespread changes. These findings provide insight into how reasoning explanations shape annotation consistency and highlight reasoning-based scaffolds as a practical mechanism for supporting human-AI annotation workflows.
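The abstract defines the Annotator Effort Proxy (AEP) as the proportion of labels revised after exposure to reasoning, and reports it alongside changes in inter-annotator agreement. The Python sketch below illustrates both quantities on toy data. It is not the authors' implementation: the function names, the pooling of revisions across all annotators, and the use of mean pairwise percent agreement as the agreement measure are assumptions made here for illustration, since the abstract does not specify them.

```python
from itertools import combinations


def annotator_effort_proxy(first_pass, second_pass):
    """Fraction of labels that differ between the first and second pass.

    Both arguments are flat, equally long sequences of labels.
    """
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must cover the same labels")
    if not first_pass:
        raise ValueError("need at least one label")
    revised = sum(a != b for a, b in zip(first_pass, second_pass))
    return revised / len(first_pass)


def pairwise_agreement(labels_by_annotator):
    """Mean fraction of items on which each pair of annotators agrees.

    Assumption: simple percent agreement; the paper may use another measure.
    """
    pairs = list(combinations(labels_by_annotator, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    total = 0.0
    for x, y in pairs:
        total += sum(a == b for a, b in zip(x, y)) / len(x)
    return total / len(pairs)


# Hypothetical toy data: 3 annotators, 5 items each.
pass1 = [
    ["pos", "neg", "neu", "pos", "neg"],
    ["pos", "neg", "pos", "pos", "neg"],
    ["pos", "neu", "neu", "pos", "neg"],
]
# Labels after the annotators have viewed the model-generated reasoning.
pass2 = [
    ["pos", "neg", "neu", "pos", "neg"],
    ["pos", "neg", "neu", "pos", "neg"],
    ["pos", "neg", "neu", "pos", "neg"],
]

# Pool all annotators' labels to get one AEP value for the study.
flat1 = [label for row in pass1 for label in row]
flat2 = [label for row in pass2 for label in row]

aep = annotator_effort_proxy(flat1, flat2)
agreement_before = pairwise_agreement(pass1)
agreement_after = pairwise_agreement(pass2)

print(f"AEP: {aep:.3f}")
print(f"Agreement before: {agreement_before:.3f}, after: {agreement_after:.3f}")
```

In this toy example only 2 of 15 labels change, yet agreement rises from roughly 0.73 to 1.0, which mirrors the pattern the abstract describes: a small number of revisions concentrated on ambiguous items can account for a noticeable gain in agreement.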


Related papers