Fugu-MT Paper Translation (Summary): Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings

Paper summary: Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings

arxiv url: http://arxiv.org/abs/2509.14405v1
Date: Wed, 17 Sep 2025 20:11:23 GMT
Status: Translation complete
Last updated in system: 2025-09-19 17:26:52.965437
Title: Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings
Authors: Javier Conde, María Grandury, Tairan Fu, Carlos Arriaga, Gonzalo Martínez, Thomas Clark, Sean Trott, Clarence Gerald Green, Pedro Reviriego, Marc Brysbaert
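Abstract summary: This paper presents a methodology for estimating word characteristics with Large Language Models (LLMs). A major emphasis of the guide is the validation of LLM-generated data against human "gold standard" norms. The authors also present a software framework that implements the methodology and supports both commercial and open-weight models.
Reference score (attention score computed independently by this site): 5.019061035507826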
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Word-level psycholinguistic norms lend empirical support to theories of language processing. However, obtaining such human-based measures is not always feasible or straightforward. One promising approach is to augment human norming datasets by using Large Language Models (LLMs) to predict these characteristics directly, a practice that is rapidly gaining popularity in psycholinguistics and cognitive science. However, the novelty of this approach (and the relative inscrutability of LLMs) necessitates the adoption of rigorous methodologies that guide researchers through this process, present the range of possible approaches, and clarify limitations that are not immediately apparent, but may, in some cases, render the use of LLMs impractical. In this work, we present a comprehensive methodology for estimating word characteristics with LLMs, enriched with practical advice and lessons learned from our own experience. Our approach covers both the direct use of base LLMs and the fine-tuning of models, an alternative that can yield substantial performance gains in certain scenarios. A major emphasis in the guide is the validation of LLM-generated data with human "gold standard" norms. We also present a software framework that implements our methodology and supports both commercial and open-weight models. We illustrate the proposed approach with a case study on estimating word familiarity in English. Using base models, we achieved a Spearman correlation of 0.8 with human ratings, which increased to 0.9 when employing fine-tuned models. This methodology, framework, and set of best practices aim to serve as a reference for future research on leveraging LLMs for psycholinguistic and lexical studies.
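The validation step described in the abstract, comparing LLM-generated ratings with human "gold standard" norms via a Spearman correlation, can be sketched as follows. This is a minimal illustration and not the authors' software framework: the words, the rating values, the 1-to-7 familiarity scale, and the function names `average_ranks`, `pearson`, and `spearman` are all assumptions made for this example.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical familiarity ratings on an assumed 1-7 scale (not from the paper).
human = {"dog": 6.8, "table": 6.5, "lemon": 6.0, "quark": 2.9, "zephyr": 2.1}
llm = {"dog": 7.0, "table": 6.0, "lemon": 6.5, "quark": 3.5, "zephyr": 2.0}

# Align both rating sets on the same word order before correlating.
words = sorted(human)
rho = spearman([human[w] for w in words], [llm[w] for w in words])
print(f"Spearman rho = {rho:.2f}")
```

Because Spearman's coefficient depends only on rank order, it rewards a model that orders words by familiarity the way human raters do, even when its absolute rating values are shifted or compressed.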
