Fugu-MT Paper Translation (Abstract): A Methodological Guide on Using Large Language Models for Text Annotation in the Social Sciences and Humanities with Python and R

Paper overview: A Methodological Guide on Using Large Language Models for Text Annotation in the Social Sciences and Humanities with Python and R

arxiv url: http://arxiv.org/abs/2604.09638v1
Date: Sat, 21 Mar 2026 00:09:50 GMT
Status: Translation complete
Last updated in system: 2026-04-26 09:01:57.227472
Title: A Methodological Guide on Using Large Language Models for Text Annotation in the Social Sciences and Humanities with Python and R
Title (reference translation): A Methodological Guide on the Use of Large Language Models for Text Annotation in the Social Sciences and Humanities Using Python and R
Authors: Qixiang Fang, Javier Garcia Bernardo, Erik-Jan van Kesteren,
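Abstract summary: Large language models (LLMs) have become an essential tool for social science and humanities (SSH) researchers. This paper provides a comprehensive, step-by-step methodological guide for using LLMs for text annotation in SSH research.
Reference score (independently computed attention score): 1.1372969798040315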
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have become an essential tool for social science and humanities (SSH) researchers who work with text. One particularly valuable application is automating text annotation, a traditionally time-consuming step in preparing data for empirical analysis. Yet many SSH researchers face two challenges: getting started with LLMs and understanding how to address their limitations. Practically, the rapid pace of model development can make LLMs seem inaccessible or intimidating, while even experienced users may overlook how annotation errors can bias downstream statistical analyses (e.g., regression estimates and $p$-values), even when annotation accuracy appears high. This paper provides a comprehensive, step-by-step methodological guide for using LLMs for text annotation in SSH research, with clear Python and R code snippets. We cover (1) how LLMs work and what they can and cannot do; (2) how to identify an LLM-suitable research project and establish minimum data and computational requirements; (3) how to design prompts and run annotation tasks; (4) how to evaluate annotation quality and iteratively refine prompts without overfitting; (5) how to integrate LLM annotations into downstream statistical analyses while accounting for annotation error; and (6) how to manage cost, efficiency, and reproducibility when scaling up annotation. Throughout, we provide intuitive methodological reasoning, concrete examples, code snippets, and best-practice guidance to help researchers confidently and transparently incorporate LLM-based annotation into their scientific workflows.
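Abstract (reference translation): Large language models (LLMs) have become an essential tool for social science and humanities (SSH) researchers who work with text. One especially valuable application is the automation of text annotation, which has traditionally been a time-consuming part of preparing data for empirical analysis. However, many SSH researchers face two challenges: getting started with LLMs and understanding how to address their limitations. In practice, the rapid pace of model development can make LLMs seem inaccessible or intimidating, while even experienced users may overlook how annotation errors can bias downstream statistical analyses (e.g., regression estimates and $p$-values), even when annotation accuracy appears high. This paper provides a comprehensive, step-by-step methodological guide, with clear Python and R code snippets, for using LLMs for text annotation in SSH research. It covers (1) how LLMs work and what they can and cannot do; (2) how to identify a research project suited to LLMs and establish minimum data and computational requirements; (3) how to design prompts and run annotation tasks; (4) how to evaluate annotation quality and iteratively refine prompts without overfitting; (5) how to integrate LLM annotations into downstream statistical analyses while accounting for annotation error; and (6) how to manage cost, efficiency, and reproducibility when scaling up annotation. Throughout, it offers intuitive methodological reasoning, concrete examples, code snippets, and best-practice guidance to help researchers incorporate LLM-based annotation into their scientific workflows confidently and transparently.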
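
The abstract's point that annotation errors can bias downstream estimates even when accuracy looks high can be illustrated with a minimal Python sketch. This is not taken from the paper: the simulation design, the function names (`ols_slope`, `simulate`), and all parameter values (20% label prevalence, 90% annotation accuracy, a true effect of 2.0) are assumptions chosen only for illustration. It regresses a simulated outcome first on the true binary label and then on a label with random annotation errors, and compares the two slopes.

```python
import random


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def simulate(n=20000, prevalence=0.2, accuracy=0.9, beta=2.0, seed=0):
    """Return (slope using true labels, slope using error-prone labels).

    All parameters are hypothetical illustration values, not from the paper.
    """
    rng = random.Random(seed)
    true_labels, noisy_labels, outcomes = [], [], []
    for _ in range(n):
        label = 1 if rng.random() < prevalence else 0
        # The simulated annotator is correct with probability `accuracy`
        # and flips the label otherwise (errors independent of the outcome).
        noisy = label if rng.random() < accuracy else 1 - label
        # The outcome depends on the TRUE label plus Gaussian noise.
        outcome = beta * label + rng.gauss(0.0, 1.0)
        true_labels.append(label)
        noisy_labels.append(noisy)
        outcomes.append(outcome)
    return ols_slope(true_labels, outcomes), ols_slope(noisy_labels, outcomes)


if __name__ == "__main__":
    slope_true, slope_noisy = simulate()
    print(f"slope with true labels:  {slope_true:.3f}")
    print(f"slope with noisy labels: {slope_noisy:.3f}")
```

Under these assumptions the slope estimated from the error-prone labels is pulled toward zero relative to the slope from the true labels, despite 90% accuracy. Real annotation errors need not be random or independent of the outcome, so the direction and size of the bias in practice may differ from this sketch.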


List of related papers