Fugu-MT Paper Translation (Summary): Toward LLM-Supported Automated Assessment of Critical Thinking Subskills

Paper summary: Toward LLM-Supported Automated Assessment of Critical Thinking Subskills

arxiv url: http://arxiv.org/abs/2510.12915v1
Date: Tue, 14 Oct 2025 18:36:19 GMT
Status: Translation complete
Last updated in system: 2025-10-16 20:13:28.384902
Title: Toward LLM-Supported Automated Assessment of Critical Thinking Subskills
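Title (reference translation): Toward Automated Assessment of Critical Thinking Subskills
Authors: Marisa C. Peczuh, Nischal Ashok Kumar, Ryan Baker, Blair Lehman, Danielle Eisenberg, Caitlin Mills, Keerthi Chebrolu, Sudhip Nashi, Cadence Young, Brayden Liu, Sherry Lachman, Andrew Lan
Abstract summary: We examine the feasibility of measuring the "subskills" that underlie critical thinking. We develop a coding rubric based on an established skills progression and complete human coding for a corpus of student essays. We evaluate three distinct approaches to automated scoring: zero-shot prompting, few-shot prompting, and supervised fine-tuning.
Reference score (independently computed attention score): 0.7768012939205664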
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Critical thinking represents a fundamental competency in today's education landscape. Developing critical thinking skills through timely assessment and feedback is crucial; however, there has not been extensive work in the learning analytics community on defining, measuring, and supporting critical thinking. In this paper, we investigate the feasibility of measuring core "subskills" that underlie critical thinking. We ground our work in an authentic task where students operationalize critical thinking: student-written argumentative essays. We developed a coding rubric based on an established skills progression and completed human coding for a corpus of student essays. We then evaluated three distinct approaches to automated scoring: zero-shot prompting, few-shot prompting, and supervised fine-tuning, implemented across three large language models (GPT-5, GPT-5-mini, and ModernBERT). GPT-5 with few-shot prompting achieved the strongest results and demonstrated particular strength on subskills with separable, frequent categories, while lower performance was observed for subskills that required detection of subtle distinctions or rare categories. Our results underscore critical trade-offs in automated critical thinking assessment: proprietary models offer superior reliability at higher cost, while open-source alternatives provide practical accuracy with reduced sensitivity to minority categories. Our work represents an initial step toward scalable assessment of higher-order reasoning skills across authentic educational contexts.
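The abstract contrasts strong results on subskills with frequent, separable categories against weaker results on rare categories. The abstract does not say which agreement metrics the authors used, so the following is only a minimal sketch under that assumption: it compares hypothetical human codes with hypothetical model scores for one subskill using Cohen's kappa (chance-corrected agreement) and per-category recall, which makes a missed minority category visible even when overall agreement looks high. The category names "full", "partial", and "absent" are illustrative, not the paper's rubric.

```python
from collections import Counter


def cohens_kappa(human, model):
    """Cohen's kappa between two equal-length sequences of category labels."""
    if len(human) != len(model) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(h == m for h, m in zip(human, model)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    h_counts, m_counts = Counter(human), Counter(model)
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case: both raters used one identical category throughout;
        # kappa is undefined, and we return 1.0 by convention here.
        return 1.0
    return (observed - expected) / (1 - expected)


def per_category_recall(human, model):
    """For each human-coded category, the fraction the model reproduced."""
    totals = Counter(human)
    hits = Counter(h for h, m in zip(human, model) if h == m)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical codes for one subskill; "absent" is the rare category.
human = ["full", "full", "partial", "partial", "full", "absent", "partial", "full"]
model = ["full", "full", "partial", "partial", "full", "partial", "partial", "full"]

print(round(cohens_kappa(human, model), 3))  # 0.778
print(per_category_recall(human, model))     # {'full': 1.0, 'partial': 1.0, 'absent': 0.0}
```

In this toy case, raw agreement is 7 of 8 items, yet the only "absent" essay is missed entirely; per-category recall surfaces that gap, which a single aggregate score would hide.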
