Fugu-MT paper translation (abstract): Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators

Paper overview: Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators

arxiv url: http://arxiv.org/abs/2605.12748v1
Date: Tue, 12 May 2026 20:55:23 GMT
Status: Translation complete
System-internal update date: 2026-05-14 23:30:27.687255
Title: Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators
Authors: Heejin Do, Shashank Sonkar, Mrinmaya Sachan
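Abstract summary: Large language models (LLMs) can fluently generate student-like responses, making them attractive as simulated students for training and evaluating AI tutors and human educators. However, such simulators are typically evaluated by output similarity to real students, not by whether they behave like students with coherent misconceptions during interaction. The paper evaluates whether a simulator maintains a misconception-driven belief state and updates it selectively when feedback addresses the underlying misconception.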
Reference score (independently computed attention score): 55.617099475539305
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) can fluently generate student-like responses, making them attractive as simulated students for training and evaluating AI tutors and human educators. Yet such simulators are typically evaluated by output similarity to real students, not by whether they behave like students with coherent misconceptions during interaction. We introduce a controlled framework for evaluating misconception faithfulness: whether a simulator maintains a misconception-driven belief state and updates selectively when feedback addresses the underlying misconception. Central to our framework is a misconception-contrastive feedback protocol that compares targeted feedback against two controls: misaligned feedback (targeting a different but plausible misconception) and generic feedback (only identifying that the answer is wrong). We propose the Selective Flip Score (SFS), which quantifies how much more often a simulator flips its answer under targeted feedback than under contrastive controls. Across seven LLMs (4B-120B), multiple datasets, and prompting strategies, simulators exhibit near-zero SFS, correcting their answers at similarly high rates regardless of feedback relevance. Further analyses reveal a sycophantic failure mode: models behave less like students with misconceptions and more like problem-solvers who treat any corrective signal as a cue to abandon the simulated belief and re-solve from internal knowledge. To address this, we develop a post-training pipeline spanning supervised fine-tuning (SFT), preference optimization, and reinforcement learning (RL) with an SFS-aligned reward; SFT yields notable gains of up to +0.56, and SFS-aligned RL provides more consistent improvements than preference optimization. Our results establish misconception faithfulness as a challenging yet trainable property, motivating a shift from static output matching toward interactive, belief-aware student modeling.
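
The abstract describes SFS only in words and does not give a formula. As a rough illustration, the sketch below assumes one plausible form: the flip rate under targeted feedback minus the mean flip rate under the two controls (misaligned and generic). The `Trial` record, the feedback labels, and the averaging of the two controls are all assumptions made for this example, not the authors' definition.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One simulated-student episode (hypothetical record layout)."""
    feedback: str   # "targeted", "misaligned", or "generic"
    flipped: bool   # True if the simulator changed its answer after feedback


def flip_rate(trials, feedback):
    """Fraction of trials with the given feedback type in which the answer flipped."""
    subset = [t for t in trials if t.feedback == feedback]
    if not subset:
        raise ValueError(f"no trials for feedback type {feedback!r}")
    return sum(t.flipped for t in subset) / len(subset)


def selective_flip_score(trials):
    """Assumed form of SFS: targeted flip rate minus the mean control flip rate."""
    targeted = flip_rate(trials, "targeted")
    controls = (flip_rate(trials, "misaligned") + flip_rate(trials, "generic")) / 2
    return targeted - controls


KINDS = ("targeted", "misaligned", "generic")

# A sycophantic simulator flips under every kind of feedback.
sycophant = [Trial(kind, True) for kind in KINDS for _ in range(4)]

# A selective simulator flips only when the feedback targets its misconception.
selective = [Trial(kind, kind == "targeted") for kind in KINDS for _ in range(4)]

sycophant_sfs = selective_flip_score(sycophant)
selective_sfs = selective_flip_score(selective)
```

Under this assumed definition, the simulator that flips on every corrective signal scores 0, which matches the near-zero SFS the abstract reports for sycophantic behavior, while the simulator that flips only under targeted feedback scores 1.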
