Fugu-MT Paper Translation (Abstract): When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction

Paper overview: When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction

arxiv url: http://arxiv.org/abs/2606.04161v1
Date: Tue, 02 Jun 2026 19:24:09 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.347147
Title: When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction
Title (reference translation): When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction
Authors: Tyler Crosse, Alan Nadelsticher Ruvalcaba, Dustin Khang LeDuc, Thomas Trask, Nicholas Lytle, David Joyner
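Abstract summary: Selectors trained from logged data routinely fail to beat the strongest single predictor. A three-stage diagnostic rules out the possible causes on a shared buffer. The next iteration should change the state or collect new data, not tune the offline learner further.
Reference score (independently computed attention score): 0.35185044688786976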
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Different predictors often excel on different inputs, so picking the best one per instance promises higher accuracy than committing to a single model. In practice, selectors trained from logged data routinely fail to beat the strongest single predictor. Three causes typically go unseparated before more tuning is applied: a mismatched learner, a state that does not predict which model wins, or buffer-to-deployment label shift. A three-stage diagnostic rules them out on a shared buffer. Stage 1 estimates a local ceiling on oracle recovery from $k$-NN label consistency. Stage 2 asks whether paired BC and offline-RL learners (BC, DQN, and CQL across penalty weights) reach that ceiling. Stage 3 ablates the selector state to test whether richer features would raise it. The combined verdict points to the most promising next step: tuning the learner, redesigning the state, or collecting new data. We apply it to selecting among five dropout-prediction models on edX clickstream data. Across 16 windows, the oracle beats the strongest single base model by 9.7 accuracy points on average, yet BC, DQN, and CQL land in the same test-accuracy band below it (robust to a tenfold buffer sweep and $N = 2{,}000$ held-out examples). The bottleneck is local representational ambiguity: CQL closes the imitation gap without a deployment gain (not conservatism), regret clusters tightly across learners (not tie-breaking), and the three learners converge on test accuracy (not shift). The next iteration should change the state or collect new data, not tune the offline learner further.
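Abstract (reference translation): Different predictors often excel on different inputs, so selecting the best one for each instance promises higher accuracy than committing to a single model. In practice, selectors trained from logged data routinely fail to beat the strongest single predictor. Three causes usually go unseparated before further tuning is applied: a mismatched learner, a state that does not predict which model wins, or label shift between buffer and deployment. A three-stage diagnostic rules them out on a shared buffer. Stage 1 estimates a local ceiling on oracle recovery from $k$-NN label consistency. Stage 2 asks whether paired BC and offline-RL learners (BC, DQN, and CQL across penalty weights) reach that ceiling. Stage 3 ablates the selector state to test whether richer features would raise the ceiling. The combined verdict indicates the most promising next step: tuning the learner, redesigning the state, or collecting new data. The diagnostic is applied to selection among five dropout-prediction models on edX clickstream data. Across 16 windows, the oracle beats the strongest single base model by 9.7 accuracy points on average, yet BC, DQN, and CQL land in the same test-accuracy band below it (robust to a tenfold buffer sweep and $N = 2{,}000$ held-out examples). The bottleneck is local representational ambiguity: CQL closes the imitation gap without a deployment gain (so not conservatism), regret clusters tightly across learners (so not tie-breaking), and the three learners converge on test accuracy (so not shift). The next iteration should change the state or collect new data rather than tune the offline learner further.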
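
As a rough illustration of the quantities the abstract refers to, the following Python sketch computes the oracle accuracy, the best single-model accuracy, and a leave-one-out k-NN label-consistency score. The function names, the input layout (`correct`, `states`, `best_model`), the Euclidean distance, and the majority-vote rule are all assumptions made for illustration; the paper's actual Stage 1 estimator may be defined differently.

```python
from collections import Counter
import math


def oracle_and_best_single(correct):
    """Return (oracle accuracy, best single-model accuracy).

    correct[i][m] is True when hypothetical base model m is right on instance i.
    The oracle is counted as right whenever at least one base model is right.
    """
    n = len(correct)
    n_models = len(correct[0])
    oracle_acc = sum(1 for row in correct if any(row)) / n
    single_accs = [
        sum(1 for row in correct if row[m]) / n for m in range(n_models)
    ]
    return oracle_acc, max(single_accs)


def knn_label_consistency(states, best_model, k):
    """Leave-one-out agreement between each point's label and its neighbors.

    states[i] is a tuple of selector-state features; best_model[i] is the
    index of the model an oracle would pick there (an assumed labeling).
    The score is the fraction of points whose label equals the majority
    label of their k nearest neighbors under Euclidean distance. Ties in
    the vote go to the label seen first among the sorted neighbors.
    """
    n = len(states)
    agree = 0
    for i in range(n):
        # Distances to every other point, nearest first (the point itself is excluded).
        others = sorted(
            (math.dist(states[i], states[j]), j) for j in range(n) if j != i
        )
        neighbor_labels = [best_model[j] for _, j in others[:k]]
        majority = Counter(neighbor_labels).most_common(1)[0][0]
        if majority == best_model[i]:
            agree += 1
    return agree / n
```

Under these assumptions, a low consistency score means nearby states disagree about which model wins, which is one way to read the abstract's "local representational ambiguity": no selector that is smooth in the state could recover the oracle there, regardless of how the learner is tuned.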

