Fugu-MT Paper Translation (Abstract): What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction

Paper abstract: What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction

arxiv url: http://arxiv.org/abs/2508.07702v1
Date: Mon, 11 Aug 2025 07:25:50 GMT
Status: Translation complete
Last updated in system: 2025-08-12 21:23:28.989403
Title: What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction
Authors: Charlie Wyatt, Aditya Joshi, Flora Salim
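Abstract summary: Next Token Prediction (NTP) often limits a model's ability to plan ahead or maintain long-range coherence. The paper evaluates three commercial LLMs on Masked Sentence Prediction (MSP). The key finding is that commercial LLMs, despite their superlative performance on other tasks, are poor at predicting masked sentences in low-structured domains.
Reference score (independently computed attention score): 2.8514881296685113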
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based models primarily rely on Next Token Prediction (NTP), which predicts the next token in a sequence based on the preceding context. However, NTP's focus on single-token prediction often limits a model's ability to plan ahead or maintain long-range coherence, raising questions about how well LLMs can predict longer spans of text, such as full sentences within structured documents. While NTP encourages local fluency, it provides no explicit incentive to ensure global coherence across sentence boundaries, an essential skill for reconstructive or discursive tasks. To investigate this, we evaluate three commercial LLMs (GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Flash) on Masked Sentence Prediction (MSP), the task of infilling a randomly removed sentence, using texts from three domains: ROCStories (narrative), Recipe1M (procedural), and Wikipedia (expository). We assess both fidelity (similarity to the original sentence) and cohesiveness (fit within the surrounding context). Our key finding is that commercial LLMs, despite their superlative performance in other tasks, are poor at predicting masked sentences in low-structured domains, highlighting a gap in current model capabilities.
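
The MSP setup described in the abstract (remove one sentence at random, then compare a prediction against the original) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `<MASK>` placeholder, the naive regex sentence splitter, the helper names, and the token-overlap F1 used as a rough stand-in for fidelity. The abstract does not state which metrics the authors use, and the sketch does not attempt to measure cohesiveness.

```python
import random
import re

MASK = "<MASK>"  # hypothetical placeholder token, not taken from the paper


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def make_msp_instance(text, rng):
    """Remove one randomly chosen sentence; return (masked_text, target)."""
    sentences = split_sentences(text)
    idx = rng.randrange(len(sentences))  # raises ValueError on empty input
    target = sentences[idx]
    masked = sentences[:idx] + [MASK] + sentences[idx + 1:]
    return " ".join(masked), target


def token_f1(prediction, reference):
    """Crude fidelity proxy: F1 over the sets of lowercased word tokens."""
    pred = set(re.findall(r"\w+", prediction.lower()))
    ref = set(re.findall(r"\w+", reference.lower()))
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up four-sentence story in the style of a short narrative.
story = (
    "Tom bought a kite. He took it to the park. "
    "The wind was strong. The kite flew high."
)
masked_text, target = make_msp_instance(story, random.Random(0))
```

In use, `masked_text` would be sent to a model with an instruction to fill in the placeholder, and the model's output would be scored against `target`, for example with `token_f1(model_output, target)`. A surface-overlap score of this kind penalizes valid paraphrases, so it is only a starting point.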
