Fugu-MT paper translation (abstract): Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling

Paper abstract: Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling

arxiv url: http://arxiv.org/abs/2604.09854v1
Date: Fri, 10 Apr 2026 19:33:43 GMT
Status: Translation complete
System update date: 2026-04-14 20:13:15.711574
Title: Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling
Authors: Peiqi Sui, Yutong Zhu, Tianyi Cheng, Peter West, Richard Jean So, Hoyt Long, Ari Holtzman
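Abstract summary: We argue that existing rubrics overlook a key dimension of compelling human stories: narrative tension. We introduce the 100-Endings metric. Unlike rubric-based judges, 100-Endings ranks New Yorker stories far above LLM outputs.
Reference score (independently computed attention score): 15.25806708314033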
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLMs have so far failed both to generate consistently compelling stories and to recognize this failure--on the leading creative-writing benchmark (EQ-Bench), LLM judges rank zero-shot AI stories above New Yorker short stories, a gold standard for literary fiction. We argue that existing rubrics overlook a key dimension of compelling human stories: narrative tension. We introduce the 100-Endings metric, which walks through a story sentence by sentence: at each position, a model predicts how the story will end 100 times given only the text so far, and we measure tension as how often predictions fail to match the ground truth. Beyond the mismatch rate, the sentence-level curve yields complementary statistics, such as inflection rate, a geometric measure of how frequently the curve reverses direction, tracking twists and revelations. Unlike rubric-based judges, 100-Endings correctly ranks New Yorker stories far above LLM outputs. Grounded in narratological principles, we design a story-generation pipeline using structural constraints, including analysis of story templates, idea formulation, and narrative scaffolding. Our pipeline significantly increases narrative tension as measured by the 100-Endings metric, while maintaining performance on the EQ-Bench leaderboard.
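The two statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a predicted ending is matched against the true ending, so `matches` is a hypothetical caller-supplied function, and `inflection_rate` assumes that "reversing direction" means a sign change between consecutive nonzero steps of the curve, normalized by the number of interior points. The function names and the toy data are likewise illustrative.

```python
from typing import Callable, List, Sequence


def mismatch_rate(predictions: Sequence[str], true_ending: str,
                  matches: Callable[[str, str], bool]) -> float:
    """Fraction of predicted endings that fail to match the true ending."""
    if not predictions:
        raise ValueError("need at least one prediction")
    misses = sum(1 for p in predictions if not matches(p, true_ending))
    return misses / len(predictions)


def tension_curve(predictions_per_position: Sequence[Sequence[str]],
                  true_ending: str,
                  matches: Callable[[str, str], bool]) -> List[float]:
    """One mismatch rate per sentence position (the sentence-level curve)."""
    return [mismatch_rate(p, true_ending, matches)
            for p in predictions_per_position]


def inflection_rate(curve: Sequence[float]) -> float:
    """Share of interior points where the curve reverses direction.

    Assumption: a reversal is a sign change between consecutive
    nonzero steps; flat segments are skipped.
    """
    steps = [b - a for a, b in zip(curve, curve[1:]) if b != a]
    if len(steps) < 2:
        return 0.0
    reversals = sum(1 for s, t in zip(steps, steps[1:]) if s * t < 0)
    return reversals / (len(curve) - 2)


def exact_match(prediction: str, truth: str) -> bool:
    """Hypothetical matcher: case-insensitive string equality."""
    return prediction.strip().lower() == truth.strip().lower()


# Toy data: 4 sampled endings per position instead of the paper's 100.
toy_predictions = [
    ["she leaves", "she stays", "she stays", "she stays"],
    ["she leaves", "she leaves", "she stays", "she stays"],
]
toy_curve = tension_curve(toy_predictions, "She leaves", exact_match)
```

In practice the predictions would come from sampling a model 100 times at each sentence position, and the matcher would likely be a semantic comparison rather than string equality; both are left to the caller here.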
