Fugu-MT Paper Translation (Abstract): Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity

Paper abstract: Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity

arxiv url: http://arxiv.org/abs/2509.22641v1
Date: Fri, 26 Sep 2025 17:59:05 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.637603
Title: Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity
Title (reference translation): The Death of the Novel: Beyond n-Gram Novelty as a Metric for Textual Creativity
Authors: Arkadiy Saakyan, Najoung Kim, Smaranda Muresan, Tuhin Chakrabarty
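Abstract summary: N-gram novelty is widely used to evaluate language models' ability to generate text outside of their training data. We examine the relationship between creativity, understood as both novelty and appropriateness, and n-gram novelty through close reading of human and AI-generated text. We find that while n-gram novelty is positively associated with expert-judged creativity, about 91% of top-quartile expressions by n-gram novelty are not judged as creative.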
Reference score (independently computed attention score): 29.58419742230708
License: http://creativecommons.org/licenses/by/4.0/
Abstract: N-gram novelty is widely used to evaluate language models' ability to generate text outside of their training data. More recently, it has also been adopted as a metric for measuring textual creativity. However, theoretical work on creativity suggests that this approach may be inadequate, as it does not account for creativity's dual nature: novelty (how original the text is) and appropriateness (how sensical and pragmatic it is). We investigate the relationship between this notion of creativity and n-gram novelty through 7542 expert writer annotations (n=26) of novelty, pragmaticality, and sensicality via close reading of human and AI-generated text. We find that while n-gram novelty is positively associated with expert writer-judged creativity, ~91% of top-quartile expressions by n-gram novelty are not judged as creative, cautioning against relying on n-gram novelty alone. Furthermore, unlike human-written text, higher n-gram novelty in open-source LLMs correlates with lower pragmaticality. In an exploratory study with frontier closed-source models, we additionally confirm that they are less likely to produce creative expressions than humans. Using our dataset, we test whether zero-shot, few-shot, and finetuned models are able to identify creative expressions (a positive aspect of writing) and non-pragmatic ones (a negative aspect). Overall, frontier LLMs exhibit performance much higher than random but leave room for improvement, especially struggling to identify non-pragmatic expressions. We further find that LLM-as-a-Judge novelty scores from the best-performing model were predictive of expert writer preferences.
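Abstract (reference translation): N-gram novelty is widely used to evaluate the ability of language models to generate text outside of their training data. Recently, it has also been adopted as a metric for measuring textual creativity. However, theoretical work on creativity suggests that this approach may be inadequate, because it does not account for the dual nature of creativity: novelty (how original the text is) and appropriateness (how sensical and pragmatic it is). We examine the relationship between this notion of creativity and n-gram novelty through 7542 annotations of novelty, pragmaticality, and sensicality made by expert writers (n=26) during close reading of human and AI-generated text. We find that although n-gram novelty is positively associated with creativity as judged by expert writers, about 91% of top-quartile expressions by n-gram novelty are not judged as creative, which cautions against relying on n-gram novelty alone. Furthermore, unlike in human-written text, higher n-gram novelty in open-source LLMs correlates with lower pragmaticality. In an exploratory study with frontier closed-source models, we also confirm that they are less likely than humans to produce creative expressions. Using our dataset, we test whether zero-shot, few-shot, and finetuned models can identify creative expressions (a positive aspect of writing) and non-pragmatic expressions (a negative aspect). Overall, frontier LLMs perform much better than random but leave room for improvement, and they struggle in particular to identify non-pragmatic expressions. We further find that LLM-as-a-Judge novelty scores from the best-performing model were predictive of expert writer preferences.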
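
Note: the abstract does not spell out how n-gram novelty is computed. The sketch below assumes one common formulation, the fraction of a text's n-grams that never occur in a reference corpus. The function names, the whitespace tokenizer, and the toy corpus are illustrative assumptions and are not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_reference_index(corpus: Iterable[str], n: int) -> Set[NGram]:
    """Collect every n-gram seen in the reference corpus (assumed stand-in for training data)."""
    index: Set[NGram] = set()
    for doc in corpus:
        # Lowercased whitespace tokenization is a simplification for illustration.
        index.update(ngrams(doc.lower().split(), n))
    return index


def ngram_novelty(text: str, reference: Set[NGram], n: int) -> float:
    """Fraction of the text's n-grams that do not appear in the reference index."""
    grams = ngrams(text.lower().split(), n)
    if not grams:
        return 0.0  # Text shorter than n tokens: no n-grams to score.
    unseen = sum(1 for g in grams if g not in reference)
    return unseen / len(grams)


# Toy data, invented purely for this example.
corpus = ["the cat sat on the mat", "a dog slept on the rug"]
ref = build_reference_index(corpus, n=2)

score_copied = ngram_novelty("the cat sat on the mat", ref, n=2)
score_mixed = ngram_novelty("the cat slept on the moon", ref, n=2)
print(score_copied, score_mixed)
```

Under this definition a sentence copied from the corpus scores 0, while a sentence that recombines known words in unseen ways scores higher regardless of whether it makes sense. That is consistent with the abstract's caution that high n-gram novelty alone does not imply creativity.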

Related papers