Fugu-MT Paper Translation (Summary): CLAWS:Creativity detection for LLM-generated solutions using Attention Window of Sections

Paper summary: CLAWS:Creativity detection for LLM-generated solutions using Attention Window of Sections

arxiv url: http://arxiv.org/abs/2510.17921v1
Date: Mon, 20 Oct 2025 06:59:37 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:12.350821
Title: CLAWS:Creativity detection for LLM-generated solutions using Attention Window of Sections
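Authors: Keuntae Kim, Eunhye Jeong, Sehyeon Lee, Seohee Yoon, Yong Suk Choi
Abstract summary: We propose CLAWS, a method that defines and classifies mathematical solutions into typical, creative, and hallucinated categories without human evaluation. We validate CLAWS on 4545 math problems collected from 181 math contests.
Reference score (independently calculated attention level): 2.1041384320978267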
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in enhancing the reasoning ability of large language models (LLMs) have been remarkably successful. LLMs trained with reinforcement learning (RL) for reasoning demonstrate strong performance in challenging tasks such as mathematics and coding, even with relatively small model sizes. However, despite these improvements in task accuracy, the assessment of creativity in LLM generations has been largely overlooked in reasoning tasks, in contrast to writing tasks. The lack of research on creativity assessment in reasoning primarily stems from two challenges: (1) the difficulty of defining the range of creativity, and (2) the necessity of human evaluation in the assessment process. To address these challenges, we propose CLAWS, a method that defines and classifies mathematical solutions into typical, creative, and hallucinated categories without human evaluation, by leveraging attention weights across prompt sections and output. CLAWS outperforms five existing white-box detection methods (Perplexity, Logit Entropy, Window Entropy, Hidden Score, and Attention Score) on five 7-8B math RL models (DeepSeek, Qwen, Mathstral, OpenMath2, and Oreal). We validate CLAWS on 4545 math problems collected from 181 math contests (AJHSME, AMC, AIME).
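The abstract does not spell out how CLAWS turns attention weights into a typical, creative, or hallucinated label, so the sketch below only illustrates two ingredients it names, under stated assumptions. The first is the Perplexity baseline, using its standard definition (the exponential of the mean negative log-likelihood of the generated tokens). The second is a hypothetical per-section share of output-to-prompt attention, assuming the attention matrix has already been averaged over layers and heads and that prompt sections are given as token spans. The function names and the section names (`instruction`, `problem`) are illustrative inventions; this is not the authors' CLAWS scoring rule and it performs no classification.

```python
import math
from typing import Dict, List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity baseline: exp of the mean negative log-likelihood of the output tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def section_attention_share(
    attn: List[List[float]],
    sections: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Hypothetical feature: fraction of the output tokens' attention mass on each prompt section.

    attn[i][j] is the attention weight from output token i to prompt token j,
    ASSUMED to be pre-averaged over layers and heads.
    sections maps a (hypothetical) section name to a half-open [start, end) token span.
    """
    totals = {name: 0.0 for name in sections}
    for row in attn:
        for name, (start, end) in sections.items():
            totals[name] += sum(row[start:end])
    grand = sum(totals.values())
    if grand == 0:
        # No attention mass on any listed section: report zeros rather than divide by zero.
        return {name: 0.0 for name in sections}
    return {name: mass / grand for name, mass in totals.items()}


# Toy usage: 2 output tokens attending over a 4-token prompt split into two sections.
toy_attn = [
    [0.1, 0.1, 0.4, 0.4],
    [0.2, 0.2, 0.3, 0.3],
]
toy_sections = {"instruction": (0, 2), "problem": (2, 4)}
shares = section_attention_share(toy_attn, toy_sections)
print({name: round(share, 3) for name, share in shares.items()})
print(round(perplexity([math.log(0.5)] * 4), 3))
```

With these toy numbers the output tokens place 30% of their section-level attention on `instruction` and 70% on `problem`, and a sequence whose every token has probability 0.5 has perplexity 2. How such shares would be thresholded or combined into CLAWS's three categories is not stated in the abstract and is left open here.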


Related papers