Fugu-MT Paper Translation (Summary): DocReward: A Document Reward Model for Structuring and Stylizing

Paper summary: DocReward: A Document Reward Model for Structuring and Stylizing

arxiv url: http://arxiv.org/abs/2510.11391v1
Date: Mon, 13 Oct 2025 13:36:32 GMT
Status: Translation complete
System update date: 2025-10-14 18:06:30.378539
Title: DocReward: A Document Reward Model for Structuring and Stylizing
Authors: Junpeng Liu, Yuzhong Zhao, Bowen Cao, Jiayu Ding, Yilin Jia, Tengchao Lv, Yupan Huang, Shaohan Huang, Nan Yang, Li Dong, Lei Cui, Tao Ge, Xun Wang, Huitian Jiao, Sun Mao, FNU Kartik, Si-Qing Chen, Wai Lam, Furu Wei
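Abstract summary: DocReward is a document reward model that evaluates documents based on their structure and style. It is trained with the Bradley-Terry loss to score documents, penalizing predictions that contradict the annotated ranking. It achieves a win rate of 60.8%, compared with GPT-5's win rate of 37.7%.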
Reference score (independently computed attention score): 107.03974018371058
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in agentic workflows have enabled the automation of tasks such as professional document generation. However, they primarily focus on textual quality, neglecting visual structure and style, which are crucial for readability and engagement. This gap arises mainly from the absence of suitable reward models to guide agentic workflows toward producing documents with stronger structural and stylistic quality. To address this, we propose DocReward, a document reward model that evaluates documents based on their structure and style. We construct a multi-domain dataset DocPair of 117K paired documents, covering 32 domains and 267 document types, each including a high- and low-professionalism document with identical content but different structure and style. This enables the model to evaluate professionalism comprehensively, and in a textual-quality-agnostic way. DocReward is trained using the Bradley-Terry loss to score documents, penalizing predictions that contradict the annotated ranking. To assess the performance of reward models, we create a test dataset containing document bundles ranked by well-educated human evaluators. Notably, DocReward outperforms GPT-4o and GPT-5 in accuracy by 30.6 and 19.4 percentage points, respectively, demonstrating its superiority over baselines. In an extrinsic evaluation of document generation, DocReward achieves a significantly higher win rate of 60.8%, compared to GPT-5's 37.7% win rate, demonstrating its utility in guiding generation agents toward producing human-preferred documents.
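The abstract names the Bradley-Terry loss but does not spell it out. As a minimal sketch, assuming the standard pairwise form (the negative log-sigmoid of the score margin between the preferred and the rejected document), the loss and a simple pairwise ranking accuracy could look like the following. The function names and the example scores are hypothetical and are not taken from the paper.

```python
import math


def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-probability that the preferred document outranks the rejected one.

    Assumes the standard Bradley-Terry form: P(preferred > rejected) = sigmoid(margin).
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(score_pairs) -> float:
    """Fraction of (preferred, rejected) score pairs ranked in the annotated order."""
    correct = sum(1 for preferred, rejected in score_pairs if preferred > rejected)
    return correct / len(score_pairs)


# Hypothetical scores for illustration only.
agreeing = bradley_terry_loss(2.0, 0.0)       # prediction matches the annotation
contradicting = bradley_terry_loss(0.0, 2.0)  # prediction contradicts the annotation
```

Under this form, a prediction that contradicts the annotated ranking (a negative margin) incurs a larger loss than one that agrees with it, which matches the abstract's description of penalizing contradicting predictions.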

Related papers