Fugu-MT paper translation (abstract): RLPO: Residual Listwise Preference Optimization for Long-Context Review Ranking

Paper overview: RLPO: Residual Listwise Preference Optimization for Long-Context Review Ranking

arxiv url: http://arxiv.org/abs/2601.07449v1
Date: Mon, 12 Jan 2026 11:45:19 GMT
Status: Translation complete
Last updated in system: 2026-01-13 19:08:01.365637
Title: RLPO: Residual Listwise Preference Optimization for Long-Context Review Ranking
Title (reference translation): RLPO: Residual Listwise Preference Optimization for Long-Context Review Ranking
Authors: Hao Jiang, Zhi Yang, Annan Wang, Yichi Zhang, Weisi Lin
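Abstract summary: Pointwise scoring is efficient but often fails to account for list-level interactions. Listwise approaches can leverage global context, but they are computationally expensive and become unstable as candidate lists grow. This paper proposes Residual Listwise Preference Optimization (RLPO), which formulates ranking as listwise representation-level residual correction.
Reference score (independently computed attention score): 50.709454968853954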
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Review ranking is pivotal in e-commerce for prioritizing diagnostic and authentic feedback from the deluge of user-generated content. While large language models have improved semantic assessment, existing ranking paradigms face a persistent trade-off in long-context settings. Pointwise scoring is efficient but often fails to account for list-level interactions, leading to miscalibrated top-$k$ rankings. Listwise approaches can leverage global context, yet they are computationally expensive and become unstable as candidate lists grow. To address this, we propose Residual Listwise Preference Optimization (RLPO), which formulates ranking as listwise representation-level residual correction over a strong pointwise LLM scorer. RLPO first produces calibrated pointwise scores and item representations, then applies a lightweight encoder over the representations to predict listwise score residuals, avoiding full token-level listwise processing. We also introduce a large-scale benchmark for long-context review ranking with human verification. Experiments show RLPO improves NDCG@k over strong pointwise and listwise baselines and remains robust as list length increases.
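The two-stage idea in the abstract (pointwise scores first, then a listwise residual predicted from item representations) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoder, so the "lightweight encoder" here is replaced by a hypothetical stand-in that compares each item representation with the list mean and projects the difference with a weight vector. The names `residual_rerank`, `mean_vector`, `ranking`, and `weights` are all assumptions made for illustration.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the item representations to get a simple list-level context."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def residual_rerank(
    pointwise_scores: Sequence[float],
    item_reprs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return final scores = pointwise score + listwise residual.

    Assumption: the residual is a linear projection of (item - list mean).
    The paper's encoder is not described in the abstract; this is a stand-in.
    """
    context = mean_vector(item_reprs)
    final_scores = []
    for score, rep in zip(pointwise_scores, item_reprs):
        residual = sum(w * (x - c) for w, x, c in zip(weights, rep, context))
        final_scores.append(score + residual)
    return final_scores


def ranking(scores: Sequence[float]) -> List[int]:
    """Item indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy list: items 0 and 1 are near-duplicates, item 2 carries distinct content.
scores = [0.9, 0.8, 0.7]
reprs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = [0.0, 0.3]  # hypothetical learned projection

final = residual_rerank(scores, reprs, weights)
print(ranking(scores))  # pointwise order
print(ranking(final))   # order after the listwise residual correction
```

In this toy list the residual lifts the item whose representation differs from the rest of the list, which is the kind of list-level interaction a purely pointwise scorer cannot express. Only the short representation vectors are processed jointly, not the review tokens, which matches the efficiency argument in the abstract.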
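Abstract (reference translation): Review ranking is important in e-commerce for prioritizing diagnostic and authentic feedback from the flood of user-generated content. Large language models have improved semantic assessment, but existing ranking paradigms face a persistent trade-off in long-context settings. Pointwise scoring is efficient but often fails to account for list-level interactions, which leads to miscalibrated top-$k$ rankings. Listwise approaches can leverage global context, but they are computationally expensive and become unstable as candidate lists grow. To address this, the paper proposes Residual Listwise Preference Optimization (RLPO), which formulates ranking as listwise representation-level residual correction over a strong pointwise LLM scorer. RLPO first produces calibrated pointwise scores and item representations, then applies a lightweight encoder over the representations to predict listwise score residuals, avoiding full token-level listwise processing. The paper also introduces a large-scale, human-verified benchmark for long-context review ranking. Experiments show that RLPO improves NDCG@k over strong pointwise and listwise baselines and remains robust as list length increases.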
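For readers unfamiliar with the reported metric, the following is a minimal sketch of NDCG@k. The abstract does not say which gain function the paper uses, so the common exponential gain (2^rel - 1) with a log2 position discount is assumed here; the function names `dcg_at_k` and `ndcg_at_k` are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions.

    Assumes exponential gain (2^rel - 1) and a log2(position + 1) discount,
    with positions counted from 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items: define NDCG as 0
    return dcg_at_k(relevances, k) / ideal


# Relevance labels listed in the order a ranker returned the reviews.
print(ndcg_at_k([3, 2, 1, 0], k=3))  # already ideally ordered
print(ndcg_at_k([0, 1], k=2))        # the relevant review is ranked second
```

Because the discount grows with position, placing a helpful review lower in the list reduces the score, which is why the metric is sensitive to the top-$k$ miscalibration the abstract attributes to pointwise scoring.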

Related papers