Fugu-MT Paper Translation (Abstract): Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

Paper summary: Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

arxiv url: http://arxiv.org/abs/2604.21209v1
Date: Thu, 23 Apr 2026 02:01:37 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.241633
Title: Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management
Authors: Yanan Wang, Yong Ge
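Abstract summary: A large portion of online reviews remains unaddressed because of the considerable human labor required to keep up with their rapid growth. Generative AI has achieved remarkable success in a range of tasks, but these models are general-purpose and may not align well with domain-specific human preferences. The authors propose a novel preference finetuning method that aligns an LLM with domain-specific human preferences for generating online review responses.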
Reference score (independently computed attention measure): 8.484087427925632
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Online reviews have played a pivotal role in consumers' decision-making processes. Existing research has highlighted the significant impact of managerial review responses on customer relationship management and firm performance. However, a large portion of online reviews remains unaddressed due to the considerable human labor required to respond to the rapid growth of online reviews. While generative AI has achieved remarkable success in a range of tasks, generative AI models are general-purpose and may not align well with domain-specific human preferences. To tailor these general generative AI models to domain-specific applications, finetuning is commonly employed. Nevertheless, several challenges persist in finetuning with domain-specific data, including hallucinations, difficulty in representing domain-specific human preferences, and over-conservatism in offline policy optimization. To address these challenges, we propose a novel preference finetuning method to align an LLM with domain-specific human preferences for generating online review responses. Specifically, we first identify the source of hallucination and propose an effective context augmentation approach to mitigate the LLM hallucination. To represent human preferences, we propose a novel theory-driven preference finetuning approach that automatically constructs human preference pairs in the online review domain. Additionally, we propose a curriculum learning approach to further enhance preference finetuning. To overcome the challenge of over-conservatism in existing offline preference finetuning methods, we propose a novel density estimation-based support constraint method to relax the conservatism, and we mathematically prove its superior theoretical guarantees. Extensive evaluations substantiate the superiority of our proposed preference finetuning method.


Related papers