Fugu-MT Paper Translation (Summary): ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment

Paper overview: ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment

arxiv url: http://arxiv.org/abs/2605.27374v1
Date: Wed, 08 Apr 2026 06:36:54 GMT
Status: Translation complete
Last updated in system: 2026-06-15 07:09:36.500228
Title: ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
Title (reference translation): ICG: Improving cover image generation with MLLM-based prompting and personalized preference alignment
Authors: Zhipeng Bian, Jieming Zhu, Qijiong Liu, Wang Lin, Guohao Cai, Zhaocheng Du, Jiacheng Sun, Zhou Zhao, Zhenhua Dong
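Abstract summary: We propose ICG, a framework that integrates MLLM-based prompting with personalized preference alignment to generate contextually relevant covers. ICG extracts semantic features from item titles and reference images via meta tokens, refines them with user embeddings, and injects the resulting personalized context into the diffusion model. Experiments show that ICG significantly improves image quality, semantic fidelity, and personalization, leading to stronger user appeal and higher offline recommendation accuracy.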
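The conditioning path in this summary (meta-token features, refined with a user embedding, then injected into the diffusion model) can be illustrated with a minimal, standard-library-only sketch. It rests on assumptions the source does not state. The MLLM's hidden states at the meta tokens are passed in as plain vectors rather than computed by a real model. The user refinement is a hypothetical gated additive shift. The adapter is a single linear projection from the MLLM hidden size to the diffusion model's conditioning size. The names `refine_with_user`, `LinearAdapter`, and `personalized_condition` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product; w has shape (out_dim, in_dim)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def refine_with_user(meta_states: list[Vector], user_emb: Vector, gate: float) -> list[Vector]:
    """Hypothetical refinement: shift every meta-token state by gate * user_emb.

    The abstract does not specify ICG's refinement operator; a gated additive
    shift is assumed here purely for illustration.
    """
    if any(len(s) != len(user_emb) for s in meta_states):
        raise ValueError("meta-token states and user embedding must share a dimension")
    return [[s + gate * u for s, u in zip(state, user_emb)] for state in meta_states]


@dataclass
class LinearAdapter:
    """Hypothetical adapter: projects each refined meta-token state from the MLLM
    hidden size to the diffusion model's conditioning size (one output token each)."""
    weight: Matrix  # shape (cond_dim, mllm_dim)

    def __call__(self, states: list[Vector]) -> list[Vector]:
        return [matvec(self.weight, s) for s in states]


def personalized_condition(meta_states: list[Vector], user_emb: Vector,
                           adapter: LinearAdapter, gate: float = 0.5) -> list[Vector]:
    """Meta-token states -> user-refined states -> conditioning tokens for the DM."""
    return adapter(refine_with_user(meta_states, user_emb, gate))
```

As an example, suppose two stand-in meta-token states `[[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]` are combined with the user embedding `[2.0, 2.0, 0.0]`, the default gate of 0.5, and adapter weight `[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]`. The function then returns `[[2.0, 3.0], [1.0, 2.0]]`, one conditioning token per meta token. In a real system the adapter would presumably be a trained module, and its output would be fed to the diffusion model's conditioning input in place of (or alongside) text-encoder embeddings; that wiring is also an assumption here.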
Reference score (independently computed attention score): 70.19758313256503
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in multimodal large language models (MLLMs) and diffusion models (DMs) have opened new possibilities for AI-generated content. Yet, personalized cover image generation remains underexplored, despite its critical role in boosting user engagement on digital platforms. We propose ICG, a novel framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant covers. ICG extracts semantic features from item titles and reference images via meta tokens, refines them with user embeddings, and injects the resulting personalized context into the diffusion model. To address the lack of labeled supervision, we adopt a multi-reward learning strategy that combines public aesthetic and relevance rewards with a personalized preference model trained from user behavior. Unlike prior pipelines relying on handcrafted prompts and disjointed modules, ICG employs an adapter to bridge MLLMs and diffusion models for end-to-end training. Experiments demonstrate that ICG significantly improves image quality, semantic fidelity, and personalization, leading to stronger user appeal and offline recommendation accuracy in downstream tasks. As a plug-and-play adapter bridging MLLMs and diffusion models, ICG is compatible with common checkpoints and requires no ground-truth labels during optimization.
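The multi-reward strategy can be sketched as follows, again under stated assumptions. The abstract says the rewards are combined but not how, so this sketch uses a weighted sum with hypothetical weights (`RewardWeights`). For the personalized preference model "trained from user behavior", it substitutes a common choice: a Bradley-Terry pairwise loss that pushes the score of a clicked cover above that of a skipped one. This objective is not confirmed as ICG's. All names here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the abstract does not give ICG's actual weighting."""
    aesthetic: float = 1.0
    relevance: float = 1.0
    preference: float = 1.0


def combined_reward(aesthetic: float, relevance: float, preference: float,
                    w: RewardWeights = RewardWeights()) -> float:
    """Assumed weighted sum of the public aesthetic reward, the public relevance
    reward, and the personalized preference score for one generated cover."""
    return w.aesthetic * aesthetic + w.relevance * relevance + w.preference * preference


def pairwise_preference_loss(score_clicked: float, score_skipped: float) -> float:
    """Bradley-Terry style loss for fitting a preference model to user behavior:
    -log(sigmoid(s_clicked - s_skipped)), computed without overflow."""
    margin = score_clicked - score_skipped
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch so exp() never gets a large positive argument
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With weights `RewardWeights(0.5, 1.0, 2.0)` and per-cover rewards (0.8, 0.5, 0.2), `combined_reward` gives approximately 1.3. The combined signal comes from reward models rather than reference covers, so a generator can be optimized against it without ground-truth images. This fits the abstract's statement that no ground-truth labels are needed during optimization.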
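Abstract (reference translation): Recent advances in multimodal large language models (MLLMs) and diffusion models (DMs) have opened up new possibilities for AI-generated content. However, although it plays an important role in increasing user engagement on digital platforms, personalized cover image generation remains underexplored. We propose ICG, a new framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant covers. ICG extracts semantic features from item titles and reference images via meta tokens, refines them with user embeddings, and injects the resulting personalized context into the diffusion model. To address the lack of labeled supervision, we adopt a multi-reward learning strategy that combines public aesthetic and relevance rewards with a personalized preference model learned from user behavior. Unlike previous pipelines that rely on handcrafted prompts and disjointed modules, ICG uses an adapter to bridge the MLLM and the diffusion model. Experiments show that ICG significantly improves image quality, semantic fidelity, and personalization, leading to stronger user appeal and higher offline recommendation accuracy in downstream tasks. As a plug-and-play adapter bridging MLLMs and diffusion models, ICG is compatible with common checkpoints and requires no ground-truth labels during optimization.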

Related papers