Fugu-MT Paper Translation (Summary): ZIPP: Zero-shot Image Personalization from Personas

Paper summary: ZIPP: Zero-shot Image Personalization from Personas

arxiv url: http://arxiv.org/abs/2606.08841v1
Date: Sun, 07 Jun 2026 21:11:47 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.478454
Title: ZIPP: Zero-shot Image Personalization from Personas
Title (reference translation): ZIPP: Zero-shot Image Personalization from Personas
Authors: Harini SI, Somesh Singh, Yaman Kumar Singla, David Doermann, Rajiv Ratn Shah
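Abstract summary: We introduce zero-shot image personalization from personas (ZIPP). ZIPP generates images conditioned on natural-language personas, without any user-specific data or weight updates. To mine personas at scale, an inductive Graph Attention Network is trained over a 22M-user Reddit interaction graph.
Reference score (attention level, computed independently by this site): 25.359254229320086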
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Text-to-image diffusion models are increasingly deployed in open-ended creative contexts, yet their outputs remain impersonal, optimized for aggregate aesthetics rather than individual taste. Human preferences are pluralistic: one user favoring muted, nostalgic portraits may prefer vibrant street photography, while another gravitates toward dreamy film aesthetics. Existing methods require dense interaction histories or per-user fine-tuning, failing in cold-start settings and collapsing context-dependent preferences into a static representation. We introduce zero-shot image personalization from personas (ZIPP), which conditions image generation on natural-language personas (concise descriptors of a user's identity and aesthetic sensibilities) without any user-specific data or weight updates. ZIPP uses an LLM to rewrite prompts from the perspective of a given persona, steering diffusion models toward personalized outputs. To mine personas at scale, we train an inductive Graph Attention Network over a 22M-user Reddit interaction graph with dual contrastive objectives aligning graph structure with visual behavior, then verbalize learned representations into natural-language personas via an MLLM. We introduce ZIPBench, the first zero-shot personalization benchmark with 1.5K users, graph-mined personas, and 40K generated images. Across four benchmarks and 14 LLMs spanning five model families, persona conditioning yields consistent gains (13-20%), with frontier models benefiting most. In the few-shot setting, ZIPP matches or exceeds fine-tuned baselines trained on 100+ examples per user. ZIPP achieves the lowest preference distributional divergence (CMMD 0.16 vs. 0.55), and IPF-normalized demographic evaluation shows it substantially reduces subpopulation bias present in existing methods. Human evaluation confirms a 79% win rate over generic generation and 58-65% over all fine-tuned baselines.
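The persona-conditioned prompt rewriting described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the instruction template, the function names (`build_rewrite_request`, `personalize_prompt`), and the offline stand-in `echo_llm` are all hypothetical, since the abstract does not give the actual prompt or the LLM interface.

```python
from typing import Callable

# Hypothetical instruction template; the abstract does not give the real one.
REWRITE_TEMPLATE = (
    "You are rewriting an image-generation prompt for a specific user.\n"
    "Persona: {persona}\n"
    "Original prompt: {prompt}\n"
    "Rewrite the prompt so the image matches this persona's aesthetic "
    "preferences while keeping the original subject. "
    "Return only the rewritten prompt."
)


def build_rewrite_request(persona: str, prompt: str) -> str:
    """Combine a natural-language persona and a base prompt into one LLM request."""
    return REWRITE_TEMPLATE.format(persona=persona.strip(), prompt=prompt.strip())


def personalize_prompt(persona: str, prompt: str, llm: Callable[[str], str]) -> str:
    """Ask `llm` (any text-in, text-out callable) for a persona-specific prompt.

    No user data or weight updates are involved; only the prompt text changes.
    """
    rewritten = llm(build_rewrite_request(persona, prompt)).strip()
    # Fall back to the original prompt if the LLM returns nothing usable.
    return rewritten or prompt


def echo_llm(request: str) -> str:
    """Offline stand-in for a real LLM call (assumption, for illustration only)."""
    fields = {}
    for line in request.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return f"{fields['Original prompt']}, styled for {fields['Persona']}"


if __name__ == "__main__":
    persona = "likes muted, nostalgic film portraits"
    print(personalize_prompt(persona, "a city street at dusk", echo_llm))
```

The rewritten prompt would then be passed unchanged to a text-to-image diffusion model, so swapping `echo_llm` for a real LLM client is the only integration point in this sketch.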
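Abstract (reference translation): Text-to-image diffusion models are increasingly deployed in open-ended creative contexts, yet their outputs are optimized for aggregate aesthetics rather than individual taste. Human preferences are pluralistic: a user who favors muted, nostalgic portraits may nevertheless prefer vibrant street photography. Existing methods require dense interaction histories or per-user fine-tuning, so they fail in cold-start settings and collapse context-dependent preferences into a static representation. This work introduces zero-shot image personalization from personas (ZIPP), which conditions image generation on natural-language personas (concise descriptors of a user's identity and aesthetic sensibilities) without any user-specific data or weight updates. ZIPP uses an LLM to rewrite prompts from the perspective of a given persona, steering diffusion models toward personalized outputs. To mine personas at scale, an inductive Graph Attention Network is trained over a 22M-user Reddit interaction graph with dual contrastive objectives that align graph structure with visual behavior, and the learned representations are verbalized into natural-language personas via an MLLM. ZIPBench is the first zero-shot personalization benchmark, with 1.5K users, graph-mined personas, and 40K generated images. Across four benchmarks and 14 LLMs spanning five model families, persona conditioning yields consistent gains (13-20%), with frontier models benefiting most. In the few-shot setting, ZIPP matches or exceeds fine-tuned baselines trained on 100+ examples per user. ZIPP achieves the lowest preference distributional divergence (CMMD 0.16 vs. 0.55), and IPF-normalized demographic evaluation shows that it substantially reduces the subpopulation bias present in existing methods. Human evaluation confirms a 79% win rate over generic generation and 58-65% over all fine-tuned baselines.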
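The abstract mentions contrastive objectives that align graph structure with visual behavior but does not specify them. As a rough illustration of what one such alignment term could look like, the sketch below computes a symmetric InfoNCE loss between paired graph and visual embeddings. The choice of InfoNCE, the temperature value, and all names here are assumptions made for illustration, not the paper's stated formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(
    graph_emb: Sequence[Vector],
    visual_emb: Sequence[Vector],
    temperature: float = 0.07,  # assumed value, not taken from the paper
) -> float:
    """Symmetric InfoNCE over a batch where graph_emb[i] pairs with visual_emb[i]."""
    g = [_normalize(v) for v in graph_emb]
    z = [_normalize(v) for v in visual_emb]
    # Similarity of every graph embedding to every visual embedding.
    logits = [
        [sum(a * b for a, b in zip(gi, zj)) / temperature for zj in z] for gi in g
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the graph-to-visual and visual-to-graph directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


if __name__ == "__main__":
    users = [[1.0, 0.0], [0.0, 1.0]]
    print(symmetric_info_nce(users, users))                     # matched pairs
    print(symmetric_info_nce(users, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs
```

Matched pairs should give a loss near zero and mismatched pairs a much larger one, which is the behavior an alignment objective of this kind is meant to reward.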


Related papers