Fugu-MT Paper Translation (Summary): AesFormer: Transform Everyday Photos into Beautiful Memories

Paper overview: AesFormer: Transform Everyday Photos into Beautiful Memories

arXiv URL: http://arxiv.org/abs/2605.22126v1
Date: Thu, 21 May 2026 08:00:49 GMT
Status: Translation complete
Last updated in system: 2026-05-22 20:14:18.533966
Title: AesFormer: Transform Everyday Photos into Beautiful Memories
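Title (reference translation): AesFormer: Turning Everyday Photos into Beautiful Memories
Authors: Tianxiang Du, Hulingxiao He, Yuxin Peng
Abstract summary: We formulate aesthetic photo reconstruction (APR) as improving a photo's aesthetic quality through structural reconstruction. AesFormer is a two-stage framework that decouples aesthetic planning from image editing. AesFormer substantially improves APR performance.
Reference score (attention score computed by the site): 47.103757942619914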
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In everyday photography, aesthetically appealing moments are often captured with structural flaws (e.g., composition, camera viewpoint, or pose) that existing retouching and portrait enhancement methods cannot fix. We formulate Aesthetic Photo Reconstruction (APR) as improving a photo's aesthetic quality via structural reconstruction while preserving subject identity and scene semantics. Although recent advances in image editing models make APR feasible, they often lack aesthetic understanding, yielding edits that are semantically plausible yet aesthetically weak. To address this, we propose AesFormer, a two-stage framework that decouples aesthetic planning from image editing. In Stage 1, an aesthetic action model (AesThinker) analyzes the input along seven progressive photographic dimensions and outputs executable editing actions; we further apply GRPO-A to encourage broad exploration over diverse action plans beyond SFT. In Stage 2, an action-conditioned editor (AesEditor) performs structural edits guided by these actions. To support APR, we build a video-based corpus-mining pipeline (VCMP) and construct AesRecon, a benchmark of 9,071 strictly aligned (poor, good) image pairs. Experiments show that AesFormer substantially improves APR performance and is competitive with Nano Banana Pro. Code is available at https://github.com/PKU-ICST-MIPL/AesFormer_ICML2026.
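The abstract describes a decoupled design: a planner emits executable editing actions, a separate editor executes them, and GRPO-A is used to train the planner. The sketch below is a minimal illustration under stated assumptions, not the authors' code (which lives in the linked repository). The `Action` fields, the callable interfaces, `aesformer_style_pipeline`, `toy_planner`, and `toy_editor` are all hypothetical; the abstract neither gives the action schema nor names the seven dimensions. `group_relative_advantages` shows plain GRPO's group-normalized advantage (reward minus the group mean, divided by the group standard deviation); what GRPO-A adds on top of this is not stated in the abstract.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List

# Hypothetical action record: the abstract does not specify AesFormer's action schema.
@dataclass(frozen=True)
class Action:
    dimension: str    # photographic dimension the edit targets (illustrative only)
    instruction: str  # executable edit instruction handed to the editor

Plan = List[Action]
Image = str  # toy stand-in; a real system would use pixel arrays or tensors

def aesformer_style_pipeline(image: Image,
                             planner: Callable[[Image], Plan],
                             editor: Callable[[Image, Plan], Image]) -> Image:
    """Two-stage flow: plan first, then edit. The stages share only the plan."""
    plan = planner(image)       # Stage 1: aesthetic planning (the AesThinker role)
    return editor(image, plan)  # Stage 2: action-conditioned editing (the AesEditor role)

def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Plain GRPO advantage for a group of plans sampled for the same input:
    (reward - group mean) / (group std + eps). This is not GRPO-A."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]

def toy_planner(image: Image) -> Plan:
    # Hypothetical fixed plan; a learned planner would inspect the image.
    return [Action("composition", "crop to rule of thirds"),
            Action("pose", "straighten shoulders")]

def toy_editor(image: Image, plan: Plan) -> Image:
    # Hypothetical editor: records each applied instruction in plan order.
    return image + "".join(f" | {a.instruction}" for a in plan)

if __name__ == "__main__":
    print(aesformer_style_pipeline("photo.jpg", toy_planner, toy_editor))
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

Keeping the planner and editor behind separate callables mirrors the decoupling the abstract describes: either stage can be swapped without touching the other, since they communicate only through the plan.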

Related papers