Fugu-MT Paper Translation (Abstract): Automatic Text Box Placement for Supporting Typographic Design

Paper overview: Automatic Text Box Placement for Supporting Typographic Design

arxiv url: http://arxiv.org/abs/2510.07665v1
Date: Thu, 09 Oct 2025 01:38:21 GMT
Status: Translation complete
Last updated in system: 2025-10-10 17:54:14.806456
Title: Automatic Text Box Placement for Supporting Typographic Design
Title (reference translation): Automatic Text Box Placement for Typographic Design Support
Authors: Jun Muraoka, Daichi Haraguchi, Naoto Inoue, Wataru Shimoda, Kota Yamaguchi, Seiichi Uchida
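Abstract summary: This study examines automated text box placement in incomplete layouts. It compares a standard Transformer-based method, a small Vision and Language Model (Phi3.5-vision), a large pretrained VLM (Gemini), and an extended Transformer that processes multiple images.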
Reference score (independently calculated attention score): 16.188785665663755
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In layout design for advertisements and web pages, balancing visual appeal and communication efficiency is crucial. This study examines automated text box placement in incomplete layouts, comparing a standard Transformer-based method, a small Vision and Language Model (Phi3.5-vision), a large pretrained VLM (Gemini), and an extended Transformer that processes multiple images. Evaluations on the Crello dataset show the standard Transformer-based models generally outperform VLM-based approaches, particularly when incorporating richer appearance information. However, all methods face challenges with very small text or densely populated layouts. These findings highlight the benefits of task-specific architectures and suggest avenues for further improvement in automated layout design.
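Abstract (reference translation): In layout design for advertisements and web pages, balancing visual appeal and communication efficiency is essential. This study compares a standard Transformer-based method, a small Vision and Language Model (Phi3.5-vision), a large pretrained VLM (Gemini), and an extended Transformer that processes multiple images. Evaluations on the Crello dataset show that the standard Transformer-based models generally outperform VLM-based approaches. However, all methods struggle with very small text and densely populated layouts. These findings highlight the benefits of task-specific architectures and suggest avenues for further improvement in automated layout design.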
