Fugu-MT Paper Translation (Abstract): TextSculptor: Training and Benchmarking Scene Text Editing

Paper overview: TextSculptor: Training and Benchmarking Scene Text Editing

arxiv url: http://arxiv.org/abs/2605.21090v1
Date: Wed, 20 May 2026 12:22:26 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.664117
Title: TextSculptor: Training and Benchmarking Scene Text Editing
Title (reference translation): TextSculptor: Training and Benchmarking Text Editing
Authors: Yiheng Lin, Siyu Jiao, Xiaohan Lan, Wei Zhou, Qi She, Fei Yu, Heyun Chen, Zhengwei Wang, Jinghuan Chen, Moran Li, Yingchen Yu, Zijian Feng, Yao Zhao, Yunchao Wei, Yujie Zhong
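Abstract summary: We propose TextSculptor, a comprehensive framework for data construction and evaluation of scene text editing. TextSculptor improves open-source text editing performance and narrows the gap to proprietary models.
Reference score (independently calculated attention level): 88.11688559021628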
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in Multimodal Large Language Models (MLLMs) and diffusion-based generative models have substantially improved prompt-driven image editing. However, scene text editing remains challenging, as it requires models to precisely modify textual content while preserving visual realism and non-target regions. Current open-source models still lag behind proprietary systems, largely due to the scarcity of high-quality training data and the lack of standardized benchmarks tailored to text editing. To address these challenges, we present TextSculptor, a comprehensive framework for data construction and evaluation of scene text editing. We first develop an automated data construction pipeline that combines text-aware image synthesis with programmatic text rendering and compositing. Based on this pipeline, we build TextSculpt-Data, a large-scale dataset containing 3.2M training samples, including 1.2M OCR-verified text-to-image samples and 2M paired text editing samples with naturally aligned source-target images and strong background consistency. We further introduce TextSculpt-Bench, a benchmark covering four fundamental text editing tasks: text addition, text replacement, text removal, and hybrid editing. To support reliable evaluation, we design a tailored protocol that measures text accuracy, visual quality, and background preservation through OCR-based text alignment, multimodal judgment, and background-region similarity. Extensive experiments show that TextSculptor improves open-source text editing performance and narrows the gap to proprietary models. The data and benchmark are available at https://github.com/linyiheng123/TextSculptor.
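Abstract (reference translation): Recent advances in Multimodal Large Language Models (MLLMs) and diffusion-based generative models have greatly improved prompt-driven image editing. Scene text editing nevertheless remains difficult, because it requires a model to modify textual content precisely while maintaining visual realism and non-target regions. Current open-source models lag behind proprietary systems because of a shortage of high-quality training data and the absence of standard benchmarks suited to text editing. To address these challenges, we propose TextSculptor, a comprehensive framework for data construction and evaluation of scene text editing. First, we develop an automated data construction pipeline that combines text-aware image synthesis with programmatic text rendering and compositing. Based on this pipeline, we build TextSculpt-Data, a large-scale dataset of 3.2M training samples, comprising 1.2M OCR-verified text-to-image samples and 2M paired text editing samples with naturally aligned source-target images and strong background consistency. We also introduce TextSculpt-Bench, a benchmark covering four basic text editing tasks: text addition, text replacement, text removal, and hybrid editing. To support reliable evaluation, we design a tailored protocol that measures text accuracy, visual quality, and background preservation through OCR-based text alignment, multimodal judgment, and background-region similarity. Extensive experiments show that TextSculptor improves open-source text editing performance and narrows the gap to proprietary models. The data and benchmark are available at https://github.com/linyiheng123/TextSculptor.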
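
The abstract names OCR-based text alignment and background-region similarity as evaluation signals but does not define them. The following is a minimal sketch of what such scores could look like, not the paper's protocol. The function names, the use of normalized edit distance for text accuracy, and the use of mean absolute pixel difference outside an edit mask for background preservation are all assumptions made for illustration. The OCR step itself is assumed to have run already and is represented by a plain string.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def text_accuracy(ocr_text: str, target_text: str) -> float:
    """Hypothetical text score: 1 minus normalized edit distance (1.0 = exact match)."""
    longest = max(len(ocr_text), len(target_text))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(ocr_text, target_text) / longest


def background_similarity(source: Sequence[Sequence[int]],
                          edited: Sequence[Sequence[int]],
                          edit_mask: Sequence[Sequence[int]]) -> float:
    """Hypothetical background score on 0-255 grayscale images.

    Returns 1 minus the mean absolute difference over pixels where
    edit_mask is 0, i.e. pixels that the edit was not supposed to touch.
    """
    total = 0
    count = 0
    for src_row, out_row, mask_row in zip(source, edited, edit_mask):
        for s, o, m in zip(src_row, out_row, mask_row):
            if not m:
                total += abs(s - o)
                count += 1
    if count == 0:
        return 1.0
    return 1.0 - total / (255.0 * count)


if __name__ == "__main__":
    # OCR read "0PEN" where the target text was "OPEN": one substitution in four characters.
    print(text_accuracy("0PEN", "OPEN"))
    source = [[10, 20], [30, 40]]
    edited = [[10, 200], [30, 40]]
    mask = [[0, 1], [0, 0]]  # only the top-right pixel belongs to the edited region
    print(background_similarity(source, edited, mask))
```

Masking out the edited region keeps the background score from penalizing the intended change, so the two scores measure separate things: whether the right text appeared, and whether everything else stayed the same.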
