Fugu-MT Paper Translation (Summary): ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

Paper overview: ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

arxiv url: http://arxiv.org/abs/2308.10147v1
Date: Sun, 20 Aug 2023 03:22:23 GMT
Status: Translation complete
Last updated in system: 2023-08-22 17:28:05.143595
Title: ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
Title (reference translation): ESTextSpotter: Improving Scene Text Spotting through Explicit Synergy in Transformers
Authors: Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin
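Abstract summary: We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter). The model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder. Experimental results show that the model significantly outperforms previous state-of-the-art methods.
Reference score (in-house attention score): 88.61312640540902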
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, end-to-end scene text spotting approaches have been evolving toward Transformer-based frameworks. While previous studies have shown the crucial importance of the intrinsic synergy between text detection and recognition, recent advances in Transformer-based methods usually adopt an implicit synergy strategy with a shared query, which cannot fully realize the potential of these two interactive tasks. In this paper, we argue that explicit synergy that considers the distinct characteristics of text detection and recognition can significantly improve the performance of text spotting. To this end, we introduce a new model named the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder. Specifically, we decompose the conventional shared query into task-aware queries for text polygon and content, respectively. Through the decoder with the proposed vision-language communication module, the queries interact with each other in an explicit manner while preserving the discriminative patterns of text detection and recognition, thus improving performance significantly. Additionally, we propose a task-aware query initialization scheme to ensure stable training. Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods. Code is available at https://github.com/mxin262/ESTextSpotter.
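Abstract (reference translation): In recent years, end-to-end scene text spotting approaches have evolved toward Transformer-based frameworks. Previous studies have shown the importance of the synergy inherent between text detection and recognition, but recent advances in Transformer-based methods usually adopt an implicit synergy strategy with a shared query and cannot fully realize the potential of these two interactive tasks. In this paper, we argue that explicit synergy that accounts for the distinct characteristics of text detection and recognition significantly improves text spotting performance. We therefore propose the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder. Specifically, the conventional shared query is decomposed into task-aware queries for text polygons and content. Through a decoder with the proposed vision-language communication module, the queries interact with each other explicitly while preserving the discriminative patterns of text detection and recognition, significantly improving performance. In addition, we propose a task-aware query initialization scheme that enables stable training. Experimental results show that our model substantially outperforms previous state-of-the-art methods. Code is available at https://github.com/mxin262/estextspotter.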
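The abstract contrasts a single shared query with separate task-aware queries for text polygons and text content that interact explicitly. The sketch below is a minimal, hypothetical illustration of that general idea only; it is not the vision-language communication module from the paper or from the linked repository. It assumes one detection query and one recognition query per text instance, plain scaled dot-product cross-attention with no learned projections or multiple heads, and a residual connection standing in for "preserving discriminative patterns". The names `cross_attend` and `explicit_interaction` are invented for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, used here as a residual connection."""
    return [x + y for x, y in zip(a, b)]


def cross_attend(queries: List[Vector], keys_values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention where each query attends over keys_values.

    Hypothetical simplification: keys_values serve as both keys and values,
    with no learned projection matrices.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys_values])
        # Weighted average of the value vectors for this query.
        out.append([sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(d)])
    return out


def explicit_interaction(
    det_queries: List[Vector], rec_queries: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical one-step exchange between two separate query sets.

    Detection queries read from recognition queries and vice versa; both
    directions use the inputs from before the update. The residual add keeps
    each query's own task-specific features alongside what it read.
    """
    det_ctx = cross_attend(det_queries, rec_queries)
    rec_ctx = cross_attend(rec_queries, det_queries)
    new_det = [add(q, c) for q, c in zip(det_queries, det_ctx)]
    new_rec = [add(q, c) for q, c in zip(rec_queries, rec_ctx)]
    return new_det, new_rec
```

For example, `explicit_interaction([[1.0, 0.0]], [[0.5, 0.5]])` returns `([[1.5, 0.5]], [[1.5, 0.5]])`: with a single query on each side the attention weight is 1, so each query adds the other's vector to its own. Keeping the two query sets separate, rather than collapsing them into one shared vector, is what lets each one retain task-specific features while still reading from the other.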

Related papers