Fugu-MT Paper Translation (Abstract): DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Paper overview: DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

arxiv url: http://arxiv.org/abs/2211.10772v2
Date: Wed, 23 Nov 2022 07:36:17 GMT
Status: Translation complete
Last updated in system: 2022-11-24 13:20:36.413904
Title: DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Title (reference translation): DeepSolo: A Transformer Decoder with Explicit Points Solo for Text Spotting
Authors: Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao
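Abstract summary: DeepSolo is a simple detection transformer baseline in which a single decoder with Explicit Points Solo performs text detection and recognition simultaneously. We introduce a text-matching criterion to deliver more accurate supervisory signals, enabling more efficient training.
Reference score (independently computed attention metric): 129.73247700864385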
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although transformer-based methods eliminate the heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and low training efficiency. In this paper, we present DeepSolo, a simple detection transformer baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing a single decoder, the point queries have encoded requisite text semantics and locations and thus can be further decoded to the center line, boundary, script, and confidence of text via very simple prediction heads in parallel, solving the sub-tasks in text spotting in a unified framework. Besides, we also introduce a text-matching criterion to deliver more accurate supervisory signals, thus enabling more efficient training. Quantitative experiments on public benchmarks demonstrate that DeepSolo outperforms previous state-of-the-art methods and achieves better training efficiency. In addition, DeepSolo is also compatible with line annotations, which require much less annotation cost than polygons. The code will be released.
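Abstract (reference translation): End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Handling the relationship between the two sub-tasks plays an important role in designing effective spotters. Transformer-based methods eliminate heuristic post-processing, but they still suffer from the synergy issue between the sub-tasks and from low training efficiency. In this paper, we propose DeepSolo, a simple detection transformer baseline in which a single decoder with Explicit Points Solo performs text detection and recognition simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing through a single decoder, the point queries have encoded the requisite text semantics and locations, so they can be further decoded in parallel into the center line, boundary, script, and confidence of the text via very simple prediction heads, solving the sub-tasks of text spotting in a unified framework. In addition, we introduce a text-matching criterion to deliver more accurate supervisory signals, enabling more efficient training. Quantitative experiments on public benchmarks show that DeepSolo outperforms previous state-of-the-art methods and achieves better training efficiency. Furthermore, DeepSolo is also compatible with line annotations, which cost much less to produce than polygon annotations. The code will be released.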
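
The abstract says each text instance's character sequence is represented as ordered points, but it does not say how those points are obtained. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes the center line is available as a polyline and samples a fixed number of points at equal arc-length spacing, ordered from start to end. The function name `sample_ordered_points` and its parameters are hypothetical.

```python
import math


def sample_ordered_points(polyline, n_points):
    """Sample n_points along a polyline at equal arc-length spacing.

    Assumption (not from the paper): the text center line is given as a
    list of (x, y) vertices in reading order.
    """
    if n_points < 2 or len(polyline) < 2:
        raise ValueError("need at least 2 vertices and 2 sample points")

    # Cumulative arc length at each vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    points = []
    seg = 0  # index of the segment that contains the current target
    for i in range(n_points):
        target = total * i / (n_points - 1)
        # Advance to the segment whose end is at or beyond the target length.
        while seg < len(polyline) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = polyline[seg], polyline[seg + 1]
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points
```

For example, `sample_ordered_points([(0, 0), (4, 0), (4, 4)], 3)` returns the start point, the corner, and the end point, in that order. In a model of the kind the abstract describes, such ordered positions would presumably be paired with learnable query embeddings; that step is outside this sketch.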

Related papers