Fugu-MT Paper Translation (Summary): DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training


arxiv url: http://arxiv.org/abs/2408.00355v3
Date: Sun, 3 Nov 2024 14:33:34 GMT
Title: DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training
Authors: Yu Xie, Qian Qiao, Jun Gao, Tianxiang Wu, Jiaqing Fan, Yue Zhang, Jielei Zhang, Huyang Sun,
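Abstract summary: We propose a novel denoising training method (DNTextSpotter) for arbitrary-shaped text spotting. DNTextSpotter decomposes the queries of the denoising part into noised positional queries and noised content queries. It outperforms state-of-the-art methods on four benchmarks.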
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: More and more end-to-end text spotting methods based on the Transformer architecture have demonstrated superior performance. These methods use a bipartite graph matching algorithm to perform one-to-one optimal matching between predicted objects and ground-truth objects. However, the instability of bipartite graph matching can lead to inconsistent optimization targets, which affects the model's training performance. Existing literature applies denoising training to address bipartite matching instability in object detection tasks. Unfortunately, this denoising training method cannot be applied directly to text spotting, because text spotting requires irregular shape detection and text recognition, which is more complex than classification. To address this issue, we propose a novel denoising training method (DNTextSpotter) for arbitrary-shaped text spotting. Specifically, we decompose the queries of the denoising part into noised positional queries and noised content queries. We use the four Bezier control points of the Bezier center curve to generate the noised positional queries. For the noised content queries, outputting text in a fixed positional order is not conducive to aligning position with content. We therefore employ a masked character sliding method to initialize the noised content queries, which helps align text content with position. To improve the model's perception of the background, we further use an additional loss function for background character classification in the denoising training part. Although DNTextSpotter is conceptually simple, it outperforms state-of-the-art methods on four benchmarks (Total-Text, SCUT-CTW1500, ICDAR15, and Inverse-Text). In particular, it yields an improvement of 11.3% over the best approach on the Inverse-Text dataset.
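The abstract attributes unstable training to bipartite matching. The toy sketch below makes that argument concrete; it is not the authors' code. It brute-forces every one-to-one assignment instead of running the Hungarian algorithm. The cost values are illustrative numbers chosen for the example. When two predictions are almost equally close to two ground-truth instances, a small shift in costs between training steps swaps the target each prediction is optimized toward.

```python
from itertools import permutations


def optimal_matching(cost):
    """Brute-force one-to-one assignment minimizing total cost.

    cost[i][j] is the cost of matching prediction i to ground-truth j.
    Returns a tuple `assign` where prediction i is matched to assign[i].
    Only suitable for toy sizes; detectors use the Hungarian algorithm.
    """
    n = len(cost)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )


# Illustrative costs: two predictions nearly equidistant from two texts.
cost_step_t = [[0.50, 0.52],
               [0.52, 0.50]]
# After one optimizer step the costs shift slightly.
cost_step_t1 = [[0.52, 0.50],
                [0.50, 0.52]]

print(optimal_matching(cost_step_t))   # (0, 1)
print(optimal_matching(cost_step_t1))  # (1, 0)
```

The flipped assignment means the same prediction is pulled toward a different ground-truth instance from one step to the next. Denoising training sidesteps this for the denoising queries, because their targets are fixed by construction rather than by matching.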
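The abstract says the noised positional queries are generated from the four Bezier control points of the Bezier center curve, but it does not specify the noise model. The following is a minimal sketch under these assumptions:

- Coordinates are normalized to [0, 1].
- Noise is uniform and scaled by the extent of the control points, in the spirit of box-noise schemes used for denoising training in detection.
- `num_points` and `noise_scale` are hypothetical hyperparameters.

The sketch jitters the control points and samples points along the noised cubic Bezier curve. How those points are embedded into queries is left out.

```python
import random


def bezier_point(ctrl, t):
    """Evaluate a cubic Bezier curve with control points ctrl[0..3] at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    # Degree-3 Bernstein basis polynomials
    b0 = (1 - t) ** 3
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)


def noised_positional_points(ctrl, num_points=25, noise_scale=0.4, rng=None):
    """Jitter the 4 control points, then sample points on the noised center curve.

    Assumptions (not from the paper): normalized coordinates, uniform noise
    proportional to the control-point extent, placeholder hyperparameters.
    """
    rng = rng if rng is not None else random.Random(0)
    xs = [p[0] for p in ctrl]
    ys = [p[1] for p in ctrl]
    w = max(xs) - min(xs)
    h = max(ys) - min(ys)
    noised = []
    for x, y in ctrl:
        dx = rng.uniform(-1, 1) * noise_scale * w / 2
        dy = rng.uniform(-1, 1) * noise_scale * h / 2
        # Clip so the noised control points stay inside the image.
        noised.append((min(max(x + dx, 0.0), 1.0), min(max(y + dy, 0.0), 1.0)))
    ts = [i / (num_points - 1) for i in range(num_points)]
    return [bezier_point(noised, t) for t in ts]


ctrl = [(0.1, 0.5), (0.3, 0.4), (0.6, 0.4), (0.8, 0.5)]  # toy center curve
points = noised_positional_points(ctrl, num_points=5, rng=random.Random(42))
```

A quick sanity check is to set `noise_scale=0.0`. The first and last sampled points then coincide with the first and last control points, because a cubic Bezier curve passes through its endpoints.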
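The abstract mentions an additional loss for background character classification in the denoising part but does not define it. The sketch below is one plausible interpretation, not the paper's definition. It rests on three assumptions:

- Each text instance is decoded into a fixed number of character slots.
- Unused slots are labeled with a hypothetical `BACKGROUND` class (index 0 here).
- The loss supervises those slots with cross-entropy instead of ignoring them.

```python
import math

BACKGROUND = 0  # hypothetical class index for "no character in this slot"


def encode_with_background(text, vocab, max_len):
    """Map text to fixed-length character targets, padding unused slots with BACKGROUND."""
    ids = [vocab[c] for c in text[:max_len]]
    return ids + [BACKGROUND] * (max_len - len(ids))


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single slot."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def char_loss(logits_per_slot, targets, include_background=True):
    """Mean cross-entropy over character slots.

    include_background=True also supervises padded slots with BACKGROUND;
    include_background=False ignores them, as many recognizers do.
    """
    terms = [cross_entropy(lg, t)
             for lg, t in zip(logits_per_slot, targets)
             if include_background or t != BACKGROUND]
    return sum(terms) / len(terms) if terms else 0.0


vocab = {"a": 1, "b": 2}  # toy vocabulary; index 0 is reserved for BACKGROUND
targets = encode_with_background("ab", vocab, max_len=4)  # [1, 2, 0, 0]
```

Calling `char_loss` with and without `include_background` on the same logits shows the effect of this choice. Only the first variant gives the model an explicit signal for slots that contain no character.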

