Fugu-MT Paper Translation (Abstract): GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking

Paper overview: GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking

arxiv url: http://arxiv.org/abs/2505.22228v1
Date: Wed, 28 May 2025 11:02:45 GMT
Status: Translation complete
Last updated in system: 2025-05-29 17:35:50.561791
Title: GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking
Authors: Haibin He, Jing Zhang, Maoyuan Ye, Juhua Liu, Bo Du, Dacheng Tao
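Abstract summary: Video text spotting (VTS) extends image text spotting (ITS) by adding text tracking. Despite progress in VTS, existing methods still fall short of the performance seen in ITS. GoMatching++ transforms an off-the-shelf image text spotter into a video specialist.
Reference score (attention score computed by this site): 77.0306273129475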
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video text spotting (VTS) extends image text spotting (ITS) by adding text tracking, significantly increasing task complexity. Despite progress in VTS, existing methods still fall short of the performance seen in ITS. This paper identifies a key limitation in current video text spotters: limited recognition capability, even after extensive end-to-end training. To address this, we propose GoMatching++, a parameter- and data-efficient method that transforms an off-the-shelf image text spotter into a video specialist. The core idea lies in freezing the image text spotter and introducing a lightweight, trainable tracker, which can be optimized efficiently with minimal training data. Our approach includes two key components: (1) a rescoring mechanism to bridge the domain gap between image and video data, and (2) the LST-Matcher, which enhances the frozen image text spotter's ability to handle video text. We explore various architectures for LST-Matcher to ensure efficiency in both parameters and training data. As a result, GoMatching++ sets new performance records on challenging benchmarks such as ICDAR15-video, DSText, and BOVText, while significantly reducing training costs. To address the lack of curved text datasets in VTS, we introduce ArTVideo, a new benchmark featuring over 30% curved text with detailed annotations. We also provide a comprehensive statistical analysis and experimental results for ArTVideo. We believe that GoMatching++ and the ArTVideo benchmark will drive future advancements in video text spotting. The source code, models and dataset are publicly available at https://github.com/Hxyz-123/GoMatching.
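The abstract's core idea, a frozen image text spotter followed by a lightweight tracker, can be illustrated with a minimal sketch of the association step. This is not the paper's LST-Matcher: it swaps in a generic greedy cosine-similarity matcher, and the function names `cosine` and `associate`, the embedding format, and the `threshold` value are all assumptions made for illustration. The embeddings are assumed to come from a spotter whose weights are never updated.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(tracks, detections, threshold=0.5):
    """Greedily link detections in the current frame to existing tracks.

    tracks:     dict mapping track id -> embedding of its latest text instance
    detections: list of embeddings from the (assumed frozen) image text spotter
    Returns one track id per detection; unmatched detections get new ids.
    """
    # Score every (track, detection) pair, best similarity first.
    pairs = sorted(
        ((cosine(emb, det), tid, i)
         for tid, emb in tracks.items()
         for i, det in enumerate(detections)),
        reverse=True,
    )
    assigned = {}       # detection index -> track id
    used_tracks = set()
    for sim, tid, i in pairs:
        if sim < threshold:
            break       # remaining pairs are even less similar
        if tid in used_tracks or i in assigned:
            continue    # each track and each detection is matched at most once
        assigned[i] = tid
        used_tracks.add(tid)

    # Start a new track for every detection that found no match.
    next_id = max(tracks, default=-1) + 1
    result = []
    for i in range(len(detections)):
        if i not in assigned:
            assigned[i] = next_id
            next_id += 1
        result.append(assigned[i])
    return result


tracks = {0: [1.0, 0.0], 1: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
print(associate(tracks, detections))  # [1, 0, 2]
```

Only the matcher would need training in such a setup, which is what makes the approach parameter-efficient in spirit; a learned similarity or an optimal assignment solver could replace the greedy cosine matching used here.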
