Fugu-MT Paper Translation (Abstract): LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting

Paper overview: LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting

arxiv url: http://arxiv.org/abs/2511.05818v1
Date: Sat, 08 Nov 2025 03:08:03 GMT
Status: Translation complete
Last updated in system: 2025-11-11 21:18:44.59496
Title: LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting
Authors: Yuchen Su, Zhineng Chen, Yongkun Du, Zuxuan Wu, Hongtao Xie, Yu-Gang Jiang
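Abstract summary: We propose a novel parameterized text shape method based on low-rank approximation for precise detection. By exploiting the inherent shape correlation among different text contours, the method achieves consistency and compactness in shape representation. We integrate the enhanced detection module with a lightweight recognition branch to build an end-to-end text spotting framework termed LRANet++.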
Reference score (independently computed attention score): 118.93173826110815
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end text spotting aims to jointly optimize text detection and recognition within a unified framework. Despite significant progress, designing an accurate and efficient end-to-end text spotter for arbitrary-shaped text remains largely unsolved. We identify the primary bottleneck as the lack of a reliable and efficient text detection method. To address this, we propose a novel parameterized text shape method based on low-rank approximation for precise detection and a triple assignment detection head to enable fast inference. Specifically, unlike other shape representation methods that employ data-irrelevant parameterization, our data-driven approach derives a low-rank subspace directly from labeled text boundaries. To ensure this process is robust against the inherent annotation noise in this data, we utilize a specialized recovery method based on an $\ell_1$-norm formulation, which accurately reconstructs the text shape with only a few key orthogonal vectors. By exploiting the inherent shape correlation among different text contours, our method achieves consistency and compactness in shape representation. Next, the triple assignment scheme introduces a novel architecture where a deep sparse branch (for stabilized training) is used to guide the learning of an ultra-lightweight sparse branch (for accelerated inference), while a dense branch provides rich parallel supervision. Building upon these advancements, we integrate the enhanced detection module with a lightweight recognition branch to form an end-to-end text spotting framework, termed LRANet++, capable of accurately and efficiently spotting arbitrary-shaped text. Extensive experiments on several challenging benchmarks demonstrate the superiority of LRANet++ compared to state-of-the-art methods. Code will be available at: https://github.com/ychensu/LRANet-PP.git
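The low-rank shape idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' method: it swaps in a plain power-iteration (least-squares, SVD-style) subspace instead of the $\ell_1$-norm recovery the paper describes, and it uses synthetic ellipse contours rather than labeled text boundaries. All function names (`top_k_basis`, `ellipse_contour`, `encode`, `decode`) and parameter values are hypothetical, chosen only for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_matvec(rows, v):
    """Return (A^T A) v, where A is the matrix whose rows are `rows`."""
    out = [0.0] * len(v)
    for row in rows:
        c = dot(row, v)
        for j, x in enumerate(row):
            out[j] += c * x
    return out


def top_k_basis(rows, k, iters=100, seed=0):
    """Estimate k orthonormal vectors spanning the dominant subspace of the
    contour matrix, using power iteration with Gram-Schmidt deflation.
    Assumption: an l2-style stand-in, not the paper's l1-based recovery."""
    rng = random.Random(seed)
    dim = len(rows[0])
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            v = gram_matvec(rows, v)
            # Remove components along the vectors already found.
            for b in basis:
                c = dot(v, b)
                v = [x - c * y for x, y in zip(v, b)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        basis.append(v)
    return basis


def ellipse_contour(cx, cy, a, b, n_points=16):
    """Synthetic stand-in for a labeled boundary: all x coordinates
    followed by all y coordinates (2 * n_points values)."""
    ts = [2.0 * math.pi * i / n_points for i in range(n_points)]
    xs = [cx + a * math.cos(t) for t in ts]
    ys = [cy + b * math.sin(t) for t in ts]
    return xs + ys


def encode(contour, basis):
    """Project a contour onto the basis: one coefficient per basis vector."""
    return [dot(contour, b) for b in basis]


def decode(coeffs, basis):
    """Rebuild the contour as a linear combination of the basis vectors."""
    dim = len(basis[0])
    return [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(dim)]


# Build a basis from 40 random ellipses, then compress an unseen one.
data_rng = random.Random(42)
train = [
    ellipse_contour(
        data_rng.uniform(0.0, 10.0), data_rng.uniform(0.0, 10.0),
        data_rng.uniform(1.0, 4.0), data_rng.uniform(0.5, 2.0),
    )
    for _ in range(40)
]
basis = top_k_basis(train, k=4)

unseen = ellipse_contour(3.0, 7.0, 2.5, 1.0)
coeffs = encode(unseen, basis)
recon = decode(coeffs, basis)
max_err = max(abs(p - q) for p, q in zip(unseen, recon))
print(len(unseen), "coordinates ->", len(coeffs), "coefficients; max error:", max_err)
```

In this toy setting every contour is a combination of four fixed vectors (constant x, cosine x, constant y, sine y), so four coefficients reproduce an unseen ellipse up to floating-point error. Real text boundaries are only approximately low-rank and carry annotation noise, which is the situation the abstract's $\ell_1$-norm formulation is meant to handle.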