TextMamba: Scene Text Detector with Mamba
- URL: http://arxiv.org/abs/2512.06657v1
- Date: Sun, 07 Dec 2025 05:06:19 GMT
- Title: TextMamba: Scene Text Detector with Mamba
- Authors: Qiyan Zhao, Yue Yan, Da-Han Wang
- Abstract summary: We propose a novel scene text detector based on Mamba that integrates the selection mechanism with attention layers. We adopt the Top_k algorithm to explicitly select key information and reduce the interference of irrelevant information in Mamba modeling. Our method achieves state-of-the-art or competitive performance on various benchmarks.
- Score: 6.992080935409672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In scene text detection, Transformer-based methods have addressed the global feature extraction limitations inherent in traditional convolutional neural network-based methods. However, most of them directly rely on native Transformer attention layers as encoders without evaluating their cross-domain limitations and inherent shortcomings: forgetting important information or focusing on irrelevant representations when modeling long-range dependencies for text detection. The recently proposed state space model Mamba has demonstrated better long-range dependency modeling through a linear-complexity selection mechanism. Therefore, we propose a novel scene text detector based on Mamba that integrates the selection mechanism with attention layers, enhancing the encoder's ability to extract relevant information from long sequences. We adopt the Top_k algorithm to explicitly select key information and reduce the interference of irrelevant information in Mamba modeling. Additionally, we design a dual-scale feed-forward network and an embedding pyramid enhancement module to facilitate high-dimensional hidden state interactions and multi-scale feature fusion. Our method achieves state-of-the-art or competitive performance on various benchmarks, with F-measures of 89.7%, 89.2%, and 78.5% on CTW1500, TotalText, and ICDAR19ArT, respectively. Code will be made available.
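The paper's code is not yet released, so the following is only an illustrative sketch of the two ideas the abstract combines: Top-k selection of the most relevant tokens, followed by a linear-time recurrence over the selected sequence (a stand-in for Mamba's selective scan). All function names, the scoring setup, and the constant decay factor are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def topk_select(tokens: np.ndarray, scores: np.ndarray, k: int):
    """Keep the k highest-scoring tokens, preserving sequence order.

    tokens: (L, D) token features; scores: (L,) relevance scores.
    Returns the selected (k, D) tokens and their original indices.
    """
    # Indices of the k largest scores (argpartition is O(L), order arbitrary).
    idx = np.argpartition(scores, -k)[-k:]
    idx = np.sort(idx)  # restore original sequence order
    return tokens[idx], idx

def linear_scan(x: np.ndarray, a: float = 0.9):
    """Minimal linear recurrence h_t = a * h_{t-1} + x_t: a toy stand-in
    for an SSM scan, O(L) in sequence length unlike quadratic attention."""
    h = np.zeros(x.shape[1])
    out = []
    for x_t in x:
        h = a * h + x_t
        out.append(h.copy())
    return np.stack(out)

# Toy usage: score 8 tokens, keep the top 3, then scan over them.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
scores = rng.normal(size=8)
selected, idx = topk_select(tokens, scores, k=3)
states = linear_scan(selected)
print(idx.shape, selected.shape, states.shape)
```

In the paper's setting the scores would come from the attention/selection mechanism itself rather than being random, and the scan would be the learned Mamba recurrence; the sketch only shows how explicit Top-k selection bounds the sequence the recurrence has to model.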
Related papers
- Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection [16.398581898787608]
We propose a novel backbone, Fore-Mamba3D, to focus on foreground enhancement by modifying the Mamba-based encoder. Considering the response attenuation in the interaction of foreground voxels across different instances, we design a regional-to-global slide window. Our method emphasizes foreground-only encoding and alleviates the distance-based and causal dependencies in the linear autoregressive model.
arXiv Detail & Related papers (2026-02-23T06:03:07Z) - CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification [12.959829835589453]
We propose the Cross State Fusion Mamba (CSFMamba) Network. Specifically, we first design the preprocessing module of remote sensing image information for the needs of the Mamba structure. Secondly, a cross-state module based on the Mamba operator is creatively designed to fully fuse the features of the two modalities.
arXiv Detail & Related papers (2025-08-31T03:08:34Z) - Trajectory-aware Shifted State Space Models for Online Video Super-Resolution [57.87099307245989]
This paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba). TS-Mamba first constructs trajectories within a video to select the most similar tokens from the previous frames. Our TS-Mamba achieves state-of-the-art performance in most cases, with over a 22.7% complexity reduction (in MACs).
arXiv Detail & Related papers (2025-08-14T08:42:15Z) - AtrousMamba: An Atrous-Window Scanning Visual State Space Model for Remote Sensing Change Detection [29.004019252136565]
We propose a novel model, AtrousMamba, which balances the extraction of fine-grained local details with the integration of global contextual information. By leveraging the atrous window scan visual state space (AWVSS) module, we design dedicated end-to-end Mamba-based frameworks for binary change detection (BCD) and semantic change detection (SCD). Experimental results on six benchmark datasets show that the proposed framework outperforms existing CNN-based, Transformer-based, and Mamba-based methods.
arXiv Detail & Related papers (2025-07-22T02:36:16Z) - GLADMamba: Unsupervised Graph-Level Anomaly Detection Powered by Selective State Space Model [4.4735289317146405]
GLADMamba is a novel framework that adapts the selective state space model to the UGLAD field. To the best of our knowledge, this is the first work to introduce Mamba and explicit spectral information to UGLAD.
arXiv Detail & Related papers (2025-03-23T02:40:17Z) - Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z) - MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms [2.4775350526606355]
Vision Language Place Recognition (VLVPR) enhances robot localization performance by incorporating natural language descriptions from images. By utilizing language information, VLVPR directs robot place matching, overcoming the constraint of depending solely on vision. This paper proposes a novel coarse-to-fine and end-to-end connected cross-modal place recognition framework, called MambaPlace.
arXiv Detail & Related papers (2024-08-28T12:06:11Z) - SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity.
Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution.
Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z) - DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba. Experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths significantly longer than the ones seen during training.
arXiv Detail & Related papers (2024-06-20T17:40:18Z) - STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [50.064502884594376]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.