MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition
- URL: http://arxiv.org/abs/2407.18616v1
- Date: Fri, 26 Jul 2024 09:20:29 GMT
- Title: MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition
- Authors: Chang Liu, Simon Corbillé, Elisa H Barney Smith
- Abstract summary: Open-set text recognition aims to address both novel characters and previously seen ones.
We first propose a Multi-Oriented Open-Set Text Recognition task (MOOSTR) to model the challenges of both novel characters and writing direction variety.
We then propose a Multi-Orientation Sharing Experts (MOoSE) framework as a strong baseline solution.
- Score: 3.6227230205444902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-set text recognition, which aims to address both novel characters and previously seen ones, is one of the rising subtopics in the text recognition field. However, current open-set text recognition solutions focus only on horizontal text and fail to model the challenges posed by the variety of writing directions in real-world scene text. Multi-orientation text recognition, in general, faces challenges from diverse image aspect ratios, significant imbalance in data amount, and domain gaps between orientations. In this work, we first propose a Multi-Oriented Open-Set Text Recognition task (MOOSTR) to model the challenges of both novel characters and writing direction variety. We then propose a Multi-Orientation Sharing Experts (MOoSE) framework as a strong baseline solution. MOoSE uses a mixture-of-experts scheme to alleviate the domain gaps between orientations, while exploiting common structural knowledge among experts to alleviate the data scarcity that some experts face. The proposed MOoSE framework is validated by ablative experiments and also tested for feasibility on the existing open-set benchmark. Code, models, and documents are available at: https://github.com/lancercat/Moose/
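The abstract describes a mixture-of-experts scheme in which orientation-specific experts share common structural knowledge. The following is a minimal, hypothetical sketch of that general idea, not the authors' actual MOoSE implementation: a shared backbone extracts features reused by every expert, while each writing orientation routes to its own classification head. All class and parameter names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedBackbone:
    """Feature extractor whose weights are shared by all experts,
    so knowledge learned from data-rich orientations benefits scarce ones."""
    def __init__(self, in_dim, feat_dim):
        self.W = rng.standard_normal((in_dim, feat_dim)) * 0.1

    def __call__(self, x):
        return np.maximum(x @ self.W, 0.0)  # linear projection + ReLU

class OrientationExpert:
    """Per-orientation head operating on the shared features."""
    def __init__(self, feat_dim, n_classes):
        self.W = rng.standard_normal((feat_dim, n_classes)) * 0.1

    def __call__(self, feats):
        return feats @ self.W  # class logits

class OrientationMoE:
    """Route each sample to the expert matching its writing orientation."""
    def __init__(self, in_dim=32, feat_dim=16, n_classes=10,
                 orientations=("horizontal", "vertical")):
        self.backbone = SharedBackbone(in_dim, feat_dim)
        self.experts = {o: OrientationExpert(feat_dim, n_classes)
                        for o in orientations}

    def __call__(self, x, orientation):
        feats = self.backbone(x)  # computed once, shared across experts
        return self.experts[orientation](feats)

model = OrientationMoE()
x = rng.standard_normal((4, 32))           # a batch of 4 dummy feature vectors
logits_h = model(x, "horizontal")
logits_v = model(x, "vertical")
print(logits_h.shape)                      # (4, n_classes)
```

In this sketch, routing is by a known orientation label rather than a learned gate, which keeps the example self-contained; the shared backbone is what lets experts with little data exploit common structure.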
Related papers
- Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition [56.968108142307976]
We propose a novel approach called Class-Aware Mask-guided feature refinement (CAM)
Our approach introduces canonical class-aware glyph masks to suppress background and text style noise.
By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion.
arXiv Detail & Related papers (2024-02-21T09:22:45Z)
- Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We take the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
- Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval [11.798006331912056]
The goal of Text-to-Image Person Retrieval (TIPR) is to retrieve specific person images according to the given textual descriptions.
We propose a novel TIPR framework to build fine-grained interactions and alignment between person images and the corresponding texts.
arXiv Detail & Related papers (2023-07-18T08:23:46Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility in application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation [41.43280922432707]
We argue for their unification -- we aim for a single model that can compete favourably with two separate state-of-the-art STR and HTR models.
We first show that cross-utilisation of STR and HTR models triggers significant performance drops due to differences in their inherent challenges.
We then tackle their union by introducing a knowledge distillation (KD) based framework.
arXiv Detail & Related papers (2021-07-26T10:10:34Z)
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement [67.35280008722255]
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable detections.
arXiv Detail & Related papers (2021-04-02T14:34:41Z)
- Text Detection and Recognition in the Wild: A Review [7.43788469020627]
State-of-the-art scene text detection and/or recognition methods have exploited the advancement in deep learning architectures.
The paper presents a review on the recent advancement in scene text detection and recognition.
It also identifies several existing challenges for detecting or recognizing text in images in the wild.
arXiv Detail & Related papers (2020-06-08T01:08:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.