Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes
- URL: http://arxiv.org/abs/2310.00558v3
- Date: Sat, 17 Feb 2024 14:10:25 GMT
- Title: Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes
- Authors: Alloy Das, Sanket Biswas, Umapada Pal and Josep Lladós
- Abstract summary: We present a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes.
We also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter.
The dataset, code and pre-trained models will be released upon acceptance.
- Score: 11.478236584340255
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: When used in a real-world noisy environment, the capacity to generalize to
multiple domains is essential for any autonomous scene text spotting system.
However, existing state-of-the-art methods employ pretraining and fine-tuning
strategies on natural scene datasets, which do not exploit the feature
interaction across other complex domains. In this work, we explore and
investigate the problem of domain-agnostic scene text spotting, i.e., training
a model on multi-domain source data such that it can directly generalize to
target domains rather than being specialized for a specific domain or scenario.
In this regard, we present to the community a text spotting validation
benchmark called Under-Water Text (UWT) for noisy underwater scenes to
establish an important case study. Moreover, we design an efficient
super-resolution-based end-to-end transformer baseline called DA-TextSpotter,
which achieves performance comparable or superior to existing text spotting
architectures on both regular and arbitrary-shaped scene text spotting benchmarks in terms
of both accuracy and model efficiency. The dataset, code and pre-trained models
will be released upon acceptance.
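To make the setup above concrete, the following is a minimal, hypothetical sketch (in PyTorch) of how a super-resolution front-end can be chained with a transformer-based spotting head and trained on batches pooled from several source domains at once. The module names, toy datasets and recognition-only loss are placeholder assumptions for illustration only; this is not the released DA-TextSpotter implementation.
```python
# Minimal sketch (not the authors' released code): a super-resolution
# front-end feeding a transformer-based spotting head, trained on mixed
# multi-domain batches. All modules and datasets are hypothetical placeholders.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

class SuperResolutionBlock(nn.Module):
    """Lightweight x2 upsampler applied before the spotting backbone."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels * 4, 3, padding=1),
            nn.PixelShuffle(2),  # rearranges channels into a 2x larger image
        )
    def forward(self, x):
        return self.net(x)

class ToyTextSpotter(nn.Module):
    """Placeholder spotter: CNN features + a small transformer encoder."""
    def __init__(self, num_classes=97):
        super().__init__()
        self.sr = SuperResolutionBlock()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((8, 8)))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(32, num_classes)
    def forward(self, images):
        x = self.sr(images)                     # enhance noisy/low-res input
        f = self.backbone(x)                    # (B, 32, 8, 8)
        tokens = f.flatten(2).transpose(1, 2)   # (B, 64, 32)
        return self.head(self.encoder(tokens))  # per-token class logits

# Domain-agnostic training: draw batches from a mixture of source domains so
# the model is never specialized to a single one (datasets here are synthetic).
domains = [TensorDataset(torch.rand(16, 3, 64, 64),
                         torch.randint(0, 97, (16, 64))) for _ in range(3)]
loader = DataLoader(ConcatDataset(domains), batch_size=8, shuffle=True)

model = ToyTextSpotter()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
for images, labels in loader:
    logits = model(images)                      # (B, 64, num_classes)
    loss = criterion(logits.reshape(-1, 97), labels.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```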
Related papers
- A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation [52.0964459842176]
Current state-of-the-art dialogue systems heavily rely on extensive training datasets.
We propose a novel data Augmentation framework for Multi-Domain Dialogue Generation, referred to as AMD$^2$G.
The AMD$^2$G framework consists of a data augmentation process and a two-stage training approach: domain-agnostic training and domain adaptation training (a generic sketch of this two-stage schedule follows the list below).
arXiv Detail & Related papers (2024-06-14T09:52:27Z) - Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding [59.599378814835205]
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query.
We introduce a novel AMDA method to adaptively adjust the model's scene-related knowledge by incorporating insights from the target data.
arXiv Detail & Related papers (2023-12-21T07:49:27Z) - Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance [15.513912470752041]
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions.
Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data.
The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains.
arXiv Detail & Related papers (2023-10-02T06:08:01Z) - Domain Adaptive Scene Text Detection via Subcategorization [45.580559833129165]
We study domain adaptive scene text detection, a largely neglected yet very meaningful task.
We design SCAST, a subcategory-aware self-training technique that mitigates the network overfitting and noisy pseudo labels.
SCAST achieves superior detection performance consistently across multiple public benchmarks.
arXiv Detail & Related papers (2022-12-01T09:15:43Z) - AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z) - FDMT: A Benchmark Dataset for Fine-grained Domain Adaptation in Machine Translation [53.87731008029645]
We present a real-world fine-grained domain adaptation task in machine translation (FDMT).
The FDMT dataset consists of four sub-domains of information technology: autonomous vehicles, AI education, real-time networks and smartphones.
We conduct quantitative experiments and in-depth analyses in this new setting, which benchmarks the fine-grained domain adaptation task.
arXiv Detail & Related papers (2020-12-31T17:15:09Z) - Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation [44.19436340246248]
This paper presents an innovative local contextual-relation consistent domain adaptation technique.
It aims to achieve local-level consistencies during the global-level alignment.
Experiments demonstrate its superior segmentation performance as compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-07-05T19:00:46Z) - Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z) - Spatial Attention Pyramid Network for Unsupervised Domain Adaptation [66.75008386980869]
Unsupervised domain adaptation is critical in various computer vision tasks.
We design a new spatial attention pyramid network for unsupervised domain adaptation.
Our method performs favorably against the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-29T09:03:23Z)
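Several entries above describe a shared recipe: a domain-agnostic training stage on pooled source data followed by a domain-adaptation stage on the target domain (e.g., the AMD$^2$G framework referenced earlier). The fragment below is a generic, hypothetical sketch of that two-stage schedule using synthetic data; it is not code from any of the listed papers, and the model, datasets and learning rates are placeholders.
```python
# Hypothetical two-stage schedule: domain-agnostic training on pooled source
# domains, then domain-adaptation fine-tuning on a small target-domain set.
# Not taken from any paper listed above; all data here is synthetic.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def make_domain(n=32):
    # Toy classification data standing in for a real source/target domain.
    return TensorDataset(torch.randn(n, 16), torch.randint(0, 4, (n,)))

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
criterion = nn.CrossEntropyLoss()

# Stage 1: domain-agnostic training on a mixture of source domains.
source_loader = DataLoader(ConcatDataset([make_domain() for _ in range(3)]),
                           batch_size=8, shuffle=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(3):
    for x, y in source_loader:
        loss = criterion(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: domain-adaptation fine-tuning on the (smaller) target domain,
# typically with a lower learning rate to avoid forgetting stage-1 knowledge.
target_loader = DataLoader(make_domain(n=16), batch_size=8, shuffle=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for epoch in range(3):
    for x, y in target_loader:
        loss = criterion(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
```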
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.