Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes
- URL: http://arxiv.org/abs/2310.00558v3
- Date: Sat, 17 Feb 2024 14:10:25 GMT
- Title: Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes
- Authors: Alloy Das, Sanket Biswas, Umapada Pal and Josep Lladós
- Abstract summary: We present a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes.
We also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter.
The dataset, code and pre-trained models will be released upon acceptance.
- Score: 11.478236584340255
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: When used in a real-world noisy environment, the capacity to generalize to
multiple domains is essential for any autonomous scene text spotting system.
However, existing state-of-the-art methods employ pretraining and fine-tuning
strategies on natural scene datasets, which do not exploit the feature
interaction across other complex domains. In this work, we explore and
investigate the problem of domain-agnostic scene text spotting, i.e., training
a model on multi-domain source data such that it can directly generalize to
target domains rather than being specialized for a specific domain or scenario.
In this regard, we present to the community a text spotting validation benchmark
called Under-Water Text (UWT) for noisy underwater scenes to establish an
important case study. Moreover, we design an efficient super-resolution-based
end-to-end transformer baseline called DA-TextSpotter, which achieves
comparable or superior performance over existing text spotting architectures
for both regular and arbitrary-shaped scene text spotting benchmarks in terms
of both accuracy and model efficiency. The dataset, code and pre-trained models
will be released upon acceptance.
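The abstract names the key ingredients of DA-TextSpotter (a super-resolution module feeding an end-to-end transformer) without detailing the architecture. As a rough, hypothetical PyTorch sketch of that general idea only: a pixel-shuffle super-resolution front-end upsamples the noisy input before a transformer encoder with detection and recognition heads. Every module, size, and head below is an assumption for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

class SRFrontEnd(nn.Module):
    """Hypothetical super-resolution front-end: 2x upscaling via
    pixel shuffle, meant to sharpen noisy (e.g. underwater) input."""
    def __init__(self, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 3 * 4, 3, padding=1),  # 3 channels x (2x2) upscale
            nn.PixelShuffle(2),
        )

    def forward(self, x):
        return self.body(x)

class SpotterSketch(nn.Module):
    """Assumed pipeline: SR front-end -> small conv backbone ->
    transformer encoder -> detection and recognition heads."""
    def __init__(self, d_model=256, num_chars=97):
        super().__init__()
        self.sr = SRFrontEnd()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, d_model, 7, stride=4, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(d_model, d_model, 3, stride=2, padding=1),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.det_head = nn.Linear(d_model, 4)          # (cx, cy, w, h) per token
        self.rec_head = nn.Linear(d_model, num_chars)  # character logits per token

    def forward(self, x):
        f = self.backbone(self.sr(x))           # (B, C, H', W')
        tokens = f.flatten(2).transpose(1, 2)   # (B, H'*W', C)
        tokens = self.encoder(tokens)
        return self.det_head(tokens), self.rec_head(tokens)

# Smoke test on a small noisy image
boxes, chars = SpotterSketch()(torch.randn(1, 3, 256, 256))
```

The intuition behind this ordering is that recovering high-frequency detail before spotting should help in degraded domains such as underwater imagery.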
Related papers
- GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection [5.530212768657544]
We introduce glocal contrastive learning to improve the learning of global and local prompts, effectively detecting abnormal patterns across various domains.
The generalization performance of GlocalCLIP in ZSAD was demonstrated on 15 real-world datasets from both the industrial and medical domains.
arXiv Detail & Related papers (2024-11-09T05:22:13Z)
- A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation [52.0964459842176]
Current state-of-the-art dialogue systems heavily rely on extensive training datasets.
We propose a novel data Augmentation framework for Multi-Domain Dialogue Generation, referred to as AMD²G.
The AMD²G framework consists of a data augmentation process and a two-stage training approach: domain-agnostic training followed by domain adaptation training (a minimal sketch of this two-stage recipe appears after the list below).
arXiv Detail & Related papers (2024-06-14T09:52:27Z)
- Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding [59.599378814835205]
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query.
We introduce a novel AMDA method to adaptively adjust the model's scene-related knowledge by incorporating insights from the target data.
arXiv Detail & Related papers (2023-12-21T07:49:27Z)
- Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance [15.513912470752041]
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions.
Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data.
The results clearly demonstrate the potential of intermediate representations to achieve significant performance gains on text spotting benchmarks across multiple domains.
arXiv Detail & Related papers (2023-10-02T06:08:01Z)
- Domain Adaptive Scene Text Detection via Subcategorization [45.580559833129165]
We study domain adaptive scene text detection, a largely neglected yet very meaningful task.
We design SCAST, a subcategory-aware self-training technique that mitigates network overfitting and noisy pseudo labels.
SCAST achieves superior detection performance consistently across multiple public benchmarks.
arXiv Detail & Related papers (2022-12-01T09:15:43Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- FDMT: A Benchmark Dataset for Fine-grained Domain Adaptation in Machine Translation [53.87731008029645]
We present a real-world fine-grained domain adaptation task in machine translation (FDMT).
The FDMT dataset consists of four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smartphones.
We conduct quantitative experiments and in-depth analyses in this new setting, which benchmarks the fine-grained domain adaptation task.
arXiv Detail & Related papers (2020-12-31T17:15:09Z)
- Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) remains an active research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence-level adaptation.
Our approach maximizes the character-level confusion between the source domain and the target domain (a minimal sketch of this adversarial-confusion mechanism appears after the list below).
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
- Spatial Attention Pyramid Network for Unsupervised Domain Adaptation [66.75008386980869]
Unsupervised domain adaptation is critical in various computer vision tasks.
We design a new spatial attention pyramid network for unsupervised domain adaptation.
Our method outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-29T09:03:23Z)
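As referenced in the AMD²G entry above: that abstract describes a data augmentation process followed by two training stages, but not their internals. The sketch below illustrates only the generic two-stage structure (domain-agnostic training on pooled multi-domain data, then target-domain adaptation); the model, loss, loaders, and hyperparameters are placeholders, not the paper's specification.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def train_stage(model, loader, epochs, lr):
    """Generic supervised stage; cross-entropy stands in for the real loss."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, targets in loader:
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(inputs), targets)
            loss.backward()
            opt.step()

def two_stage_training(model, source_domain_sets, target_set):
    # Stage 1: domain-agnostic training on pooled multi-domain source data.
    pooled = DataLoader(ConcatDataset(source_domain_sets),
                        batch_size=32, shuffle=True)
    train_stage(model, pooled, epochs=5, lr=3e-4)
    # Stage 2: adaptation on the low-resource target domain, at a lower LR.
    target = DataLoader(target_set, batch_size=8, shuffle=True)
    train_stage(model, target, epochs=3, lr=1e-4)
```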
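As referenced in the FASDA entry above: a common way to "maximize confusion" between domains is a domain classifier trained through a gradient-reversal layer (DANN-style), so the feature extractor learns domain-indistinguishable character features. The sketch below shows that standard mechanism under assumed 256-dimensional character features; it is not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient-reversal layer: identity forward, negated gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def adversarial_confusion_loss(char_feats_src, char_feats_tgt, domain_clf, lam=1.0):
    """The domain classifier tries to tell source from target character-level
    features; the reversed gradient pushes the feature extractor to make the
    two domains indistinguishable."""
    feats = torch.cat([char_feats_src, char_feats_tgt], dim=0)
    labels = torch.cat([
        torch.zeros(len(char_feats_src), dtype=torch.long),  # 0 = source
        torch.ones(len(char_feats_tgt), dtype=torch.long),   # 1 = target
    ])
    logits = domain_clf(GradReverse.apply(feats, lam))
    return nn.functional.cross_entropy(logits, labels)

# Example: per-character features from a recognizer (dimensions assumed)
clf = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 2))
loss = adversarial_confusion_loss(torch.randn(40, 256), torch.randn(40, 256), clf)
loss.backward()
```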
This list is automatically generated from the titles and abstracts of the papers on this site.