Text Recognition in Real Scenarios with a Few Labeled Samples
- URL: http://arxiv.org/abs/2006.12209v1
- Date: Mon, 22 Jun 2020 13:03:01 GMT
- Title: Text Recognition in Real Scenarios with a Few Labeled Samples
- Authors: Jinghuang Lin, Zhanzhan Cheng, Fan Bai, Yi Niu, Shiliang Pu, Shuigeng
Zhou
- Abstract summary: Scene text recognition (STR) is still a hot research topic in computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
- Score: 55.07859517380136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text recognition (STR) is still a hot research topic in computer vision
field due to its various applications. Existing works mainly focus on learning
a general model with a huge number of synthetic text images to recognize
unconstrained scene texts, and have achieved substantial progress. However,
these methods are not quite applicable in many real-world scenarios where 1)
high recognition accuracy is required, while 2) labeled samples are lacked. To
tackle this challenging problem, this paper proposes a few-shot adversarial
sequence domain adaptation (FASDA) approach to build sequence adaptation
between the synthetic source domain (with many synthetic labeled samples) and a
specific target domain (with only some or a few real labeled samples). This is
done by simultaneously learning each character's feature representation with an
attention mechanism and establishing the corresponding character-level latent
subspace with adversarial learning. Our approach can maximize the
character-level confusion between the source domain and the target domain, thus
achieves the sequence-level adaptation with even a small number of labeled
samples in the target domain. Extensive experiments on various datasets show
that our method significantly outperforms the finetuning scheme, and obtains
comparable performance to the state-of-the-art STR methods.
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Sequential Visual and Semantic Consistency for Semi-supervised Text
Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
arXiv Detail & Related papers (2024-02-24T13:00:54Z) - Harnessing the Power of Beta Scoring in Deep Active Learning for
Multi-Label Text Classification [6.662167018900634]
Our study introduces a novel deep active learning strategy, capitalizing on the Beta family of proper scoring rules within the Expected Loss Reduction framework.
It computes the expected increase in scores using the Beta Scoring Rules, which are then transformed into sample vector representations.
Comprehensive evaluations across both synthetic and real datasets reveal our method's capability to often outperform established acquisition techniques in multi-label text classification.
arXiv Detail & Related papers (2024-01-15T00:06:24Z) - Diversified in-domain synthesis with efficient fine-tuning for few-shot
classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method in ten different benchmarks, consistently outperforming baselines and establishing a new state-of-the-art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z) - Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards
Enhancing Text Spotting Performance [15.513912470752041]
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions.
Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data.
The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains.
arXiv Detail & Related papers (2023-10-02T06:08:01Z) - Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes [11.478236584340255]
We present a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes.
We also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter.
The dataset, code and pre-trained models will be released upon acceptance.
arXiv Detail & Related papers (2023-10-01T03:27:41Z) - Improving Anomaly Segmentation with Multi-Granularity Cross-Domain
Alignment [17.086123737443714]
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.
While existing methods demonstrate noteworthy results on synthetic data, they often fail to consider the disparity between synthetic and real-world data domains.
We introduce the Multi-Granularity Cross-Domain Alignment framework, tailored to harmonize features across domains at both the scene and individual sample levels.
arXiv Detail & Related papers (2023-08-16T22:54:49Z) - Semi-Supervised Domain Adaptation with Prototypical Alignment and
Consistency Learning [86.6929930921905]
This paper studies how much it can help address domain shifts if we further have a few target samples labeled.
To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks.
Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
arXiv Detail & Related papers (2021-04-19T08:46:08Z) - Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation [62.29076080124199]
This paper proposes a novel coarse-to-fine feature adaptation approach to cross-domain object detection.
At the coarse-grained stage, foreground regions are extracted by adopting the attention mechanism, and aligned according to their marginal distributions.
At the fine-grained stage, we conduct conditional distribution alignment of foregrounds by minimizing the distance of global prototypes with the same category but from different domains.
arXiv Detail & Related papers (2020-03-23T13:40:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.