Partial Scene Text Retrieval
- URL: http://arxiv.org/abs/2411.10261v2
- Date: Mon, 18 Nov 2024 14:43:25 GMT
- Title: Partial Scene Text Retrieval
- Authors: Hao Wang, Minghui Liao, Zhouyi Xie, Wenyu Liu, Xiang Bai,
- Abstract summary: The task of partial scene text retrieval involves localizing and searching for text instances that are the same or similar to a given query text from an image gallery.
Existing methods can only handle text-line instances, leaving the problem of searching for partial patches unsolved.
We propose a network that can simultaneously retrieve both text-line instances and their partial patches.
- Score: 56.14891109413448
- License:
- Abstract: The task of partial scene text retrieval involves localizing and searching for text instances that are the same or similar to a given query text from an image gallery. However, existing methods can only handle text-line instances, leaving the problem of searching for partial patches within these text-line instances unsolved due to a lack of patch annotations in the training data. To address this issue, we propose a network that can simultaneously retrieve both text-line instances and their partial patches. Our method embeds the two types of data (query text and scene text instances) into a shared feature space and measures their cross-modal similarities. To handle partial patches, our proposed approach adopts a Multiple Instance Learning (MIL) approach to learn their similarities with query text, without requiring extra annotations. However, constructing bags, which is a standard step of conventional MIL approaches, can introduce numerous noisy samples for training, and lower inference speed. To address this issue, we propose a Ranking MIL (RankMIL) approach to adaptively filter those noisy samples. Additionally, we present a Dynamic Partial Match Algorithm (DPMA) that can directly search for the target partial patch from a text-line instance during the inference stage, without requiring bags. This greatly improves the search efficiency and the performance of retrieving partial patches. The source code and dataset are available at https://github.com/lanfeng4659/PSTR.
Related papers
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z) - DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting [112.45423990924283]
DeepSolo++ is a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.
Our method not only performs well in English scenes but also masters the transcription with complex font structure and a thousand-level character classes, such as Chinese.
arXiv Detail & Related papers (2023-05-31T15:44:00Z) - Description-Based Text Similarity [59.552704474862004]
We identify the need to search for texts based on abstract descriptions of their content.
We propose an alternative model that significantly improves when used in standard nearest neighbor search.
arXiv Detail & Related papers (2023-05-21T17:14:31Z) - Few Could Be Better Than All: Feature Sampling and Grouping for Scene
Text Detection [47.820683360286786]
We present a transformer-based architecture for scene text detection.
We first select a few representative features at all scales that are highly relevant to foreground text.
As each feature group corresponds to a text instance, its bounding box can be easily obtained without any post-processing operation.
arXiv Detail & Related papers (2022-03-29T04:02:31Z) - Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z) - Video Text Tracking With a Spatio-Temporal Complementary Model [46.99051486905713]
Text tracking is to track multiple texts in a video,and construct a trajectory for each text.
Existing methodle this task by utilizing the tracking-by-detection frame-work.
We argue that the tracking accuracy of this paradigmis severely limited in more complex scenarios.
arXiv Detail & Related papers (2021-11-09T08:23:06Z) - Unsupervised learning of text line segmentation by differentiating
coarse patterns [0.0]
We present an unsupervised deep learning method that embeds document image patches to a compact Euclidean space where distances correspond to a coarse text line pattern similarity.
Text line segmentation can be easily implemented using standard techniques with the embedded feature vectors.
We evaluate the method qualitatively and quantitatively on several variants of text line segmentation datasets to demonstrate its effectivity.
arXiv Detail & Related papers (2021-05-19T21:21:30Z) - Scene Text Retrieval via Joint Text Detection and Similarity Learning [68.24531728554892]
Scene text retrieval aims to localize and search all text instances from an image gallery, which are the same or similar to a given query text.
We address this problem by directly learning a cross-modal similarity between a query text and each text instance from natural images.
In this way, scene text retrieval can be simply performed by ranking the detected text instances with the learned similarity.
arXiv Detail & Related papers (2021-04-04T07:18:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.