Rooms with Text: A Dataset for Overlaying Text Detection
- URL: http://arxiv.org/abs/2211.11350v1
- Date: Mon, 21 Nov 2022 11:04:41 GMT
- Title: Rooms with Text: A Dataset for Overlaying Text Detection
- Authors: Oleg Smirnov, Aditya Tewari
- Abstract summary: We introduce a new dataset of room interior pictures with overlaying and scene text, totalling 4836 annotated images in 25 product categories.
We propose a baseline method for overlaying text detection that leverages the character region-aware text detection framework to guide the classification model.
- Score: 0.18275108630751835
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we introduce a new dataset of room interior pictures with
overlaying and scene text, totalling 4836 annotated images in 25 product
categories. We provide details on the collection and annotation process of our
dataset, and analyze its statistics. Furthermore, we propose a baseline method
for overlaying text detection that leverages the character region-aware text
detection framework to guide the classification model. We validate our approach
and show its effectiveness in terms of binary classification metrics, reaching a
final performance of 0.95 F1 score, with false positive and false negative
rates of 0.02 and 0.06, respectively.
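The reported figures are standard binary classification metrics. As a reminder of how they relate, here is a minimal Python sketch that computes F1, the false positive rate, and the false negative rate from confusion-matrix counts, assuming the positive class is "image contains overlaying text". The `Confusion` type and `binary_metrics` function are hypothetical helpers, and the counts in the example are made up purely to illustrate the arithmetic; they are not the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical confusion counts; positive = image contains overlaying text."""
    tp: int  # overlay images correctly flagged
    fp: int  # images without overlay text wrongly flagged
    tn: int  # images without overlay text correctly passed
    fn: int  # overlay images missed


def binary_metrics(c: Confusion) -> dict[str, float]:
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    fpr = c.fp / (c.fp + c.tn)  # share of negatives wrongly flagged
    fnr = c.fn / (c.fn + c.tp)  # share of positives missed (equals 1 - recall)
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr, "fnr": fnr}


if __name__ == "__main__":
    # Illustrative counts only; NOT taken from the paper.
    example = Confusion(tp=94, fp=4, tn=196, fn=6)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

F1 ignores true negatives, whereas the false positive rate is normalised by the number of negatives, so on imbalanced data the two can tell different stories; reporting both, as the abstract does, covers that gap.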
Related papers
- ClapperText: A Benchmark for Text Recognition in Low-Resource Archival Documents [1.2875548392688383]
ClapperText is a benchmark dataset for handwritten and printed text recognition in visually degraded and low-resource settings.
The dataset is derived from 127 World War II-era archival video segments containing clapperboards.
Recognizing clapperboard text poses significant challenges, including motion blur, handwriting variation, exposure fluctuations, and cluttered backgrounds.
(2025-10-17T11:44:08Z)

- AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization [0.0]
The AnnoPage dataset is a collection of 7550 pages from historical documents, primarily in Czech and German, spanning from 1485 to the present.
The dataset is designed to support research in document layout analysis and object detection.
(2025-03-28T15:30:42Z)

- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
(2024-02-27T01:57:09Z)

- Toward Real Text Manipulation Detection: New Dataset and New Solution [58.557504531896704]
High costs associated with professional text manipulation limit the availability of real-world datasets.
We present the Real Text Manipulation dataset, encompassing 14,250 text images.
Our contributions aim to propel advancements in real-world text tampering detection.
(2023-12-12T02:10:16Z)

- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
(2022-11-25T18:59:10Z)

- ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual representations [4.588028371034406]
We propose ContextCLIP, a contextual and contrastive learning framework for the contextual alignment of image-text pairs.
Our framework was observed to improve the image-text alignment by aligning text and image representations contextually in the joint embedding space.
ContextCLIP showed good qualitative performance for text-to-image retrieval tasks and enhanced classification accuracy.
(2022-11-14T05:17:51Z)

- PART: Pre-trained Authorship Representation Transformer [52.623051272843426]
Authors writing documents imprint identifying information within their texts.
Previous works use hand-crafted features or classification tasks to train their authorship models.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
(2022-09-30T11:08:39Z)

- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by 5.3% on average across 11 benchmarks, with a similar model size.
(2022-07-01T03:50:26Z)

- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
(2022-03-28T23:35:45Z)

- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on the ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C) benchmarks.
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
(2021-10-12T02:36:48Z)

- Czech News Dataset for Semantic Textual Similarity [0.0]
This paper describes a novel dataset consisting of sentences with semantic similarity annotations.
The data originate from the journalistic domain in the Czech language.
The dataset contains 138,556 human annotations divided into train and test sets.
(2021-08-19T14:20:17Z)

- Comprehensive Studies for Arbitrary-shape Scene Text Detection [78.50639779134944]
We propose a unified framework for the bottom-up based scene text detection methods.
Under the unified framework, we ensure consistent settings for non-core modules.
Comprehensive investigations and detailed analyses reveal the advantages and disadvantages of previous models.
(2021-07-25T13:18:55Z)

- Line Segmentation from Unconstrained Handwritten Text Images using Adaptive Approach [10.436029791699777]
Line segmentation from handwritten text images is a challenging task due to diversity and unknown variations.
An adaptive approach is used for line segmentation from handwritten text images, combining the alignment of connected component coordinates with text height.
The proposed scheme is tested on two different types of datasets: document pages with baselines and plain pages.
(2021-04-18T08:52:52Z)

- Text Recognition -- Real World Data and Where to Find Them [36.10220484561196]
We present a method for exploiting weakly annotated images to improve text extraction pipelines.
The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions.
It produces nearly error-free, localised instances of scene text, which we treat as "pseudo ground truth" (PGT).
(2020-07-06T22:23:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.