FlowText: Synthesizing Realistic Scene Text Video with Optical Flow
Estimation
- URL: http://arxiv.org/abs/2305.03327v1
- Date: Fri, 5 May 2023 07:15:49 GMT
- Title: FlowText: Synthesizing Realistic Scene Text Video with Optical Flow
Estimation
- Authors: Yuzhong Zhao and Weijia Wu and Zhuang Li and Jiahong Li and Weiqiang
Wang
- Abstract summary: This paper introduces a novel video text synthesis technique called FlowText.
It synthesizes a large amount of text video data at a low cost for training robust video text spotters.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current video text spotting methods achieve strong performance when
supplied with sufficient labeled training data. However, labeling data manually
is time-consuming and labor-intensive. To overcome this, using low-cost
synthetic data is a promising alternative. This paper introduces a novel video
text synthesis technique called FlowText, which utilizes optical flow
estimation to synthesize a large amount of text video data at a low cost for
training robust video text spotters. Unlike existing methods that focus on
image-level synthesis, FlowText concentrates on synthesizing temporal
information of text instances across consecutive frames using optical flow.
This temporal information is crucial for accurately tracking and spotting text
in video sequences, covering text movement, distortion, appearance,
disappearance, occlusion, and blur. Experiments show that combining general
detectors like TransDETR with the proposed FlowText produces remarkable results
on various datasets, such as ICDAR2015video and ICDAR2013video. Code is
available at https://github.com/callsys/FlowText.
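The core idea of propagating text geometry across consecutive frames with dense optical flow can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the flow field, polygon, and function name are hypothetical, and the real pipeline also handles appearance, occlusion, and blur.

```python
import numpy as np

def propagate_text_polygon(polygon, flow):
    """Shift a text polygon from frame t to frame t+1 using dense optical flow.

    polygon: (N, 2) array of (x, y) corner points in frame t.
    flow:    (H, W, 2) array where flow[y, x] = (dx, dy) toward frame t+1.
    """
    pts = np.asarray(polygon, dtype=np.float64)
    h, w = flow.shape[:2]
    # Sample the flow at the nearest pixel to each corner, clamped to bounds.
    xs = np.clip(np.round(pts[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(pts[:, 1]).astype(int), 0, h - 1)
    return pts + flow[ys, xs]

# A uniform flow of (+2, -1) pixels moves every corner by the same offset.
flow = np.zeros((10, 10, 2))
flow[..., 0] = 2.0
flow[..., 1] = -1.0
box = [(1, 1), (4, 1), (4, 3), (1, 3)]
moved = propagate_text_polygon(box, flow)
```

Repeating this step frame by frame yields per-frame text annotations "for free", which is what makes flow-based synthesis attractive for training video text spotters.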
Related papers
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- Video text tracking for dense and small text based on pp-yoloe-r and sort algorithm [0.9137554315375919]
The resolution of DSText is 1080 × 1920, and slicing each video frame into several regions would destroy the spatial correlation of the text.
For text detection, we adopt PP-YOLOE-R, which has proven effective for small object detection.
For text tracking, we use the SORT algorithm for high inference speed.
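SORT pairs a Kalman-filter motion model with Hungarian assignment; the sketch below keeps only the IoU-association core and substitutes a greedy match for the Hungarian algorithm, so it is a simplification rather than the paper's implementation, and all names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match predicted track boxes to new detections by IoU.

    Returns a list of (track_index, detection_index) pairs; unmatched
    detections would start new tracks, unmatched tracks age out.
    """
    matches, used = [], set()
    for ti, t in enumerate(tracks):
        best, best_iou = None, iou_thresh
        for di, d in enumerate(detections):
            if di in used:
                continue
            score = iou(t, d)
            if score > best_iou:
                best, best_iou = di, score
        if best is not None:
            matches.append((ti, best))
            used.add(best)
    return matches
```

Because association is just box arithmetic, this step adds almost no latency on top of the detector, which is why SORT-style trackers suit real-time settings.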
arXiv Detail & Related papers (2023-03-31T05:40:39Z)
- RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos [28.995525297929348]
RealFlow is a framework that can create large-scale optical flow datasets directly from unlabeled realistic videos.
We first estimate optical flow between a pair of video frames, and then synthesize a new image from this pair based on the predicted flow.
Our approach achieves state-of-the-art performance on two standard benchmarks compared with both supervised and unsupervised optical flow methods.
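Synthesizing a new image from a predicted flow amounts to warping; a minimal nearest-neighbor backward warp, which ignores RealFlow's occlusion handling and EM refinement, might look like the following sketch (names are illustrative).

```python
import numpy as np

def backward_warp(image, flow):
    """Warp `image` by a dense flow field: each output pixel (x, y)
    samples the source at (x - dx, y - dy), nearest-neighbor, clamped
    at the borders. Real pipelines use bilinear sampling and mask
    occluded regions."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

# A uniform flow of +1 pixel in x shifts the content one pixel right.
img = np.arange(16).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
warped = backward_warp(img, flow)
```

The warped frame and the predicted flow then form a training pair whose flow "label" is correct by construction.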
arXiv Detail & Related papers (2022-07-22T13:33:03Z)
- Real-time End-to-End Video Text Spotter with Contrastive Representation Learning [91.15406440999939]
We propose a real-time end-to-end video text spotter with Contrastive Representation learning (CoText).
CoText simultaneously addresses three tasks (text detection, tracking, and recognition) in a real-time, end-to-end trainable framework.
A simple, lightweight architecture is designed for effective and accurate performance.
arXiv Detail & Related papers (2022-07-18T07:54:17Z)
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [103.85002875155551]
We propose a novel generalized distillation method, TeachText, for exploiting large-scale language pretraining.
We extend our method to video side modalities and show that we can effectively reduce the number of used modalities at test time.
Our approach advances the state of the art on several video retrieval benchmarks by a significant margin and adds no computational overhead at test time.
arXiv Detail & Related papers (2021-04-16T17:55:28Z)
- Learning optical flow from still images [53.295332513139925]
We introduce a framework to generate accurate ground-truth optical flow annotations quickly and in large quantities from any readily available single real image.
We virtually move the camera in the reconstructed environment with known motion vectors and rotation angles.
When trained with our data, state-of-the-art optical flow networks achieve superior generalization to unseen real data.
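Why known camera motion yields exact flow labels: for a purely sideways pinhole-camera translation, the induced horizontal flow is inversely proportional to scene depth, dx = fx · tx / Z. The toy function below illustrates that relation only; it is an assumption-laden sketch (sign convention and rotational terms omitted), not the paper's rendering pipeline.

```python
import numpy as np

def flow_from_sideways_translation(depth, fx, tx):
    """Per-pixel horizontal flow induced by translating a pinhole camera
    by `tx` parallel to the image plane, given a depth map `depth` and
    focal length `fx` (pixels): dx = fx * tx / Z. Nearby points move
    more than distant ones (motion parallax)."""
    depth = np.asarray(depth, dtype=np.float64)
    return fx * tx / depth
```

With a monocular depth estimate standing in for Z, any single photograph can thus be turned into an (image, warped image, flow) training triple.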
arXiv Detail & Related papers (2021-04-08T17:59:58Z)
- Tracking Based Semi-Automatic Annotation for Scene Text Videos [16.286021899032274]
Existing scene text video datasets are not large-scale due to the expensive cost caused by manual labeling.
We obtain semi-automatic scene text annotations by labeling the first frame manually and tracking the text automatically through subsequent frames.
A paired low-quality scene text video dataset named Text-RBL is proposed, consisting of raw videos, blurry videos, and low-resolution videos.
arXiv Detail & Related papers (2021-03-29T10:42:23Z)