A New Unified Method for Detecting Text from Marathon Runners and Sports
Players in Video
- URL: http://arxiv.org/abs/2005.12524v1
- Date: Tue, 26 May 2020 05:54:28 GMT
- Title: A New Unified Method for Detecting Text from Marathon Runners and Sports
Players in Video
- Authors: Sauradip Nag, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu and
Michael Blumenstein
- Abstract summary: The proposed method fuses gradient magnitude and direction coherence of text pixels in a new way for detecting candidate regions.
Based on skin information, the proposed method then detects faces and torsos by finding structural and spatial coherences.
A comparative study with the state-of-the-art methods on bib number/text detection of different datasets shows that the proposed method outperforms the existing methods.
- Score: 37.86508176161514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting text located on the torsos of marathon runners and sports players
in video is a challenging issue due to poor quality and adverse effects caused
by flexible/colorful clothing, and different structures of human bodies or
actions. This paper presents a new unified method for tackling the above
challenges. The proposed method fuses gradient magnitude and direction
coherence of text pixels in a new way for detecting candidate regions.
Candidate regions are used for determining the number of temporal frame
clusters obtained by K-means clustering on frame differences. This process in
turn detects key frames. The proposed method explores Bayesian probability for
skin portions using color values at both pixel and component levels of temporal
frames, which provides fused images with skin components. Based on skin
information, the proposed method then detects faces and torsos by finding
structural and spatial coherences between them. We further propose an adaptive
pixel-linking deep learning model for text detection from torso regions. The
proposed method is tested on our own dataset collected from marathon/sports
video and three standard datasets, namely, RBNR, MMM and R-ID of marathon
images, to evaluate the performance. In addition, the proposed method is also
tested on the standard natural scene datasets, namely, CTW1500 and MS-COCO text
datasets, to demonstrate the generality of the proposed method. A comparative study
with the state-of-the-art methods on bib number/text detection of different
datasets shows that the proposed method outperforms the existing methods.
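The abstract's first two pipeline steps (fusing gradient magnitude with direction coherence to flag candidate text pixels, and K-means clustering on frame differences to pick key frames) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the thresholds, the 3x3 coherence window, and the one-representative-per-cluster rule are all assumptions made for the sketch.

```python
import numpy as np

def candidate_text_mask(gray, mag_thresh=None):
    """Hypothetical sketch: combine gradient magnitude with local
    gradient-direction coherence to flag candidate text pixels."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)
    if mag_thresh is None:
        mag_thresh = mag.mean() + mag.std()  # assumed adaptive threshold
    # Direction coherence: mean cosine alignment of each pixel's gradient
    # angle with its 3x3 neighbourhood (1.0 = perfectly aligned).
    coher = np.zeros_like(mag)
    pad = np.pad(ang, 1, mode="edge")
    h, w = ang.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            coher += np.cos(pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] - ang)
    coher /= 9.0
    return (mag > mag_thresh) & (coher > 0.8)

def key_frames(frames, k=3):
    """Hypothetical sketch: K-means on mean absolute frame differences,
    then one representative (largest-difference) frame per cluster."""
    diffs = np.array([np.abs(frames[i + 1] - frames[i]).mean()
                      for i in range(len(frames) - 1)])
    # Tiny 1-D Lloyd's K-means over the difference energies.
    centers = np.linspace(diffs.min(), diffs.max(), k)
    for _ in range(20):
        labels = np.argmin(np.abs(diffs[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = diffs[labels == j].mean()
    picks = set()
    for j in range(k):
        idx = np.where(labels == j)[0]
        if idx.size:
            # +1 maps the difference index to the later frame of the pair.
            picks.add(int(idx[np.argmax(diffs[idx])]) + 1)
    return sorted(picks)
```

In the full method these candidate regions and key frames feed the later skin-probability and torso-detection stages; here they only illustrate the fusion and clustering ideas.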
Related papers
- Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection [31.180352896153682]
Segmentation-based approaches have emerged as prominent contenders owing to their flexible pixel-level predictions.
We propose a multi-information level arbitrary-shaped text detector consisting of a focus entirety module and a perceive environment module.
The latter extracts region-level information and encourages the model to focus on the distribution of positive samples in the vicinity of a pixel.
arXiv Detail & Related papers (2024-09-25T11:24:37Z)
- Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera [31.180352896153682]
We propose an effective spotlight text detector (STD) for scene texts.
It consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM).
Our STD is superior to existing state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2024-09-25T11:19:09Z) - CLIPC8: Face liveness detection algorithm based on image-text pairs and
contrastive learning [3.90443799528247]
We propose a face liveness detection method based on image-text pairs and contrastive learning.
The proposed method is capable of effectively detecting specific liveness attack behaviors in certain scenarios.
It is also effective in detecting traditional liveness attack methods, such as printing photo attacks and screen remake attacks.
arXiv Detail & Related papers (2023-11-29T12:21:42Z) - Enhanced Sharp-GAN For Histopathology Image Synthesis [63.845552349914186]
Histopathology image synthesis aims to address the data shortage issue in training deep learning approaches for accurate cancer detection.
We propose a novel approach that enhances the quality of synthetic images by using nuclei topology and contour regularization.
The proposed approach outperforms Sharp-GAN in all four image quality metrics on two datasets.
arXiv Detail & Related papers (2023-01-24T17:54:01Z) - SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z) - Probabilistic Deep Metric Learning for Hyperspectral Image
Classification [91.5747859691553]
This paper proposes a probabilistic deep metric learning framework for hyperspectral image classification.
It aims to predict the category of each pixel for an image captured by hyperspectral sensors.
Our framework can be readily applied to existing hyperspectral image classification methods.
arXiv Detail & Related papers (2022-11-15T17:57:12Z) - Arbitrary Shape Text Detection using Transformers [2.294014185517203]
We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding box loss function that accurately measures the arbitrary detected text regions' changes in scale and aspect ratio.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
arXiv Detail & Related papers (2022-02-22T22:36:29Z) - Shot boundary detection method based on a new extensive dataset and
mixed features [68.8204255655161]
Shot boundary detection in video is one of the key stages of video data processing.
A new method for shot boundary detection based on several video features, such as color histograms and object boundaries, is proposed.
arXiv Detail & Related papers (2021-09-02T16:19:24Z) - UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional
Variational Autoencoders [81.5490760424213]
We propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Inspired by the saliency data labeling process, we propose probabilistic RGB-D saliency detection network.
arXiv Detail & Related papers (2020-04-13T04:12:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.