It's About Time: Analog Clock Reading in the Wild
- URL: http://arxiv.org/abs/2111.09162v1
- Date: Wed, 17 Nov 2021 14:52:02 GMT
- Title: It's About Time: Analog Clock Reading in the Wild
- Authors: Charig Yang, Weidi Xie, Andrew Zisserman
- Abstract summary: We present a framework for reading analog clocks in natural images or videos.
We create a scalable pipeline for generating synthetic clocks, significantly reducing the need for labour-intensive annotation.
We show that the model trained on the proposed synthetic dataset generalises towards real clocks with good accuracy.
- Score: 93.84801062680786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a framework for reading analog clocks in natural
images or videos. Specifically, we make the following contributions: First, we
create a scalable pipeline for generating synthetic clocks, significantly
reducing the need for labour-intensive annotation; Second, we
introduce a clock recognition architecture based on spatial transformer
networks (STN), which is trained end-to-end for clock alignment and
recognition. We show that the model trained on the proposed synthetic dataset
generalises towards real clocks with good accuracy, advocating a Sim2Real
training regime; Third, to further reduce the gap between simulation and real
data, we leverage the special property of time, i.e. uniformity, to generate
reliable pseudo-labels on real unlabelled clock videos, and show that training
on these videos offers further improvements while still requiring zero manual
annotations. Lastly, we introduce three benchmark datasets based on COCO, Open
Images, and The Clock movie, totalling 4,472 images with clocks, with full
annotations for time, accurate to the minute.
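To make the first and third contributions concrete, here is a minimal sketch, not taken from the paper's code: all names, the real-time-playback assumption, and the one-minute tolerance are illustrative. It shows (a) how a synthetic-clock generator can derive hand angles from a ground-truth time, and (b) how the uniformity of time can filter pseudo-labels on unlabelled video.

```python
import numpy as np

def time_to_hand_angles(hours: int, minutes: int):
    """Clockwise hand angles in degrees from 12 o'clock, for a synthetic renderer.

    The minute hand moves 6 deg per minute; the hour hand moves 30 deg
    per hour plus 0.5 deg for each elapsed minute.
    """
    minute_angle = minutes * 6.0
    hour_angle = (hours % 12) * 30.0 + minutes * 0.5
    return hour_angle, minute_angle

def filter_by_uniformity(pred_minutes, frame_times_s, tol_min=1.0):
    """Keep per-frame predictions consistent with uniformly advancing time.

    pred_minutes:  model outputs, minutes past 12:00, one per video frame
    frame_times_s: the frames' timestamps in seconds
    In a real-time video a clock advances exactly one minute per sixty
    seconds, so valid predictions lie on a line of slope one; frames far
    from a robust fit are rejected as unreliable pseudo-labels.
    """
    pred = np.asarray(pred_minutes, dtype=float)
    elapsed = np.asarray(frame_times_s, dtype=float) / 60.0
    intercept = np.median(pred - elapsed)   # slope fixed at 1, fit intercept only
    residual = np.abs(pred - (elapsed + intercept))
    return residual <= tol_min, intercept
```

Frames that survive such a consistency check can then be treated as labelled training data, which is what lets the video stage run with zero manual annotation.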
Related papers
- Ticking clocks in quantum theory [0.0]
We show that for finite systems a single natural principle serves to distinguish what we understand as ticking clocks from time-keeping systems in general.
We describe the most general form of the dynamics of such a clock, and discuss the additional simplifications needed to go from a general ticking clock to the models encountered in the literature.
arXiv Detail & Related papers (2023-06-02T18:00:01Z)
- TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders [55.00904795497786]
We propose TimeMAE, a novel self-supervised paradigm for learning transferable time series representations based on transformer networks.
TimeMAE learns enriched contextual representations of time series with a bidirectional encoding scheme.
To solve the discrepancy issue incurred by newly injected masked embeddings, we design a decoupled autoencoder architecture.
arXiv Detail & Related papers (2023-03-01T08:33:16Z)
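As a rough illustration of the masked-modelling setup the TimeMAE summary above describes (the window length, mask ratio, and function name are assumptions, not TimeMAE's actual interface):

```python
import torch

def window_and_mask(series: torch.Tensor, window: int = 8, mask_ratio: float = 0.6):
    """Split a (batch, length) series into sub-series windows; mask a random subset.

    The encoder would see only the visible windows; training reconstructs
    or classifies the masked ones.
    """
    b, length = series.shape
    n = length // window
    windows = series[:, : n * window].reshape(b, n, window)
    n_mask = int(n * mask_ratio)
    idx = torch.rand(b, n).argsort(dim=1)        # random permutation per row
    mask = torch.zeros(b, n, dtype=torch.bool)
    mask.scatter_(1, idx[:, :n_mask], True)      # True = masked window
    visible = windows[~mask].reshape(b, n - n_mask, window)
    targets = windows[mask].reshape(b, n_mask, window)
    return visible, targets, mask
```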
- Time Series Forecasting via Semi-Asymmetric Convolutional Architecture with Global Atrous Sliding Window [0.0]
The method proposed in this paper addresses the problem of time series forecasting.
Most modern models focus only on a short range of information, which is fatal for problems such as time series forecasting.
We make three main contributions that are experimentally verified to have performance advantages.
arXiv Detail & Related papers (2023-01-31T15:07:31Z)
- ViTs for SITS: Vision Transformers for Satellite Image Time Series [52.012084080257544]
We introduce a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT).
TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial encoder.
arXiv Detail & Related papers (2023-01-12T11:33:07Z)
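The space-time patching step the TSViT summary above describes might look roughly like this (the patch sizes and tensor layout are illustrative assumptions, not TSViT's code):

```python
import torch

def tokenize_sits(record: torch.Tensor, t_patch: int = 2, s_patch: int = 4):
    """Split a satellite image time series into non-overlapping space-time tokens.

    record: (T, C, H, W), with T, H, W assumed divisible by the patch sizes.
    Each (t_patch x s_patch x s_patch) block becomes one token, the input a
    factorized temporo-spatial encoder would consume.
    """
    T, C, H, W = record.shape
    blocks = record.reshape(T // t_patch, t_patch, C,
                            H // s_patch, s_patch,
                            W // s_patch, s_patch)
    # Bring the three grid axes to the front, then flatten each block.
    blocks = blocks.permute(0, 3, 5, 1, 2, 4, 6)
    return blocks.reshape(-1, t_patch * C * s_patch * s_patch)
```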
- Fast Non-Rigid Radiance Fields from Monocularized Data [66.74229489512683]
This paper proposes a new method for full 360° inward-facing novel view synthesis of non-rigidly deforming scenes.
At the core of our method are 1) an efficient deformation module that decouples the processing of spatial and temporal information for accelerated training and inference; and 2) a static module representing the canonical scene as a fast hash-encoded neural radiance field.
In both cases, our method is significantly faster than previous methods, converging in less than 7 minutes and achieving real-time framerates at 1K resolution, while obtaining a higher visual accuracy for generated novel views.
arXiv Detail & Related papers (2022-12-02T18:51:10Z)
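The decoupling the non-rigid radiance-field summary above sketches, a deformation module in front of a static canonical scene, follows a widely used pattern; below is a toy version with plain MLPs standing in for the paper's fast hash-encoded field (sizes and names are illustrative):

```python
import torch
import torch.nn as nn

class NonRigidField(nn.Module):
    """Deformation module + static canonical radiance field (toy version)."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        # Maps (x, y, z, t) to an offset into the canonical scene.
        self.deform = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3))
        # Static canonical scene: position -> (r, g, b, density).
        self.canonical = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 4))

    def forward(self, xyz: torch.Tensor, t: torch.Tensor):
        # xyz: (N, 3) sample positions; t: (N, 1) time values.
        offset = self.deform(torch.cat([xyz, t], dim=-1))
        rgb_sigma = self.canonical(xyz + offset)   # query in canonical space
        return rgb_sigma[..., :3], rgb_sigma[..., 3]
```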
- SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory [61.44510300515693]
We study the task of simultaneous lip reading and devise SimulLR, a simultaneous lip reading transducer with attention-guided adaptive memory.
Experiments show that SimulLR achieves a translation speedup of 9.10× compared with state-of-the-art non-simultaneous methods.
arXiv Detail & Related papers (2021-08-31T05:54:16Z) - Temporal-Spatial Feature Pyramid for Video Saliency Detection [2.578242050187029]
We propose a 3D fully convolutional encoder-decoder architecture for video saliency detection.
Our model is simple yet effective, and can run in real time.
arXiv Detail & Related papers (2021-05-10T09:14:14Z)
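A toy 3D fully convolutional encoder-decoder of the kind the saliency summary above mentions; the actual model's temporal-spatial feature pyramid and skip connections are omitted, and all layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class Saliency3D(nn.Module):
    """Minimal 3D conv encoder-decoder mapping a clip to per-pixel saliency."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 16, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=(1, 2, 2), padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, (1, 4, 4), stride=(1, 2, 2),
                               padding=(0, 1, 1)), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, (1, 4, 4), stride=(1, 2, 2),
                               padding=(0, 1, 1)))

    def forward(self, video: torch.Tensor):
        # video: (batch, 3, frames, height, width); output saliency in [0, 1].
        return torch.sigmoid(self.decoder(self.encoder(video)))
```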
- Is Space-Time Attention All You Need for Video Understanding? [50.78676438502343]
We present a convolution-free approach to video classification built exclusively on self-attention over space and time.
"TimeSformer" adapts the standard Transformer architecture to video by enabling feature learning from a sequence of frame-level patches.
TimeSformer achieves state-of-the-art results on several major action recognition benchmarks.
arXiv Detail & Related papers (2021-02-09T19:49:33Z)
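The factorization TimeSformer popularized, divided space-time attention, attends over time for each patch location and then over space within each frame. A minimal sketch (LayerNorm and the MLP are omitted; dimensions are illustrative):

```python
import torch
import torch.nn as nn

class DividedSpaceTimeBlock(nn.Module):
    """One divided space-time attention block over (batch, frames, patches, dim)."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor):
        b, t, p, d = x.shape
        # Time attention: one length-t sequence per spatial location.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt = xt + self.time_attn(xt, xt, xt)[0]
        x = xt.reshape(b, p, t, d).permute(0, 2, 1, 3)
        # Space attention: one length-p sequence per frame.
        xs = x.reshape(b * t, p, d)
        xs = xs + self.space_attn(xs, xs, xs)[0]
        return xs.reshape(b, t, p, d)
```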
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.