Learning to Sort Image Sequences via Accumulated Temporal Differences
- URL: http://arxiv.org/abs/2010.11649v1
- Date: Thu, 22 Oct 2020 12:34:05 GMT
- Title: Learning to Sort Image Sequences via Accumulated Temporal Differences
- Authors: Gagan Kanojia and Shanmuganathan Raman
- Abstract summary: We tackle the problem of temporally sequencing the unordered set of images of a dynamic scene captured with a hand-held camera.
We propose a convolutional block which captures spatial information through a 2D convolution kernel and temporal information through the differences among feature maps extracted from the input images.
We show that the proposed approach outperforms the state-of-the-art methods by a significant margin.
- Score: 27.41266294612776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a set of n images of a scene with dynamic objects captured with a
static or a handheld camera. Let the temporal order in which these images are
captured be unknown. There can be n! possibilities for the temporal order in
which these images could have been captured. In this work, we tackle the
problem of temporally sequencing the unordered set of images of a dynamic scene
captured with a hand-held camera. We propose a convolutional block which
captures the spatial information through a 2D convolution kernel and captures the
temporal information by utilizing the differences present among the feature
maps extracted from the input images. We evaluate the performance of the
proposed approach on the dataset extracted from a standard action recognition
dataset, UCF101. We show that the proposed approach outperforms the
state-of-the-art methods by a significant margin. We also show that the network
generalizes well: when trained only on the dataset extracted from UCF101, an
action recognition dataset, it performs well on a dataset extracted from the
DAVIS dataset, which is meant for video object segmentation.
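The core mechanism is compact enough to illustrate. Below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: layer sizes and the way the accumulated differences are folded back into the spatial features are assumptions.
```python
import torch
import torch.nn as nn

class AccumulatedDifferenceBlock(nn.Module):
    """Hypothetical sketch: a shared 2D convolution supplies per-frame spatial
    features, and accumulated differences between consecutive feature maps
    supply the temporal signal."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.temporal = nn.Conv2d(out_channels, out_channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        feats = self.spatial(x.reshape(b * t, c, h, w)).view(b, t, -1, h, w)
        diffs = feats[:, 1:] - feats[:, :-1]                         # frame-to-frame differences
        acc = torch.cumsum(diffs, dim=1)                             # accumulated over time
        acc = torch.cat([torch.zeros_like(acc[:, :1]), acc], dim=1)  # first frame gets zero difference
        out = feats + self.temporal(acc.reshape(b * t, -1, h, w)).view(b, t, -1, h, w)
        return self.relu(out)

# Example: two 5-frame sequences of 64x64 RGB images.
block = AccumulatedDifferenceBlock(3, 16)
print(block(torch.randn(2, 5, 3, 64, 64)).shape)  # torch.Size([2, 5, 16, 64, 64])
```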
Related papers
- Context Enhanced Transformer for Single Image Object Detection [31.52466523847246]
We propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR).
To efficiently store temporal information, we construct a class-wise memory that collects contextual information across data.
We present a classification-based sampling technique to selectively utilize the relevant memory for the current image.
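As a loose illustration of the memory idea (all names and shapes below are assumptions, not CETR's actual interface): contextual features can be banked per class and read back only for the classes a classifier deems relevant to the current image.
```python
from collections import defaultdict, deque
import torch

class ClasswiseMemory:
    """Hypothetical sketch: store contextual feature vectors per class and
    read back only the slots relevant to the current image."""

    def __init__(self, slots_per_class: int = 32):
        self.bank = defaultdict(lambda: deque(maxlen=slots_per_class))

    def update(self, class_id: int, feature: torch.Tensor) -> None:
        # Collect contextual information across data, one queue per class.
        self.bank[class_id].append(feature.detach())

    def read(self, class_scores: torch.Tensor, top_k: int = 3) -> torch.Tensor:
        # Classification-based sampling: consult only the top-scoring classes.
        selected = torch.topk(class_scores, k=min(top_k, class_scores.numel())).indices.tolist()
        feats = [f for c in selected for f in self.bank[c]]
        return torch.stack(feats) if feats else torch.empty(0)

# Toy usage with made-up 256-d features over 10 classes.
mem = ClasswiseMemory()
mem.update(2, torch.randn(256))
mem.update(7, torch.randn(256))
context = mem.read(torch.softmax(torch.randn(10), dim=0))
```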
arXiv Detail & Related papers (2023-12-22T07:40:43Z)
- Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition [16.380948630155476]
Visual Place Recognition (VPR) aims to retrieve frames from a tagged database that are located at the same place as the query frame.
To improve the robustness of VPR in geographically aliased scenarios, sequence-based VPR methods are proposed.
We use a sliding window to control the temporal range of attention and use relative positional encoding to construct sequential relationships between different features.
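A rough sketch of that attention pattern, with invented dimensions and a learned relative-position bias (not the paper's exact architecture):
```python
import torch
import torch.nn as nn

class WindowedSequenceAttention(nn.Module):
    """Hypothetical sketch: self-attention over per-frame descriptors,
    masked to a sliding temporal window and biased by relative position."""

    def __init__(self, dim: int, window: int = 3, max_len: int = 64):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.window = window
        self.max_len = max_len
        # One learnable bias per relative offset in [-(max_len-1), max_len-1].
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (T, dim)
        t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = q @ k.t() / d ** 0.5                                   # (T, T)
        offsets = torch.arange(t)[:, None] - torch.arange(t)[None, :]
        attn = attn + self.rel_bias[offsets + self.max_len - 1]       # relative positional bias
        attn = attn.masked_fill(offsets.abs() > self.window, float('-inf'))  # sliding window
        return torch.softmax(attn, dim=-1) @ v

# Fuse ten 128-d frame descriptors into one sequence descriptor.
seq_desc = WindowedSequenceAttention(dim=128)(torch.randn(10, 128)).mean(dim=0)
```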
arXiv Detail & Related papers (2023-05-19T06:39:10Z)
- Event-Based Frame Interpolation with Ad-hoc Deblurring [68.97825675372354]
We propose a general method for event-based frame interpolation that performs ad-hoc deblurring on input videos.
Our network consistently outperforms state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of interpolation and deblurring.
Our code and dataset will be made publicly available.
arXiv Detail & Related papers (2023-01-12T18:19:00Z) - Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations.
In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z) - Complex Scene Image Editing by Scene Graph Comprehension [17.72638225034884]
We propose a two-stage method for complex scene image editing by Scene Graph Comprehension (SGC-Net).
In the first stage, we train a Region of Interest (RoI) prediction network that uses scene graphs to predict the locations of the target objects.
The second stage uses a conditional diffusion model to edit the image based on our RoI predictions.
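The two-stage flow can be sketched as follows; both modules are placeholders standing in for the paper's RoI prediction network and conditional diffusion editor, and all names and shapes are assumptions.
```python
import torch
import torch.nn as nn

class RoIPredictor(nn.Module):
    """Hypothetical stage 1: map a pooled scene-graph embedding to a box (x1, y1, x2, y2) in [0, 1]."""
    def __init__(self, graph_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(graph_dim, 64), nn.ReLU(), nn.Linear(64, 4), nn.Sigmoid())

    def forward(self, graph_embedding: torch.Tensor) -> torch.Tensor:
        return self.mlp(graph_embedding)

def diffusion_edit(image: torch.Tensor, roi: torch.Tensor, edit_description: str) -> torch.Tensor:
    """Placeholder for stage 2: a conditional diffusion model would resynthesize
    only the RoI region according to the edited scene graph. This stub just
    marks the region; `edit_description` is unused here."""
    edited = image.clone()
    h, w = image.shape[-2:]
    x1, y1, x2, y2 = (roi * torch.tensor([w, h, w, h])).long().tolist()
    edited[..., y1:y2, x1:x2] = 0.5
    return edited

image = torch.rand(3, 256, 256)
roi = RoIPredictor()(torch.randn(128))                     # stage 1: where to edit
result = diffusion_edit(image, roi, "move the dog onto the sofa")  # stage 2: how to edit
```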
arXiv Detail & Related papers (2022-03-24T05:12:54Z) - Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
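The pair-generation recipe is straightforward to sketch; a minimal version is below (segment selection and blending in the actual pipeline are more involved).
```python
import numpy as np

def make_swap_pair(src: np.ndarray, dst: np.ndarray, mask: np.ndarray):
    """Minimal sketch of segment-swapping pair generation: paste the masked
    object from `src` into `dst`, so the pasted region is the known
    visually similar pattern shared by (src, composite)."""
    assert src.shape == dst.shape and mask.shape == src.shape[:2]
    composite = dst.copy()
    composite[mask] = src[mask]          # copy-paste the object segment
    return src, composite, mask          # mask doubles as co-segmentation ground truth

# Toy usage: a square "object" swapped between two random images.
src = np.random.rand(128, 128, 3)
dst = np.random.rand(128, 128, 3)
mask = np.zeros((128, 128), dtype=bool)
mask[32:96, 32:96] = True
img_a, img_b, gt = make_swap_pair(src, dst, mask)
```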
arXiv Detail & Related papers (2021-10-29T16:51:16Z) - Composable Augmentation Encoding for Video Representation Learning [94.2358972764708]
We focus on contrastive methods for self-supervised video representation learning.
A common paradigm in contrastive learning is to construct positive pairs by sampling different data views for the same instance, with different data instances as negatives.
We propose an 'augmentation aware' contrastive learning framework, where we explicitly provide a sequence of augmentation parameterisations.
We show that our method encodes valuable information about specified spatial or temporal augmentation, and in doing so also achieve state-of-the-art performance on a number of video benchmarks.
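A hypothetical sketch of what 'augmentation aware' conditioning could look like, with invented dimensions for the augmentation parameter vector; the paper's actual encoding and head differ.
```python
import torch
import torch.nn as nn

class AugAwareProjection(nn.Module):
    """Hypothetical sketch: the projection of one view is conditioned on the
    parameters of the augmentations relating it to the other view, instead of
    being asked to be invariant to them."""

    def __init__(self, feat_dim: int = 512, aug_dim: int = 8, proj_dim: int = 128):
        super().__init__()
        self.aug_embed = nn.Linear(aug_dim, 64)  # e.g. crop box, time shift, flip flag
        self.proj = nn.Sequential(nn.Linear(feat_dim + 64, 256), nn.ReLU(), nn.Linear(256, proj_dim))

    def forward(self, features: torch.Tensor, aug_params: torch.Tensor) -> torch.Tensor:
        z = torch.cat([features, self.aug_embed(aug_params)], dim=-1)
        return nn.functional.normalize(self.proj(z), dim=-1)

# Positive pair: two views of the same clip, each head told how the views were produced.
head = AugAwareProjection()
z_a = head(torch.randn(4, 512), torch.randn(4, 8))
z_b = head(torch.randn(4, 512), torch.randn(4, 8))
logits = z_a @ z_b.t() / 0.07   # standard InfoNCE-style similarity matrix
```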
arXiv Detail & Related papers (2021-04-01T16:48:53Z) - Plotting time: On the usage of CNNs for time series classification [1.0390583509657398]
We present a novel approach for time series classification where we represent time series data as plot images and feed them to a simple CNN.
Our approach is very promising, achieving the best results on both real-world datasets and matching or beating the best state-of-the-art methods on six UCR datasets.
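The rendering step is easy to reproduce; a minimal sketch using matplotlib's off-screen backend is below (figure size, line style, and grayscale conversion are assumptions, not the paper's settings).
```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                 # render off-screen
import matplotlib.pyplot as plt

def series_to_plot_image(series: np.ndarray, size: int = 64) -> np.ndarray:
    """Render a 1-D time series as a grayscale line-plot image for a simple CNN."""
    fig, ax = plt.subplots(figsize=(1, 1), dpi=size)
    ax.plot(series, color="black", linewidth=1)
    ax.axis("off")
    fig.canvas.draw()
    img = np.asarray(fig.canvas.buffer_rgba())[..., :3].mean(axis=-1) / 255.0
    plt.close(fig)
    return img                        # (size, size) array

img = series_to_plot_image(np.sin(np.linspace(0, 6 * np.pi, 200)))
print(img.shape)                      # (64, 64)
```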
arXiv Detail & Related papers (2021-02-08T13:23:01Z)
- Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We exploit a self-supervised loss function to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
- Asynchronous Tracking-by-Detection on Adaptive Time Surfaces for Event-based Object Tracking [87.0297771292994]
We propose an Event-based Tracking-by-Detection (ETD) method for generic bounding box-based object tracking.
To achieve this goal, we present an Adaptive Time-Surface with Linear Time Decay (ATSLTD) event-to-frame conversion algorithm.
We compare the proposed ETD method with seven popular object tracking methods that are based on conventional cameras or event cameras, and with two variants of ETD.
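A simplified sketch of a time surface with linear time decay follows; the adaptive window selection that ATSLTD adds is omitted, and the interface is an assumption.
```python
import numpy as np

def linear_decay_time_surface(events, t_start, t_end, height, width):
    """Simplified sketch of an event-to-frame conversion with linear time decay:
    each pixel stores a linearly decayed weight of its latest event, so more
    recent events appear brighter. `events` is an iterable of
    (timestamp, x, y, polarity) tuples inside [t_start, t_end]."""
    surface = np.zeros((height, width), dtype=np.float32)
    span = max(t_end - t_start, 1e-9)
    for t, x, y, _polarity in events:
        # Linear time decay: weight grows from 0 at t_start to 1 at t_end.
        surface[y, x] = (t - t_start) / span
    return surface

# Toy usage: three events on a 4x4 sensor over a 10 ms window.
frame = linear_decay_time_surface(
    [(0.002, 1, 1, 1), (0.006, 2, 3, -1), (0.009, 1, 1, 1)],
    t_start=0.0, t_end=0.01, height=4, width=4)
```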
arXiv Detail & Related papers (2020-02-13T15:58:31Z)
- Virtual KITTI 2 [13.390646987475163]
This paper introduces an updated version of the well-known Virtual KITTI dataset.
The dataset consists of 5 sequence clones from the KITTI tracking benchmark.
For each sequence, we provide multiple sets of images containing RGB, depth, class segmentation, instance segmentation, flow, and scene flow data.
arXiv Detail & Related papers (2020-01-29T12:13:20Z)