Rotation Equivariant Siamese Networks for Tracking
- URL: http://arxiv.org/abs/2012.13078v1
- Date: Thu, 24 Dec 2020 03:06:47 GMT
- Title: Rotation Equivariant Siamese Networks for Tracking
- Authors: Deepak K. Gupta, Devanshu Arya and Efstratios Gavves
- Abstract summary: We present rotation-equivariant Siamese networks (RE-SiamNets) for object tracking.
RE-SiamNets allow estimating the change in orientation of the object in an unsupervised manner.
We show that RE-SiamNets handle the problem of rotation very well and outperform their regular counterparts.
- Score: 26.8787636300794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rotation is among the long prevailing, yet still unresolved, hard challenges
encountered in visual object tracking. The existing deep learning-based
tracking algorithms use regular CNNs that are inherently translation
equivariant, but not designed to tackle rotations. In this paper, we first
demonstrate that in the presence of rotation instances in videos, the
performance of existing trackers is severely affected. To circumvent the
adverse effect of rotations, we present rotation-equivariant Siamese networks
(RE-SiamNets), built through the use of group-equivariant convolutional layers
comprising steerable filters. RE-SiamNets allow estimating the change in
orientation of the object in an unsupervised manner, thereby facilitating their
use in relative 2D pose estimation as well. We further show that this estimated
change in orientation can be used to impose an additional motion constraint in
Siamese tracking by restricting the orientation change between two consecutive
frames. For benchmarking, we present the Rotation Tracking Benchmark
(RTB), a dataset comprising a set of videos with rotation instances. Through
experiments on two popular Siamese architectures, we show that RE-SiamNets
handle the problem of rotation very well and outperform their regular
counterparts. Further, RE-SiamNets can accurately estimate the relative change
in pose of the target in an unsupervised fashion, namely the in-plane rotation
the target has sustained with respect to the reference frame.
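The group-equivariant construction and the unsupervised orientation read-out lend themselves to a compact sketch. The snippet below is a minimal illustration of the idea, not the authors' implementation: it lifts features to the four-fold rotation group C4 using rotated copies of a single filter (a coarse stand-in for the steerable filters used in the paper), cross-correlates a template against a search region under every relative rotation, and takes the best-responding rotation as the unsupervised orientation estimate. The names `C4LiftingConv` and `re_siamese_response` are hypothetical.

```python
# Minimal sketch of rotation-equivariant Siamese correlation (assumptions noted above).
import torch
import torch.nn.functional as F


class C4LiftingConv(torch.nn.Module):
    """Convolution whose output carries an explicit orientation axis (cyclic group C4)."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Correlate with the same filter rotated by 0/90/180/270 degrees.
        # Output shape: (B, out_ch, 4, H, W); rotating the input permutes the
        # orientation axis and rotates the spatial maps (rotation equivariance).
        pad = self.weight.shape[-1] // 2
        outs = [F.conv2d(x, torch.rot90(self.weight, r, dims=(-2, -1)), padding=pad)
                for r in range(4)]
        return torch.stack(outs, dim=2)


def re_siamese_response(template_feat: torch.Tensor, search_feat: torch.Tensor):
    """Cross-correlate template vs. search features for every relative C4 rotation.

    template_feat: (C, 4, h, h) lifted features of the (square) template crop.
    search_feat:   (C, 4, H, W) lifted features of the search region.
    Returns the response maps per relative rotation and the rotation estimate.
    """
    responses = []
    for dr in range(4):  # candidate in-plane rotation, in multiples of 90 degrees
        # Group action on the template: cyclically shift the orientation axis
        # and rotate the spatial support by the same amount.
        t = torch.rot90(torch.roll(template_feat, shifts=dr, dims=1), dr, dims=(-2, -1))
        kernel = t.reshape(1, -1, *t.shape[-2:])                  # (1, C*4, h, h)
        x = search_feat.reshape(1, -1, *search_feat.shape[-2:])   # (1, C*4, H, W)
        responses.append(F.conv2d(x, kernel))                     # (1, 1, H-h+1, W-w+1)
    resp = torch.cat(responses, dim=1)                            # (1, 4, ...)
    est_rot = resp.amax(dim=(-2, -1)).argmax(dim=1)               # unsupervised orientation estimate
    return resp, est_rot


if __name__ == "__main__":
    lift = C4LiftingConv(in_ch=1, out_ch=8)
    z = lift(torch.randn(1, 1, 31, 31))[0]   # template features
    x = lift(torch.randn(1, 1, 63, 63))[0]   # search-region features
    resp, rot = re_siamese_response(z, x)
    print(resp.shape, rot)                   # torch.Size([1, 4, 33, 33]) and the 90-degree step index
```

Under this sketch, the motion constraint described in the abstract would amount to comparing the estimated rotation index across consecutive frames and penalizing, or simply bounding, large jumps when scoring candidate responses.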
Related papers
- SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance [26.05910177212846]
Group Equivariant Convolution (GConv) empowers models to explore symmetries hidden in visual data, improving their performance.
Traditional GConv methods are limited by the strict operation rules in the group space, making it difficult to adapt to Symmetry-Breaking or non-rigid transformations.
We propose a novel Relaxed Rotation-Equivariant Network (R2Net) as the backbone and further develop the Symmetry-Breaking Object Detector (SBDet) for 2D object detection built upon it.
arXiv Detail & Related papers (2024-08-21T16:32:03Z) - PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration [8.668461141536383]
Learning rotation-invariant distinctive features is a fundamental requirement for point cloud registration.
Existing methods often use rotation-sensitive networks to extract features, relying on rotation augmentation to crudely learn an approximately invariant mapping.
We propose a novel position-aware rotation-equivariant network for efficient, lightweight, and robust registration.
arXiv Detail & Related papers (2024-07-14T10:26:38Z) - VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations [55.25238503204253]
We propose a novel rotation estimation network, termed VI-Net, to make the task easier.
To process the spherical signals, a Spherical Feature Pyramid Network is constructed based on a novel design of SPAtial Spherical Convolution.
Experiments on the benchmarking datasets confirm the efficacy of our method, which outperforms existing ones by a large margin in the high-precision regime.
arXiv Detail & Related papers (2023-08-19T05:47:53Z) - Rotation-Invariant Transformer for Point Cloud Matching [42.5714375149213]
We introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task.
We propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism.
RoITr surpasses the existing methods by at least 13 and 5 percentage points in terms of Inlier Ratio and Registration Recall.
arXiv Detail & Related papers (2023-03-14T20:55:27Z) - CRIN: Rotation-Invariant Point Cloud Analysis and Rotation Estimation via Centrifugal Reference Frame [60.24797081117877]
We propose CRIN, the Centrifugal Rotation-Invariant Network.
CRIN directly takes the coordinates of points as input and transforms local points into rotation-invariant representations.
A continuous distribution for 3D rotations based on points is introduced.
arXiv Detail & Related papers (2023-03-06T13:14:10Z) - RAUM-VO: Rotational Adjusted Unsupervised Monocular Visual Odometry [0.0]
We present RAUM-VO, an approach based on a model-free epipolar constraint for frame-to-frame motion estimation.
RAUM-VO shows a considerable accuracy improvement compared to other unsupervised pose networks on the KITTI dataset.
arXiv Detail & Related papers (2022-03-14T15:03:24Z) - On the Robustness of Multi-View Rotation Averaging [77.09542018140823]
We introduce the $\epsilon$-cycle consistency term into the solver.
We implicitly constrain the negative effect of erroneous measurements by reducing their weights.
Experimental results demonstrate that our proposed approach outperforms the state of the art on various benchmarks.
arXiv Detail & Related papers (2021-02-09T05:47:37Z) - RSINet: Rotation-Scale Invariant Network for Online Visual Tracking [7.186849714896344]
Most network-based trackers perform tracking without model updates and cannot adaptively learn target-specific variations.
In this paper, we propose a novel Rotation-Scale Invariant Network (RSINet) to address the above problem.
Our RSINet tracker consists of a target-distractor discrimination branch and a rotation-scale estimation branch; rotation and scale knowledge can be explicitly learned by multi-task learning in an end-to-end manner.
In addition, the tracking model is adaptively optimized and updated under temporal energy control, which ensures model stability and reliability, as well as high tracking
arXiv Detail & Related papers (2020-11-18T08:19:14Z) - Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates [60.02121449986413]
We propose an encoder-decoder architecture using LSTMs which generates smoother predicted articulatory trajectories.
We analyze the amplitude of the transformed articulatory movements at different rates compared to their original counterparts.
We observe that AstNet could model both duration and extent of articulatory movements better than the existing transformation techniques.
arXiv Detail & Related papers (2020-06-04T19:33:26Z) - A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
arXiv Detail & Related papers (2020-03-16T14:04:45Z) - Quaternion Equivariant Capsule Networks for 3D Point Clouds [58.566467950463306]
We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations.
We connect dynamic routing between capsules to the well-known Weiszfeld algorithm.
Based on our operator, we build a capsule network that disentangles geometry from pose.
arXiv Detail & Related papers (2019-12-27T13:51:17Z)