CapsFlow: Optical Flow Estimation with Capsule Networks
- URL: http://arxiv.org/abs/2304.00306v2
- Date: Sat, 2 Dec 2023 01:43:05 GMT
- Title: CapsFlow: Optical Flow Estimation with Capsule Networks
- Authors: Rahul Chand, Rajat Arora, K Ram Prabhakar, R Venkatesh Babu
- Abstract summary: Capsules are specialized to model separate entities and their pose as a continuous matrix.
We show that a simple linear operation over the poses of the objects detected by the capsules is enough to model flow.
We show results on a small toy dataset where we outperform FlowNetC and PWC-Net models.
- Score: 25.17460345300064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a framework that uses the recently introduced Capsule
Networks to solve the problem of optical flow, one of the fundamental computer
vision tasks. Most existing state-of-the-art deep architectures either use a
correlation operation to match features across the two images or learn
spatio-temporal features. While the correlation layer is sensitive to the
choice of hyperparameters and does not impose a prior on the underlying
structure of the object, spatio-temporal features are limited by the network's
receptive field. Also, we as humans look at moving objects as wholes, something
which cannot be encoded by correlation or spatio-temporal features. Capsules,
on the other hand, are specialized to model separate entities and their pose as
a continuous matrix. Thus, we show that a simple linear operation over the
poses of the objects detected by the capsules is enough to model flow. We show
results on a small toy dataset where we outperform FlowNetC and PWC-Net models.
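The abstract's central claim can be illustrated with a minimal sketch. Assume each capsule outputs the pose of a detected entity as a 2D affine matrix (3x3 in homogeneous coordinates) per frame; the function name, shapes, and the choice of affine poses are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def flow_from_poses(pose_t0, pose_t1, points):
    """Sketch of flow as a linear operation over capsule poses.

    pose_t0, pose_t1: 3x3 homogeneous 2D affine pose matrices of one
    entity in two consecutive frames (hypothetical capsule outputs).
    points: (N, 2) array of entity points in canonical coordinates.
    Flow is simply the displacement of each point mapped through the
    two poses -- no correlation layer needed.
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous
    p0 = pts_h @ pose_t0.T  # entity points in frame 0
    p1 = pts_h @ pose_t1.T  # same points in frame 1
    return (p1 - p0)[:, :2]  # per-point flow vectors

# Example: an entity translated by (3, -1) between frames.
pose_t0 = np.eye(3)
pose_t1 = np.array([[1.0, 0.0, 3.0],
                    [0.0, 1.0, -1.0],
                    [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [5.0, 2.0]])
flow = flow_from_poses(pose_t0, pose_t1, pts)  # each row -> [3., -1.]
```

Because the pose already encodes the whole entity, a single pose difference yields dense flow for all of its pixels, which is what makes the per-object capsule representation attractive here.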
Related papers
- Non-Separable Multi-Dimensional Network Flows for Visual Computing [62.50191141358778]
We propose a novel formalism for non-separable multi-dimensional network flows.
Since the flow is defined on a per-dimension basis, the maximizing flow automatically chooses the best matching feature dimensions.
As a proof of concept, we apply our formalism to the multi-object tracking problem and demonstrate that our approach outperforms scalar formulations on the MOT16 benchmark in terms of robustness to noise.
arXiv Detail & Related papers (2023-05-15T13:21:44Z) - Capsules as viewpoint learners for human pose estimation [4.246061945756033]
We show how most neural networks are not able to generalize well when the camera is subject to significant viewpoint changes.
We propose a novel end-to-end viewpoint-equivariant capsule autoencoder that employs a fast Variational Bayes routing and matrix capsules.
We achieve state-of-the-art results for multiple tasks and datasets while retaining other desirable properties.
arXiv Detail & Related papers (2023-02-13T09:01:46Z) - Affordance detection with Dynamic-Tree Capsule Networks [5.847547503155588]
Affordance detection from visual input is a fundamental step in autonomous robotic manipulation.
We introduce the first affordance detection network based on dynamic tree-structured capsules for sparse 3D point clouds.
Our algorithm is superior to current affordance detection methods when faced with grasping previously unseen objects.
arXiv Detail & Related papers (2022-11-09T21:14:08Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - Temporal and Object Quantification Networks [95.64650820186706]
We present a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.
We demonstrate that TOQ-Nets can generalize from small amounts of data to scenarios containing more objects than were present during training and to temporal warpings of input sequences.
arXiv Detail & Related papers (2021-06-10T16:18:21Z) - Deformable Capsules for Object Detection [5.819237403145079]
We introduce deformable capsules (DeformCaps), a new capsule structure (SplitCaps), and a novel dynamic routing algorithm (SE-Routing) to balance computational efficiency with the need for modeling a large number of objects and classes.
Our proposed architecture is a one-stage detection framework and obtains results on MS COCO which are on-par with state-of-the-art one-stage CNN-based methods.
arXiv Detail & Related papers (2021-04-11T15:36:30Z) - Exploiting latent representation of sparse semantic layers for improved
short-term motion prediction with Capsule Networks [0.12183405753834559]
This paper explores use of Capsule Networks (CapsNets) in the context of learning a hierarchical representation of sparse semantic layers corresponding to small regions of the High-Definition (HD) map.
By using an architecture based on CapsNets the model is able to retain hierarchical relationships between detected features within images whilst also preventing loss of spatial data often caused by the pooling operation.
We show that our model achieves significant improvement over recently published works on prediction, whilst drastically reducing the overall size of the network.
arXiv Detail & Related papers (2021-03-02T11:13:43Z) - A Point-Cloud Deep Learning Framework for Prediction of Fluid Flow
Fields on Irregular Geometries [62.28265459308354]
Network learns end-to-end mapping between spatial positions and CFD quantities.
Incompressible laminar steady flow past a cylinder with various cross-section shapes is considered.
The network predicts the flow fields hundreds of times faster than a conventional CFD solver.
arXiv Detail & Related papers (2020-10-15T12:15:02Z) - Feature Flow: In-network Feature Flow Estimation for Video Object
Detection [56.80974623192569]
Optical flow is widely used in computer vision tasks to provide pixel-level motion information.
A common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset.
We propose a novel network (IFF-Net) with an In-network Feature Flow estimation module for video object detection.
arXiv Detail & Related papers (2020-09-21T07:55:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.