CapsFlow: Optical Flow Estimation with Capsule Networks
- URL: http://arxiv.org/abs/2304.00306v2
- Date: Sat, 2 Dec 2023 01:43:05 GMT
- Title: CapsFlow: Optical Flow Estimation with Capsule Networks
- Authors: Rahul Chand, Rajat Arora, K Ram Prabhakar, R Venkatesh Babu
- Abstract summary: Capsules are specialized to model separate entities and their pose as a continuous matrix.
We show that a simple linear operation over the poses of the objects detected by the capsules is enough to model flow.
We show results on a small toy dataset where we outperform FlowNetC and PWC-Net models.
- Score: 25.17460345300064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a framework that uses the recently introduced Capsule
Networks to solve the problem of optical flow, one of the fundamental computer
vision tasks. Most existing state-of-the-art deep architectures either use a
correlation operation to match features across the two images or learn
spatio-temporal features. While the correlation layer is sensitive to the
choice of hyperparameters and does not impose a prior on the underlying
structure of the object, spatio-temporal features are limited by the network's
receptive field. Also, we as humans look at moving objects as wholes, something
which cannot be encoded by correlation or spatio-temporal features. Capsules,
on the other hand, are specialized to model separate entities and their pose as
a continuous matrix. Thus, we show that a simple linear operation over the
poses of the objects detected by the capsules is enough to model flow. We show
results on a small toy dataset where we outperform FlowNetC and PWC-Net models.
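The abstract's central claim can be illustrated with a minimal sketch. Assume each capsule outputs the pose of a detected entity as a 2D affine matrix (3x3 in homogeneous coordinates) per frame; the function name, shapes, and the choice of affine poses are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def flow_from_poses(pose_t0, pose_t1, points):
    """Sketch of flow as a linear operation over capsule poses.

    pose_t0, pose_t1: 3x3 homogeneous 2D affine pose matrices of one
    entity in two consecutive frames (hypothetical capsule outputs).
    points: (N, 2) array of entity points in canonical coordinates.
    Flow is simply the displacement of each point mapped through the
    two poses -- no correlation layer needed.
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous
    p0 = pts_h @ pose_t0.T  # entity points in frame 0
    p1 = pts_h @ pose_t1.T  # same points in frame 1
    return (p1 - p0)[:, :2]  # per-point flow vectors

# Example: an entity translated by (3, -1) between frames.
pose_t0 = np.eye(3)
pose_t1 = np.array([[1.0, 0.0, 3.0],
                    [0.0, 1.0, -1.0],
                    [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [5.0, 2.0]])
flow = flow_from_poses(pose_t0, pose_t1, pts)  # each row -> [3., -1.]
```

Because the pose already encodes the whole entity, a single pose difference yields dense flow for all of its pixels, which is what makes the per-object capsule representation attractive here.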
Related papers
- Non-Separable Multi-Dimensional Network Flows for Visual Computing [62.50191141358778]
We propose a novel formalism for non-separable multi-dimensional network flows.
Since the flow is defined on a per-dimension basis, the maximizing flow automatically chooses the best matching feature dimensions.
As a proof of concept, we apply our formalism to the multi-object tracking problem and demonstrate that our approach outperforms scalar formulations on the MOT16 benchmark in terms of robustness to noise.
arXiv Detail & Related papers (2023-05-15T13:21:44Z) - Capsules as viewpoint learners for human pose estimation [4.246061945756033]
We show how most neural networks are not able to generalize well when the camera is subject to significant viewpoint changes.
We propose a novel end-to-end viewpoint-equivariant capsule autoencoder that employs a fast Variational Bayes routing and matrix capsules.
We achieve state-of-the-art results for multiple tasks and datasets while retaining other desirable properties.
arXiv Detail & Related papers (2023-02-13T09:01:46Z) - Affordance detection with Dynamic-Tree Capsule Networks [5.847547503155588]
Affordance detection from visual input is a fundamental step in autonomous robotic manipulation.
We introduce the first affordance detection network based on dynamic tree-structured capsules for sparse 3D point clouds.
Our algorithm is superior to current affordance detection methods when faced with grasping previously unseen objects.
arXiv Detail & Related papers (2022-11-09T21:14:08Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - Temporal and Object Quantification Networks [95.64650820186706]
We present a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.
We demonstrate that TOQ-Nets can generalize from small amounts of data to scenarios containing more objects than were present during training and to temporal warpings of input sequences.
arXiv Detail & Related papers (2021-06-10T16:18:21Z) - Deformable Capsules for Object Detection [5.819237403145079]
We introduce deformable capsules (DeformCaps), a new capsule structure (SplitCaps), and a novel dynamic routing algorithm (SE-Routing) to balance computational efficiency with the need for modeling a large number of objects and classes.
Our proposed architecture is a one-stage detection framework and obtains results on MS COCO which are on-par with state-of-the-art one-stage CNN-based methods.
arXiv Detail & Related papers (2021-04-11T15:36:30Z) - Exploiting latent representation of sparse semantic layers for improved
short-term motion prediction with Capsule Networks [0.12183405753834559]
This paper explores use of Capsule Networks (CapsNets) in the context of learning a hierarchical representation of sparse semantic layers corresponding to small regions of the High-Definition (HD) map.
By using an architecture based on CapsNets the model is able to retain hierarchical relationships between detected features within images whilst also preventing loss of spatial data often caused by the pooling operation.
We show that our model achieves significant improvement over recently published works on prediction, whilst drastically reducing the overall size of the network.
arXiv Detail & Related papers (2021-03-02T11:13:43Z) - A Point-Cloud Deep Learning Framework for Prediction of Fluid Flow
Fields on Irregular Geometries [62.28265459308354]
Network learns end-to-end mapping between spatial positions and CFD quantities.
Incompressible laminar steady flow past a cylinder with various cross-section shapes is considered.
The network predicts the flow fields hundreds of times faster than a conventional CFD solver.
arXiv Detail & Related papers (2020-10-15T12:15:02Z) - Feature Flow: In-network Feature Flow Estimation for Video Object
Detection [56.80974623192569]
Optical flow is widely used in computer vision tasks to provide pixel-level motion information.
A common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset.
We propose a novel network (IFF-Net) with an In-network Feature Flow estimation module for video object detection.
arXiv Detail & Related papers (2020-09-21T07:55:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.