MaskFlow: Object-Aware Motion Estimation
- URL: http://arxiv.org/abs/2311.12476v1
- Date: Tue, 21 Nov 2023 09:37:49 GMT
- Title: MaskFlow: Object-Aware Motion Estimation
- Authors: Aria Ahmadi, David R. Walton, Tim Atherton, Cagatay Dikici
- Abstract summary: We introduce a novel motion estimation method, MaskFlow, that is capable of estimating accurate motion fields.
In addition to the lower-level features used in other Deep Neural Network (DNN)-based motion estimation methods, MaskFlow also draws on object-level features and segmentations.
- Score: 0.45646200630189254
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce a novel motion estimation method, MaskFlow, that is
capable of estimating accurate motion fields even in very challenging cases
with small objects, large displacements, and drastic appearance changes. In
addition to the lower-level features used in other Deep Neural Network
(DNN)-based motion estimation methods, MaskFlow draws on object-level features
and segmentations. These features and segmentations are used to approximate
the objects' translational motion fields. We propose a novel and effective way
of incorporating this incomplete translational motion field into a subsequent
motion estimation network for refinement and completion. We also produce a
new, challenging synthetic dataset with motion field ground truth, along with
extra ground truth for the object-instance matchings and corresponding
segmentation masks. We demonstrate that MaskFlow outperforms state-of-the-art
methods when evaluated on our new challenging dataset, whilst still producing
comparable results on the popular FlyingThings3D benchmark dataset.
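The abstract describes approximating each object's translational motion field from its segmentation masks and instance matchings, then passing the incomplete field to a refinement network. A minimal sketch of that first step, under my own assumptions (centroid displacement as the translation estimate, NaN as the "unknown" fill value; the paper's exact construction may differ):

```python
import numpy as np

def translation_flow_from_masks(mask1, mask2):
    """Approximate a translation-only flow field for one matched object.

    mask1, mask2: boolean (H, W) arrays segmenting the same object
    instance in two consecutive frames. The object's motion is
    approximated by the displacement between mask centroids, and that
    single translation vector is written into every pixel the object
    covers in frame 1. Pixels outside the object stay NaN, leaving an
    *incomplete* field for a refinement network to complete.
    """
    ys1, xs1 = np.nonzero(mask1)
    ys2, xs2 = np.nonzero(mask2)
    dx = xs2.mean() - xs1.mean()
    dy = ys2.mean() - ys1.mean()
    flow = np.full(mask1.shape + (2,), np.nan, dtype=np.float32)
    flow[mask1] = (dx, dy)
    return flow

# A 2x2 object translated by (+3, +1) between frames:
m1 = np.zeros((8, 8), dtype=bool); m1[2:4, 1:3] = True
m2 = np.zeros((8, 8), dtype=bool); m2[3:5, 4:6] = True
flow = translation_flow_from_masks(m1, m2)
```

In the full method, one such partial field per matched instance would be merged and handed to the motion estimation network for refinement and completion.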
Related papers
- UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model [12.706915226843401]
UnSAMFlow is an unsupervised flow network that also leverages object information from the recent foundation model Segment Anything Model (SAM).
We analyze the poor gradient landscapes of traditional smoothness losses and propose a new smoothness definition based on homography instead.
Our method produces clear optical flow estimation with sharp boundaries around objects, which outperforms state-of-the-art methods on KITTI and Sintel datasets.
arXiv Detail & Related papers (2024-05-04T08:27:12Z)
- Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network [4.386534439007928]
We propose a novel event-based motion segmentation algorithm using a Graph Transformer Neural Network, dubbed GTNN.
Our proposed algorithm processes event streams as 3D graphs through a series of nonlinear transformations to unveil local and global correlations between events.
We show that GTNN outperforms state-of-the-art methods in the presence of dynamic background variations, motion patterns, and multiple dynamic objects with varying sizes and velocities.
arXiv Detail & Related papers (2024-04-16T22:44:29Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation [17.501820140334328]
We introduce AnyFlow, a robust network that estimates accurate flow from images of various resolutions.
We establish a new state-of-the-art performance of cross-dataset generalization on the KITTI dataset.
arXiv Detail & Related papers (2023-03-29T07:03:51Z)
- Self-Improving SLAM in Dynamic Environments: Learning When to Mask [5.4310785842119795]
We propose a novel SLAM method that learns when masking objects improves its performance in dynamic scenarios.
We do not assume any priors on motion: our method learns to mask moving objects by itself.
Our method reaches the state of the art on the TUM RGB-D dataset and outperforms it on KITTI and ConsInv datasets.
arXiv Detail & Related papers (2022-10-15T18:06:06Z)
- Neural Motion Fields: Encoding Grasp Trajectories as Implicit Value Functions [65.84090965167535]
We present Neural Motion Fields, a novel object representation which encodes both object point clouds and the relative task trajectories as an implicit value function parameterized by a neural network.
This object-centric representation models a continuous distribution over the SE(3) space and allows us to perform grasping reactively by leveraging sampling-based MPC to optimize this value function.
arXiv Detail & Related papers (2022-06-29T18:47:05Z)
- ImpDet: Exploring Implicit Fields for 3D Object Detection [74.63774221984725]
We introduce a new perspective that views bounding box regression as an implicit function.
This leads to our proposed framework, termed Implicit Detection or ImpDet.
Our ImpDet assigns specific values to points in different local 3D spaces, so that high-quality boundaries can be generated.
arXiv Detail & Related papers (2022-03-31T17:52:12Z)
- DetFlowTrack: 3D Multi-object Tracking based on Simultaneous Optimization of Object Detection and Scene Flow Estimation [23.305159598648924]
We propose a 3D MOT framework based on simultaneous optimization of object detection and scene flow estimation.
For more accurate scene flow labels, especially in the case of motion with rotation, a box-transformation-based scene flow ground truth calculation method is proposed.
Experimental results on the KITTI MOT dataset show competitive performance against the state of the art and robustness under extreme motion with rotation.
arXiv Detail & Related papers (2022-03-04T07:06:47Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
- DOT: Dynamic Object Tracking for Visual SLAM [83.69544718120167]
DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects.
To determine which objects are actually moving, DOT first segments instances of potentially dynamic objects and then, using the estimated camera motion, tracks them by minimizing the photometric reprojection error.
Our results show that our approach significantly improves the accuracy and robustness of ORB-SLAM 2, especially in highly dynamic scenes.
arXiv Detail & Related papers (2020-09-30T18:36:28Z)
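The DOT entry above decides whether an object is actually moving by checking how well the estimated camera motion alone explains its pixels, via the photometric reprojection error. A minimal sketch of that test under a pinhole model (the function name, nearest-pixel lookup, and mean aggregation are my assumptions, not DOT's actual implementation):

```python
import numpy as np

def mean_photometric_error(I1, I2, depth1, K, T, mask):
    # Back-project each masked pixel of frame 1 with its depth, move it
    # by the estimated camera motion T (4x4 rigid transform), project it
    # into frame 2 with intrinsics K, and compare intensities. A static
    # object is explained by camera motion alone (small residual); a
    # large residual flags the object as dynamic.
    Kinv = np.linalg.inv(K)
    errs = []
    for v, u in zip(*np.nonzero(mask)):
        p = Kinv @ np.array([u, v, 1.0]) * depth1[v, u]  # 3D point in frame-1 camera
        q = K @ (T @ np.append(p, 1.0))[:3]              # project into frame 2
        u2, v2 = int(round(q[0] / q[2])), int(round(q[1] / q[2]))
        if 0 <= v2 < I2.shape[0] and 0 <= u2 < I2.shape[1]:
            errs.append(abs(float(I1[v, u]) - float(I2[v2, u2])))
    return float(np.mean(errs)) if errs else float("inf")

# With identity camera motion and identical frames, a static region
# yields zero error:
I = np.arange(16.0).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
err = mean_photometric_error(I, I, np.ones((4, 4)), np.eye(3), np.eye(4), mask)
```

In a full system this residual would be computed per tracked object and thresholded to decide which masks to exclude from SLAM pose estimation.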
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.