InstMove: Instance Motion for Object-centric Video Segmentation
- URL: http://arxiv.org/abs/2303.08132v2
- Date: Thu, 30 Mar 2023 04:23:32 GMT
- Title: InstMove: Instance Motion for Object-centric Video Segmentation
- Authors: Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai
- Abstract summary: In this work, we study instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video Segmentation.
In contrast to pixel-wise motion, InstMove relies mainly on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
- Score: 70.16915119724757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant efforts, cutting-edge video segmentation methods still
remain sensitive to occlusion and rapid movement, due to their reliance on the
appearance of objects in the form of object embeddings, which are vulnerable to
these disturbances. A common solution is to use optical flow to provide motion
information, but essentially it only considers pixel-level motion, which still
relies on appearance similarity and hence is often inaccurate under occlusion
and fast movement. In this work, we study the instance-level motion and present
InstMove, which stands for Instance Motion for Object-centric Video
Segmentation. In comparison to pixel-wise motion, InstMove relies mainly on
instance-level motion information that is free from image feature embeddings and
has a physical interpretation, making it more accurate and more robust to
occlusion and fast-moving objects. To better fit video
segmentation tasks, InstMove uses instance masks to model the physical presence
of an object and learns the dynamic model through a memory network to predict
its position and shape in the next frame. With only a few lines of code,
InstMove can be integrated into current SOTA methods for three different video
segmentation tasks and boost their performance. Specifically, we improve on the
previous state of the art by 1.5 AP on the OVIS dataset, which features heavy
occlusions, and by 4.9 AP on the YouTubeVIS-Long dataset, which mainly contains
fast-moving objects. These results suggest that instance-level motion is robust
and accurate, and hence serves as a powerful solution for object-centric video
segmentation in complex scenarios.
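To make the mechanism described above concrete, here is a minimal sketch of the idea: encode past instance masks, carry a recurrent memory across frames, and decode that memory into a predicted next-frame mask. This is an assumption-laden illustration, not the paper's implementation; `MaskMotionPredictor`, `hidden_dim`, and the ConvGRU-style update are hypothetical stand-ins for InstMove's actual memory network.

```python
import torch
import torch.nn as nn


class MaskMotionPredictor(nn.Module):
    """Predict an object's next-frame mask from its past masks (illustrative,
    not the paper's actual architecture or API)."""

    def __init__(self, hidden_dim: int = 32):
        super().__init__()
        self.hidden_dim = hidden_dim
        # Encode each binary instance mask into a feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, hidden_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1), nn.ReLU(),
        )
        # Recurrent memory update: fuse the running memory with the current
        # frame's features (a simple stand-in for the paper's memory network).
        self.update = nn.Conv2d(2 * hidden_dim, hidden_dim, 3, padding=1)
        # Decode the memory into a soft mask for the next frame.
        self.decoder = nn.Conv2d(hidden_dim, 1, 3, padding=1)

    def forward(self, masks: torch.Tensor) -> torch.Tensor:
        # masks: (B, T, H, W) instance masks for T past frames.
        B, T, H, W = masks.shape
        memory = masks.new_zeros(B, self.hidden_dim, H, W)
        for t in range(T):
            feat = self.encoder(masks[:, t : t + 1])  # (B, C, H, W)
            memory = torch.tanh(self.update(torch.cat([memory, feat], dim=1)))
        return torch.sigmoid(self.decoder(memory))  # (B, 1, H, W)


predictor = MaskMotionPredictor()
past = (torch.rand(2, 5, 64, 64) > 0.5).float()  # 2 instances, 5 past frames
next_mask = predictor(past)                      # predicted position and shape
```

Under this reading, the "few lines of code" integration would amount to fusing `next_mask` with an existing segmentation model's output, e.g. as a motion prior on its mask logits; the exact fusion InstMove uses differs per task and is not reproduced here.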
Related papers
- MeViS: A Large-scale Benchmark for Video Segmentation with Motion
Expressions [93.35942025232943]
We propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments.
The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms.
arXiv Detail & Related papers (2023-08-16T17:58:34Z)
- NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z)
- Event-based Motion Segmentation with Spatio-Temporal Graph Cuts [51.17064599766138]
We have developed a method to identify independently moving objects acquired with an event-based camera.
The method performs on par or better than the state of the art without having to predetermine the number of expected moving objects.
arXiv Detail & Related papers (2020-12-16T04:06:02Z)
- Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation [93.22300146395536]
We design a computational architecture that discovers camouflaged objects in videos, specifically by exploiting motion information to perform object segmentation.
We collect the first large-scale Moving Camouflaged Animals (MoCA) video dataset, which consists of over 140 clips across a diverse range of animals.
We demonstrate the effectiveness of the proposed model on MoCA, and achieve competitive performance on the unsupervised segmentation protocol on DAVIS2016 by only relying on motion.
arXiv Detail & Related papers (2020-11-23T18:59:08Z)
- Self-supervised Sparse to Dense Motion Segmentation [13.888344214818737]
We propose a self-supervised method to learn the densification of sparse motion segmentations from single video frames.
We evaluate our method on the well-known motion segmentation datasets FBMS59 and DAVIS16.
arXiv Detail & Related papers (2020-08-18T11:40:18Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)