MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
- URL: http://arxiv.org/abs/2308.08544v1
- Date: Wed, 16 Aug 2023 17:58:34 GMT
- Title: MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
- Authors: Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Chen Change Loy
- Abstract summary: We propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments.
The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper strives for motion expressions guided video segmentation, which
focuses on segmenting objects in video content based on a sentence describing
the motion of the objects. Existing referring video object datasets typically
focus on salient objects and use language expressions that contain excessive
static attributes that could potentially enable the target object to be
identified in a single frame. These datasets downplay the importance of motion
in video content for language-guided video object segmentation. To investigate
the feasibility of using motion expressions to ground and segment objects in
videos, we propose a large-scale dataset called MeViS, which contains numerous
motion expressions to indicate target objects in complex environments. We
benchmarked 5 existing referring video object segmentation (RVOS) methods and
conducted a comprehensive comparison on the MeViS dataset. The results show
that current RVOS methods cannot effectively address motion expression-guided
video segmentation. We further analyze the challenges and propose a baseline
approach for the proposed MeViS dataset. The goal of our benchmark is to
provide a platform that enables the development of effective language-guided
video segmentation algorithms that leverage motion expressions as a primary cue
for object segmentation in complex video scenes. The proposed MeViS dataset has
been released at https://henghuiding.github.io/MeViS.
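For context, RVOS benchmarks such as MeViS are typically scored with the region similarity J (mask IoU) and the boundary accuracy F, reported together as J&F. Below is a minimal sketch of the J half of that metric, assuming binary numpy masks; the authoritative protocol is the benchmark's official evaluation toolkit.

```python
# Minimal sketch of the J (region similarity) metric used to score RVOS
# methods. J is the Jaccard index (mask IoU) averaged over frames; the
# full benchmark also reports a boundary measure F and the mean J&F.
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index between one predicted and one ground-truth mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:          # both masks empty: count as a perfect match
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_j(pred_masks, gt_masks) -> float:
    """Average J over all frames of one video/object pair."""
    return float(np.mean([region_similarity(p, g)
                          for p, g in zip(pred_masks, gt_masks)]))
```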
Related papers
- ViLLa: Video Reasoning Segmentation with Large Language Model [48.75470418596875]
We propose a new video segmentation task - video reasoning segmentation.
The task is designed to output tracklets of segmentation masks given a complex input text query.
We present ViLLa: Video reasoning segmentation with a Large Language Model.
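As a reading aid, the task's expected output (a tracklet of masks per referred object) can be captured by a small container like the following; the field names are illustrative, not ViLLa's actual interface.

```python
# Illustrative container for a segmentation tracklet: one object identity
# with a binary mask per frame. Field names are hypothetical, not ViLLa's API.
from dataclasses import dataclass, field

import numpy as np

@dataclass
class MaskTracklet:
    object_id: int
    frame_masks: dict[int, np.ndarray] = field(default_factory=dict)  # frame index -> HxW bool mask

    def add(self, frame_idx: int, mask: np.ndarray) -> None:
        self.frame_masks[frame_idx] = mask.astype(bool)
```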
arXiv Detail & Related papers (2024-07-18T17:59:17Z)
- Submodular video object proposal selection for semantic object segmentation [1.223779595809275]
We learn a data-driven representation which captures the subset of multiple instances from continuous frames.
This selection process is formulated as a facility location problem solved by maximising a submodular function.
Our method retrieves the longer term contextual dependencies which underpins a robust semantic video object segmentation algorithm.
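For readers unfamiliar with the formulation: facility-location objectives are submodular, so greedy selection with a (1 - 1/e) approximation guarantee is the standard solver. A minimal sketch, assuming a precomputed proposal-similarity matrix (the paper's exact objective may differ):

```python
# Greedy maximisation of a facility-location function under a cardinality
# constraint: f(S) = sum_v max_{s in S} sim[v, s]. `sim` is an n x n
# proposal-similarity matrix; select k proposals (assumes k <= n).
import numpy as np

def greedy_facility_location(sim: np.ndarray, k: int) -> list[int]:
    n = sim.shape[0]
    selected: list[int] = []
    coverage = np.zeros(n)            # best similarity to any selected proposal
    for _ in range(k):
        # marginal gain of adding each candidate j as a new "facility"
        gains = np.maximum(sim, coverage[:, None]).sum(axis=0) - coverage.sum()
        gains[selected] = -np.inf     # do not reselect
        j = int(np.argmax(gains))
        selected.append(j)
        coverage = np.maximum(coverage, sim[:, j])
    return selected
```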
arXiv Detail & Related papers (2024-07-08T13:18:49Z)
- 2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation [8.20168024462357]
Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in a video based on natural language expressions with motion descriptions.
We introduce mask information obtained from a video instance segmentation model as preliminary information for temporal enhancement and employ SAM for spatial refinement.
Our method achieved a score of 49.92 J&F in the validation phase and 54.20 J&F in the test phase, securing 2nd place in the MeViS Track at the CVPR 2024 PVUW Challenge.
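A hedged sketch of the "SAM for spatial refinement" step described above: convert the coarse mask from the video instance segmentation model into a box prompt and let SAM re-predict a sharper mask. The segment-anything calls below are the library's real API; the checkpoint path and the box-prompt strategy are assumptions.

```python
# Refine a coarse per-frame mask with SAM via a box prompt derived from
# the mask's bounding box. Checkpoint path is an assumed placeholder.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # assumed path
predictor = SamPredictor(sam)

def refine_with_sam(image: np.ndarray, coarse_mask: np.ndarray) -> np.ndarray:
    """image: HxWx3 uint8 RGB, coarse_mask: HxW bool. Returns a refined HxW bool mask."""
    ys, xs = np.nonzero(coarse_mask)
    if len(ys) == 0:
        return coarse_mask                       # nothing to refine
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(box=box, multimask_output=True)
    return masks[int(np.argmax(scores))]         # keep SAM's highest-scoring mask
```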
arXiv Detail & Related papers (2024-06-20T02:16:23Z)
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse point and box tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
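One common way to filter unstable points, shown below purely as an illustration, is forward-backward consistency: a point propagated to the next frame and back should land near its origin. The track_fwd/track_bwd callables are stand-ins, not I-PT's actual tracker.

```python
# Forward-backward consistency check for sparse point prompts: keep only
# points whose round trip t -> t+1 -> t stays within a pixel tolerance.
import numpy as np

def stable_points(pts_t: np.ndarray, track_fwd, track_bwd, tol: float = 2.0) -> np.ndarray:
    """pts_t: Nx2 array of (x, y) prompts in frame t. Returns a boolean keep-mask."""
    pts_next = track_fwd(pts_t)           # propagate to frame t+1
    pts_back = track_bwd(pts_next)        # propagate back to frame t
    err = np.linalg.norm(pts_back - pts_t, axis=1)
    return err < tol                      # keep points that round-trip closely
```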
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) on the test set of the Complex Video Object Segmentation challenge.
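The "salient features as query representations" idea follows the familiar query-based decoding pattern: dot each object query against dense pixel embeddings to get one mask logit map per query. A minimal sketch (shapes are assumptions; the solution's real architecture is not spelled out here):

```python
# Query-based mask prediction: object queries x pixel embeddings -> per-query
# mask logits, in the style of Mask2Former-like decoders.
import torch

def predict_masks(queries: torch.Tensor, pixel_feats: torch.Tensor) -> torch.Tensor:
    """queries: (Q, C), pixel_feats: (C, H, W) -> mask logits of shape (Q, H, W)."""
    return torch.einsum("qc,chw->qhw", queries, pixel_feats)

# Example with 5 queries over a 64x64 feature map:
masks = predict_masks(torch.randn(5, 256), torch.randn(256, 64, 64)).sigmoid() > 0.5
```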
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
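One plausible reading of the sequence-level selection, sketched below only as an illustration: score each flow-predicted mask by its IoU with temporal neighbours and keep the most consistent frames as exemplars.

```python
# Pick exemplar frames whose flow-predicted masks agree most with their
# temporal neighbours. A plausible reading of the summary above, not the
# paper's exact criterion.
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

def select_exemplars(masks: list[np.ndarray], top_k: int = 3) -> list[int]:
    """masks: per-frame HxW bool masks. Returns indices of the most consistent frames."""
    scores = []
    for t in range(len(masks)):
        neigh = [iou(masks[t], masks[u]) for u in (t - 1, t + 1) if 0 <= u < len(masks)]
        scores.append(np.mean(neigh) if neigh else 1.0)
    return sorted(int(i) for i in np.argsort(scores)[-top_k:])
```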
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- InstMove: Instance Motion for Object-centric Video Segmentation [70.16915119724757]
In this work, we study instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video Segmentation.
In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
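To make the contrast with pixel-wise motion concrete, here is a toy constant-velocity version of instance-level mask propagation; InstMove itself learns this propagation with a memory network, so this sketch only conveys the idea.

```python
# Instance-level motion as whole-mask propagation: shift the latest mask
# by the centroid velocity estimated from the two previous frames, rather
# than warping individual pixels with optical flow. Assumes non-empty masks.
import numpy as np

def propagate_mask(mask_prev2: np.ndarray, mask_prev1: np.ndarray) -> np.ndarray:
    """Predict the next frame's mask from two previous HxW bool masks."""
    def centroid(m: np.ndarray) -> np.ndarray:
        ys, xs = np.nonzero(m)
        return np.array([ys.mean(), xs.mean()])
    dy, dx = np.round(centroid(mask_prev1) - centroid(mask_prev2)).astype(int)
    return np.roll(mask_prev1, shift=(int(dy), int(dx)), axis=(0, 1))
```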
arXiv Detail & Related papers (2023-03-14T17:58:44Z)
- The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge--Track 3: Referring Video Object Segmentation [18.630453674396534]
ReferFormer aims to segment the object instances referred to by a language expression across all frames of a given video.
This work proposes several tricks to further boost performance, including cyclical learning rates, a semi-supervised approach, and test-time augmentation at inference.
The improved ReferFormer ranks 2nd place on CVPR2022 Referring Youtube-VOS Challenge.
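Two of the listed tricks map directly onto standard PyTorch pieces, sketched below with assumed hyperparameters (not the solution's actual settings): a cyclical learning-rate schedule via the built-in CyclicLR, and horizontal-flip test-time augmentation.

```python
# Cyclical LR and flip-based test-time augmentation with stock PyTorch.
import torch

model = torch.nn.Conv2d(3, 1, 1)                      # stand-in for ReferFormer
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=1e-6, max_lr=1e-4, step_size_up=2000,
    mode="triangular", cycle_momentum=False)          # step after every iteration

@torch.no_grad()
def predict_tta(frame: torch.Tensor) -> torch.Tensor:
    """Average mask logits over the identity and horizontal-flip views."""
    logits = model(frame)
    logits_flip = torch.flip(model(torch.flip(frame, dims=[-1])), dims=[-1])
    return (logits + logits_flip) / 2
```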
arXiv Detail & Related papers (2022-06-24T02:15:06Z)