Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy
- URL: http://arxiv.org/abs/2212.08816v1
- Date: Sat, 17 Dec 2022 06:47:30 GMT
- Title: Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy
- Authors: Long Lian, Zhirong Wu, Stella X. Yu
- Abstract summary: We present IMAS, a method that segments the primary objects in videos without manual annotation in training or inference.
IMAS achieves Improved UVOS with Motion-Appearance Synergy.
We also propose motion-semantic alignment, an annotation-free method for tuning critical hyperparameters previously tuned with human annotation or hand-crafted, hyperparameter-specific metrics.
- Score: 52.03068246508119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present IMAS, a method that segments the primary objects in videos without
manual annotation in training or inference. Previous methods in unsupervised
video object segmentation (UVOS) have demonstrated the effectiveness of motion
as either input or supervision for segmentation. However, motion signals may be
uninformative or even misleading in cases such as deformable objects and
objects with reflections, causing unsatisfactory segmentation.
In contrast, IMAS achieves Improved UVOS with Motion-Appearance Synergy. Our
method has two training stages: 1) a motion-supervised object discovery stage
that deals with motion-appearance conflicts through a learnable residual
pathway; 2) a refinement stage with both low- and high-level appearance
supervision to correct model misconceptions learned from misleading motion
cues.
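As a rough illustration of the residual-pathway idea in stage 1, the hypothetical PyTorch sketch below reconstructs optical flow as a per-segment piecewise-constant component plus a small learnable residual that absorbs motion the masks cannot explain (e.g. deformation or reflections). All names, shapes, and loss weights are illustrative assumptions, not the IMAS implementation.

```python
# Hypothetical sketch of motion-supervised discovery with a learnable
# residual pathway; illustrative only, not the IMAS code.
import torch
import torch.nn as nn

class ResidualFlowModel(nn.Module):
    """Reconstruct flow = per-segment mean flow + learnable residual."""

    def __init__(self, in_ch=3, hidden=64):
        super().__init__()
        # Residual head: predicts a small dense correction to the flow.
        self.residual_head = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 2, 3, padding=1),
        )

    def forward(self, frame, flow, masks):
        # masks: (B, K, H, W) soft segment assignments summing to 1 over K.
        B, K, H, W = masks.shape
        # Per-segment mean flow: the motion each segment can explain.
        seg_area = masks.sum(dim=(2, 3)).clamp(min=1e-6)                    # (B, K)
        seg_flow = torch.einsum('bkhw,bchw->bkc', masks, flow) / seg_area[..., None]
        piecewise = torch.einsum('bkhw,bkc->bchw', masks, seg_flow)         # (B, 2, H, W)
        residual = self.residual_head(frame)                                # (B, 2, H, W)
        recon = piecewise + residual
        # Penalize the residual so masks explain the motion when possible.
        loss = (recon - flow).abs().mean() + 0.1 * residual.abs().mean()
        return recon, loss
```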
Additionally, we propose motion-semantic alignment as a model-agnostic,
annotation-free hyperparameter tuning method. We demonstrate its effectiveness
in tuning critical hyperparameters that were previously tuned with human
annotation or hand-crafted, hyperparameter-specific metrics.
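A minimal sketch of what annotation-free selection by motion-semantic alignment could look like, assuming a simple score that rewards masks whose foreground both moves differently from and is semantically distinct from the background. The scoring rule and all function names are hypothetical, not the paper's criterion.

```python
# Hypothetical annotation-free hyperparameter selection by a
# motion-semantic alignment score; illustrative assumptions throughout.
import numpy as np

def alignment_score(mask, flow_mag, features):
    """mask: (H, W) binary prediction; flow_mag: (H, W) flow magnitude;
    features: (H, W, D) per-pixel semantic features (e.g. from a
    self-supervised backbone)."""
    fg = mask.astype(bool)
    bg = ~fg
    if fg.sum() == 0 or bg.sum() == 0:
        return -np.inf
    # Motion term: foreground should move more than background.
    motion = flow_mag[fg].mean() - flow_mag[bg].mean()
    # Semantic term: foreground features should differ from background ones.
    semantic = np.linalg.norm(features[fg].mean(0) - features[bg].mean(0))
    return motion + semantic

def select_hyperparam(candidates, predict, val_batch):
    """Pick the candidate whose predictions score highest on unlabeled data."""
    def total(c):
        return sum(alignment_score(predict(x, c), f, feats)
                   for x, f, feats in val_batch)
    return max(candidates, key=total)
```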
IMAS greatly improves segmentation quality on several common UVOS benchmarks.
For example, we surpass previous methods by 8.3% on the DAVIS16 benchmark with
only a standard ResNet and convolutional heads. We intend to release our code
for future research and applications.
Related papers
- MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performance on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
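A toy sketch of the sequence-level selection idea in the entry above: score each flow-predicted mask by its temporal consistency and keep the most stable ones as exemplars. The IoU-with-neighbours score and top-k rule here are simplifying assumptions for illustration, not the paper's exact mechanism.

```python
# Toy exemplar selection by temporal consistency; illustrative only.
import numpy as np

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def select_exemplars(masks, k=5):
    """masks: list of (H, W) binary flow-predicted masks for one sequence.
    Temporally stable masks are assumed more likely to be accurate."""
    scores = []
    for t, m in enumerate(masks):
        nbrs = [masks[t - 1]] if t > 0 else []
        if t + 1 < len(masks):
            nbrs.append(masks[t + 1])
        scores.append(np.mean([iou(m, n) for n in nbrs]))
    top = np.argsort(scores)[-k:]          # indices of the k stablest masks
    return sorted(top.tolist())
```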
- Event-Free Moving Object Segmentation from Moving Ego Vehicle [88.33470650615162]
Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving.
Most segmentation methods leverage motion cues obtained from optical flow maps.
We propose to exploit event cameras, which provide rich motion cues without relying on optical flow, for better video understanding.
arXiv Detail & Related papers (2023-04-28T23:43:10Z)
- TSANet: Temporal and Scale Alignment for Unsupervised Video Object Segmentation [21.19216164433897]
Unsupervised Video Object Segmentation (UVOS) refers to the challenging task of segmenting the prominent object in videos without manual guidance.
We propose a novel framework for UVOS that addresses the limitations of the two main approaches.
We present experimental results on the public benchmark datasets DAVIS 2016 and FBMS, which demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-03-08T04:59:43Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Deep Motion Prior for Weakly-Supervised Temporal Action Localization [35.25323276744999]
Weakly-Supervised Temporal Action Localization (WSTAL) aims to localize actions in untrimmed videos with only video-level labels.
Currently, most state-of-the-art WSTAL methods follow a Multi-Instance Learning (MIL) pipeline.
We argue that existing methods have overlooked two important drawbacks: 1) inadequate use of motion information and 2) the incompatibility of prevailing cross-entropy training loss.
arXiv Detail & Related papers (2021-08-12T08:51:36Z)
- Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues.
We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background.
We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59).
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
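A toy sketch in the spirit of the motion-grouping entry above: learned slots cross-attend to optical-flow tokens, and the attention weights over pixels serve as rough soft object/background masks. The module names and sizes are assumptions, not the authors' architecture.

```python
# Toy slot-style grouping of optical flow via cross-attention;
# an assumption-laden sketch, not the paper's Transformer variant.
import torch
import torch.nn as nn

class FlowSlotGrouper(nn.Module):
    def __init__(self, dim=64, n_slots=2):
        super().__init__()
        self.embed = nn.Conv2d(2, dim, 3, padding=1)     # encode 2-channel flow
        self.slots = nn.Parameter(torch.randn(n_slots, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, flow):                             # flow: (B, 2, H, W)
        B, _, H, W = flow.shape
        tokens = self.embed(flow).flatten(2).transpose(1, 2)    # (B, HW, dim)
        slots = self.slots.unsqueeze(0).expand(B, -1, -1)       # (B, K, dim)
        # Slots query the flow tokens; attention weights group the pixels.
        _, attn_w = self.attn(slots, tokens, tokens)            # (B, K, HW)
        return attn_w.view(B, -1, H, W)                         # soft masks
```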
- Track, Check, Repeat: An EM Approach to Unsupervised Tracking [20.19397660306534]
We propose an unsupervised method for detecting and tracking moving objects in 3D, in unlabelled RGB-D videos.
We learn an ensemble of appearance-based 2D and 3D detectors, under heavy data augmentation.
We compare against existing unsupervised object discovery and tracking methods, using challenging videos from CATER and KITTI.
arXiv Detail & Related papers (2021-04-07T22:51:39Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
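The entry above parameterizes moving objects by 3D rigid transformations. A standard way to recover such a transform from matched 3D points is the Kabsch algorithm, sketched below as a generic illustration rather than the paper's method.

```python
# Kabsch algorithm: fit a 3D rigid transform (R, t) to matched points.
import numpy as np

def fit_rigid(src, dst):
    """src, dst: (N, 3) corresponding 3D points; returns R (3x3), t (3,)
    minimizing sum ||R @ src_i + t - dst_i||^2."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps R a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t
```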
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.