Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation
for Complex Scenes
- URL: http://arxiv.org/abs/2403.04562v1
- Date: Thu, 7 Mar 2024 14:59:34 GMT
- Title: Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation
for Complex Scenes
- Authors: Stamatios Georgoulis, Weining Ren, Alfredo Bochicchio, Daniel Eckert,
Yuanyou Li, and Abel Gawel
- Abstract summary: Rapid and reliable identification of dynamic scene parts, also known as motion segmentation, is a key challenge for mobile sensors.
Event cameras have the potential to overcome these limitations, but corresponding methods have only been demonstrated in smaller-scale indoor environments.
This work presents an event-based method for class-agnostic motion segmentation that can successfully be deployed across complex large-scale outdoor environments too.
- Score: 10.936350433952668
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Rapid and reliable identification of dynamic scene parts, also known as
motion segmentation, is a key challenge for mobile sensors. Contemporary RGB
camera-based methods rely on modeling camera and scene properties; however, they
are often under-constrained and fall short in unknown categories. Event cameras
have the potential to overcome these limitations, but corresponding methods
have only been demonstrated in smaller-scale indoor environments with
simplified dynamic objects. This work presents an event-based method for
class-agnostic motion segmentation that can successfully be deployed across
complex large-scale outdoor environments too. To this end, we introduce a novel
divide-and-conquer pipeline that combines: (a) ego-motion compensated events,
computed via a scene understanding module that predicts monocular depth and
camera pose as auxiliary tasks, and (b) optical flow from a dedicated optical
flow module. These intermediate representations are then fed into a
segmentation module that predicts motion segmentation masks. A novel
transformer-based temporal attention module in the segmentation module builds
correlations across adjacent 'frames' to get temporally consistent segmentation
masks. Our method sets the new state-of-the-art on the classic EV-IMO benchmark
(indoors), where we achieve improvements of 2.19 moving object IoU (2.22 mIoU)
and 4.52 point IoU respectively, as well as on a newly-generated motion
segmentation and tracking benchmark (outdoors) based on the DSEC event dataset,
termed DSEC-MOTS, where we show an improvement of 12.91 moving object IoU.
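The following is a minimal, illustrative Python/NumPy sketch of the geometric idea behind part (a) of the pipeline: predicted monocular depth and camera pose induce an ego-motion ("rigid") flow field, and its residual against measured optical flow highlights pixels that move independently of the camera. This is not the authors' implementation; the paper feeds ego-motion compensated events and optical flow into a learned segmentation module rather than thresholding a residual, and the function names, threshold, and toy inputs below are assumptions made purely for illustration.

```python
# Hedged sketch, NOT the paper's implementation: shows how depth + camera pose
# induce an ego-motion flow whose residual against measured optical flow
# highlights independently moving pixels. All names and values are illustrative.
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow field induced purely by camera ego-motion.

    depth : (H, W) predicted metric depth for the reference view
    K     : (3, 3) camera intrinsics
    R, t  : rotation (3, 3) and translation (3,) from reference to target view
    returns (H, W, 2) flow in pixels
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))                       # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T    # 3 x HW

    # Back-project to 3D, move with the camera, re-project.
    rays = np.linalg.inv(K) @ pix
    pts = rays * depth.reshape(1, -1)
    pts2 = R @ pts + t.reshape(3, 1)
    proj = K @ pts2
    proj = proj[:2] / np.clip(proj[2:], 1e-6, None)

    return proj.T.reshape(H, W, 2) - np.stack([u, v], axis=-1)

def motion_residual_mask(optical_flow, depth, K, R, t, thresh=1.5):
    """Pixels whose measured flow deviates from the ego-motion flow."""
    residual = np.linalg.norm(optical_flow - rigid_flow(depth, K, R, t), axis=-1)
    return residual > thresh                                             # (H, W) boolean mask

# Toy usage with stand-ins for the depth/pose/flow modules.
H, W = 4, 5
K = np.array([[200., 0., W / 2], [0., 200., H / 2], [0., 0., 1.]])
depth = np.full((H, W), 10.0)
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
flow = np.zeros((H, W, 2))
print(motion_residual_mask(flow, depth, K, R, t))
```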
Related papers
- Motion Segmentation for Neuromorphic Aerial Surveillance [42.04157319642197]
Event cameras offer superior temporal resolution, superior dynamic range, and minimal power requirements.
Unlike traditional frame-based sensors that capture redundant information at fixed intervals, event cameras asynchronously record pixel-level brightness changes.
We introduce a novel motion segmentation method that leverages self-supervised vision transformers on both event data and optical flow information.
arXiv Detail & Related papers (2024-05-24T04:36:13Z) - Event-Free Moving Object Segmentation from Moving Ego Vehicle [88.33470650615162]
Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving.
Most segmentation methods leverage motion cues obtained from optical flow maps.
We propose to exploit event cameras, which provide rich motion cues without relying on optical flow, for better video understanding.
arXiv Detail & Related papers (2023-04-28T23:43:10Z) - Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z) - EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate spatio-temporal kernels of dynamic scale that adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine interactions among only a few selected foreground objects with a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z) - Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, while still suffering from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module that adaptively emphasizes the channels related to dynamic information with a channel-wise gate vector (a minimal sketch of this gating idea appears after this list).
We also propose a Spatial-wise Motion Enhancement (SME) module that focuses on regions containing the critical moving target, based on the point-to-point similarity between adjacent feature maps.
arXiv Detail & Related papers (2021-03-23T03:06:26Z) - Event-based Motion Segmentation with Spatio-Temporal Graph Cuts [51.17064599766138]
We have developed a method to identify independently moving objects in data acquired with an event-based camera.
The method performs on par with or better than the state of the art without having to predetermine the number of expected moving objects.
arXiv Detail & Related papers (2020-12-16T04:06:02Z) - 0-MMS: Zero-Shot Multi-Motion Segmentation With A Monocular Event Camera [13.39518293550118]
We present an approach for monocular multi-motion segmentation, which combines bottom-up feature tracking and top-down motion compensation into a unified pipeline.
Using the events within a time-interval, our method segments the scene into multiple motions by splitting and merging.
The approach was successfully evaluated on both challenging real-world and synthetic scenarios from the EV-IMO, EED, and MOD datasets.
arXiv Detail & Related papers (2020-06-11T02:34:29Z) - Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
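As a side note to the "Learning Comprehensive Motion Representation" entry above, the channel-wise gating idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's CME module: it assumes the gate vector is derived from the pooled temporal difference between adjacent feature maps, and the class name ChannelMotionGate, the reduction ratio, and the toy inputs are hypothetical.

```python
# Hedged sketch of a channel-wise motion gate, NOT the paper's CME implementation.
# Assumption: the gate is computed from pooled differences of adjacent frame features.
import torch
import torch.nn as nn

class ChannelMotionGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Bottleneck MLP mapping pooled motion statistics to a per-channel gate.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_t: torch.Tensor, feat_t1: torch.Tensor) -> torch.Tensor:
        """feat_t, feat_t1: (B, C, H, W) features of adjacent frames."""
        motion = feat_t1 - feat_t                         # rough per-channel motion cue
        stats = motion.abs().mean(dim=(2, 3))             # (B, C) pooled motion statistics
        gate = self.fc(stats)                             # (B, C) channel-wise gate vector
        return feat_t * gate.unsqueeze(-1).unsqueeze(-1)  # emphasize motion-related channels

# Toy usage
gate = ChannelMotionGate(channels=16)
x_t, x_t1 = torch.randn(2, 16, 8, 8), torch.randn(2, 16, 8, 8)
print(gate(x_t, x_t1).shape)  # torch.Size([2, 16, 8, 8])
```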
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.