ESS: Learning Event-based Semantic Segmentation from Still Images
- URL: http://arxiv.org/abs/2203.10016v1
- Date: Fri, 18 Mar 2022 15:30:01 GMT
- Title: ESS: Learning Event-based Semantic Segmentation from Still Images
- Authors: Zhaoning Sun, Nico Messikommer, Daniel Gehrig, Davide Scaramuzza
- Abstract summary: Event-based semantic segmentation is still in its infancy due to the novelty of the sensor and the lack of high-quality, labeled datasets.
We introduce ESS, which transfers the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA).
To spur further research in event-based semantic segmentation, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieving accurate semantic information in challenging high dynamic range
(HDR) and high-speed conditions remains an open challenge for image-based
algorithms due to severe image degradations. Event cameras promise to address
these challenges since they feature a much higher dynamic range and are
resilient to motion blur. Nonetheless, semantic segmentation with event cameras
is still in its infancy, chiefly due to the novelty of the sensor and
the lack of high-quality, labeled datasets. In this work, we introduce ESS,
which tackles this problem by directly transferring the semantic segmentation
task from existing labeled image datasets to unlabeled events via unsupervised
domain adaptation (UDA). Compared to existing UDA methods, our approach aligns
recurrent, motion-invariant event embeddings with image embeddings. For this
reason, our method neither requires video data nor per-pixel alignment between
images and events and, crucially, does not need to hallucinate motion from
still images. Additionally, to spur further research in event-based semantic
segmentation, we introduce DSEC-Semantic, the first large-scale event-based
dataset with fine-grained labels. We show that using image labels alone, ESS
outperforms existing UDA approaches, and when combined with event labels, it
even outperforms state-of-the-art supervised approaches on both DDD17 and
DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount
of existing labeled image datasets and paves the way for new research
directions in fields previously inaccessible to event cameras.
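To illustrate the core idea of aligning event and image embeddings for task transfer, below is a minimal PyTorch sketch. It pairs a toy recurrent event encoder with a CORAL-style moment-matching loss on unpaired batches as a stand-in for ESS's actual alignment objective; all module names, shapes, and the choice of loss are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RecurrentEventEncoder(nn.Module):
    """Toy recurrent encoder: integrates a sequence of event voxel grids
    into one embedding (a hypothetical stand-in for ESS's event encoder)."""
    def __init__(self, in_channels=5, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.gru = nn.GRUCell(embed_dim, embed_dim)

    def forward(self, voxels):  # voxels: (B, T, C, H, W)
        h = torch.zeros(voxels.size(0), self.gru.hidden_size, device=voxels.device)
        for t in range(voxels.size(1)):
            f = self.conv(voxels[:, t]).mean(dim=(2, 3))  # global average pool
            h = self.gru(f, h)  # recurrent integration over event time bins
        return h

def coral_loss(source, target):
    """Moment matching on unpaired batches: align feature means and
    covariances, so no per-pixel image/event correspondence is needed."""
    mu_s, mu_t = source.mean(0), target.mean(0)
    cov_s = (source - mu_s).T @ (source - mu_s) / (source.size(0) - 1)
    cov_t = (target - mu_t).T @ (target - mu_t) / (target.size(0) - 1)
    d = source.size(1)
    return (mu_s - mu_t).pow(2).sum() + (cov_s - cov_t).pow(2).sum() / (4 * d * d)

# Frozen image encoder pretrained on a labeled image dataset (any backbone
# producing the same embedding size works for this sketch).
image_encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in image_encoder.parameters():
    p.requires_grad = False

event_encoder = RecurrentEventEncoder()
images = torch.randn(8, 3, 64, 64)       # labeled still images
events = torch.randn(8, 10, 5, 64, 64)   # unlabeled event voxel sequences

# Pull event embeddings toward the image embedding space so a segmentation
# head trained on image embeddings also works on event embeddings.
loss = coral_loss(image_encoder(images), event_encoder(events))
loss.backward()
```

Moment matching of this kind needs neither paired data nor simulated motion, which mirrors why the approach requires neither video nor per-pixel alignment between images and events.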
Related papers
- OVOSE: Open-Vocabulary Semantic Segmentation in Event-Based Cameras [18.07403094754705]
We introduce OVOSE, the first open-vocabulary semantic segmentation algorithm for event cameras.
We evaluate OVOSE on two driving semantic segmentation datasets, DDD17 and DSEC-Semantic.
OVOSE demonstrates superior performance, showcasing its potential for real-world applications.
arXiv Detail & Related papers (2024-08-18T09:56:32Z)
- OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies [4.940059438666211]
Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event camera sensing.
We synergize information from image, text, and event-data domains and introduce OpenESS to enable scalable ESS.
We achieve 53.93% and 43.31% mIoU on DDD17 and DSEC-Semantic benchmarks without using either event or frame labels.
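The label-free numbers above rest on matching pixel features to text embeddings. A minimal sketch of that open-vocabulary classification step follows; random tensors stand in for CLIP-style text embeddings and per-pixel event features, and OpenESS's actual cross-modal training is not shown.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins: per-pixel event features from some encoder and
# text embeddings for arbitrary class names (e.g., from a CLIP text tower).
pixel_feats = torch.randn(1, 512, 60, 80)            # (B, D, H, W)
class_names = ["road", "car", "person", "vegetation"]
text_embeds = torch.randn(len(class_names), 512)     # (K, D)

# Open-vocabulary labeling: cosine similarity of each pixel feature with
# each class-name embedding; the argmax gives the per-pixel prediction.
pf = F.normalize(pixel_feats, dim=1)
te = F.normalize(text_embeds, dim=1)
logits = torch.einsum("bdhw,kd->bkhw", pf, te)       # (B, K, H, W)
prediction = logits.argmax(dim=1)                    # (B, H, W) class ids
```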
arXiv Detail & Related papers (2024-05-08T17:59:58Z)
- HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation [47.271784693700845]
We propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels.
Our proposed method outperforms existing state-of-the-art methods by a large margin on the DSEC-Semantic dataset.
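As a rough illustration of pseudo-label denoising, the sketch below keeps only high-confidence teacher predictions and ignores the rest; this confidence filtering is a generic ingredient of such pipelines, not HPL-ESS's specific hybrid scheme.

```python
import torch
import torch.nn.functional as F

def make_pseudo_labels(logits, threshold=0.9, ignore_index=255):
    """Keep only confident teacher predictions as training targets."""
    probs = logits.softmax(dim=1)                 # (B, K, H, W)
    conf, labels = probs.max(dim=1)               # per-pixel confidence/class
    labels[conf < threshold] = ignore_index       # drop noisy low-confidence pixels
    return labels

teacher_logits = torch.randn(2, 11, 60, 80)       # e.g., 11 driving classes
student_logits = torch.randn(2, 11, 60, 80, requires_grad=True)
pseudo = make_pseudo_labels(teacher_logits)
loss = F.cross_entropy(student_logits, pseudo, ignore_index=255)
loss.backward()
```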
arXiv Detail & Related papers (2024-03-25T14:02:33Z)
- Un-EvMoSeg: Unsupervised Event-based Independent Motion Segmentation [33.21922177483246]
Event cameras are a novel type of biologically inspired vision sensor known for their high temporal resolution, high dynamic range, and low power consumption.
We propose the first event-based framework that generates IMO (independently moving object) pseudo-labels using geometric constraints.
Due to its unsupervised nature, our method can handle an arbitrary, not predetermined number of objects and scales easily to datasets where expensive IMO labels are not readily available.
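A much-simplified version of the geometric idea: pixels whose motion is poorly explained by the dominant, camera-induced motion are flagged as independently moving. The median-flow background model and the threshold below are toy assumptions, not the paper's actual constraints.

```python
import torch

def imo_pseudo_labels(flow, residual_thresh=2.0):
    """flow: (2, H, W) per-pixel displacement, e.g., estimated from events.
    Median flow serves as a toy proxy for the dominant background motion."""
    dominant = flow.flatten(1).median(dim=1).values   # (2,) background motion
    residual = (flow - dominant[:, None, None]).norm(dim=0)
    return residual > residual_thresh                 # (H, W) boolean IMO mask

flow = torch.zeros(2, 60, 80)
flow[:, 20:30, 30:40] = 5.0     # a patch moving differently from the background
mask = imo_pseudo_labels(flow)
print(mask.sum().item(), "pixels flagged as independently moving")
```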
arXiv Detail & Related papers (2023-11-30T18:59:32Z)
- EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation [61.33010904301476]
Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur.
We propose a novel approach, called EvDistill, to learn a student network on unlabeled and unpaired event data.
We show that EvDistill achieves significantly better results than prior works and plain knowledge distillation (KD), using only events and APS frames.
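Below is a minimal cross-modal distillation step, with an image teacher producing soft targets for an event student; the bidirectional reconstruction guidance that distinguishes EvDistill is omitted, and all names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Standard soft-target KD: KL divergence between temperature-scaled
    teacher and student distributions, applied per pixel."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

teacher_logits = torch.randn(2, 11, 60, 80)                      # from APS frames
student_logits = torch.randn(2, 11, 60, 80, requires_grad=True)  # from events
distill_loss(student_logits, teacher_logits).backward()
```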
arXiv Detail & Related papers (2021-11-24T08:48:16Z)
- Bridging the Gap between Events and Frames through Unsupervised Domain Adaptation [57.22705137545853]
We propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
We leverage a generative event model to split event features into content and motion features.
Our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks.
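A sketch of the content/motion split: the event encoder emits two feature groups, and only the motion-invariant content features feed a task head shared with the image branch. Module names and channel sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SplitEventEncoder(nn.Module):
    """Toy encoder whose output channels are split into content and motion."""
    def __init__(self, in_channels=5, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, 2 * dim, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        content, motion = self.backbone(x).chunk(2, dim=1)  # channel split
        return content, motion

encoder = SplitEventEncoder()
task_head = nn.Conv2d(64, 11, 1)                  # shared with the image branch
content, motion = encoder(torch.randn(1, 5, 60, 80))
segmentation = task_head(content)                 # motion features bypass the task
```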
arXiv Detail & Related papers (2021-09-06T17:31:37Z)
- Semi-supervised Semantic Segmentation with Directional Context-aware Consistency [66.49995436833667]
We focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images.
A preferred high-level representation should capture the contextual information while not losing self-awareness.
We present the Directional Contrastive Loss (DC Loss) to enforce consistency in a pixel-to-pixel manner.
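A simplified directional consistency step in the spirit of the DC Loss: per pixel, the less confident of two augmented views is pulled toward the more confident one, with gradients stopped on the target side. The full DC Loss is contrastive and also uses negative pixels, which this sketch omits.

```python
import torch
import torch.nn.functional as F

def directional_consistency(logits_a, logits_b):
    """Per-pixel direction: the more confident view becomes a fixed target."""
    conf_a = logits_a.softmax(1).max(1).values       # (B, H, W) confidences
    conf_b = logits_b.softmax(1).max(1).values
    a_is_target = (conf_a >= conf_b).unsqueeze(1)    # broadcast over classes
    target = torch.where(a_is_target, logits_a, logits_b).detach()
    source = torch.where(a_is_target, logits_b, logits_a)
    return F.mse_loss(source, target)

logits_a = torch.randn(2, 11, 60, 80, requires_grad=True)  # view 1
logits_b = torch.randn(2, 11, 60, 80, requires_grad=True)  # view 2
directional_consistency(logits_a, logits_b).backward()
```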
arXiv Detail & Related papers (2021-06-27T03:42:40Z)
- RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training [77.62171090230986]
We propose an easily scalable and self-supervised technique that can be used to pre-train any semantic RGB segmentation method.
In particular, our pre-training approach makes use of automatically generated labels that can be obtained using depth sensors.
We show how our proposed self-supervised pre-training with HN-labels can be used to replace ImageNet pre-training.
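The recipe reduces to pre-training on automatically generated labels and then swapping the head, as in the hypothetical sketch below; the 8 depth-derived classes stand in for the paper's HN-labels.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
)
pretrain_head = nn.Conv2d(64, 8, 1)       # 8 hypothetical depth-derived classes

images = torch.randn(4, 3, 64, 64)
depth_labels = torch.randint(0, 8, (4, 64, 64))   # auto-generated, no annotators
loss = nn.functional.cross_entropy(pretrain_head(backbone(images)), depth_labels)
loss.backward()                           # self-supervised pre-training step

finetune_head = nn.Conv2d(64, 19, 1)      # semantic classes replace the head
semantic_logits = finetune_head(backbone(images))  # backbone weights are reused
```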
arXiv Detail & Related papers (2020-02-06T11:16:24Z)