Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
- URL: http://arxiv.org/abs/2407.11216v1
- Date: Mon, 15 Jul 2024 20:00:50 GMT
- Title: Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
- Authors: Hoonhee Cho, Sung-Hoon Yoon, Hyeokjun Kweon, Kuk-Jin Yoon
- Abstract summary: We present EV-WSSS: a novel weakly supervised approach for event-based semantic segmentation.
The proposed framework performs asymmetric dual-student learning between 1) the original forward event data and 2) the longer reversed event data.
We show that the proposed method achieves substantial segmentation results even without relying on pixel-level dense ground truths.
- Score: 45.063747874243276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras excel in capturing high-contrast scenes and dynamic objects, offering a significant advantage over traditional frame-based cameras. Despite active research into leveraging event cameras for semantic segmentation, generating pixel-wise dense semantic maps for such challenging scenarios remains labor-intensive. As a remedy, we present EV-WSSS: a novel weakly supervised approach for event-based semantic segmentation that utilizes sparse point annotations. To fully leverage the temporal characteristics of event data, the proposed framework performs asymmetric dual-student learning between 1) the original forward event data and 2) the longer reversed event data, which contain complementary information from the past and the future, respectively. Besides, to mitigate the challenges posed by sparse supervision, we propose feature-level contrastive learning based on class-wise prototypes, carefully aggregated at both spatial region and sample levels. Additionally, we further excavate the potential of our dual-student learning model by exchanging prototypes between the two learning paths, thereby harnessing their complementary strengths. With extensive experiments on various datasets, including DSEC Night-Point with sparse point annotations newly provided by this paper, the proposed method achieves substantial segmentation results even without relying on pixel-level dense ground truths. The code and dataset are available at https://github.com/Chohoonhee/EV-WSSS.
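The abstract names three ingredients: a time-reversed event stream, class-wise prototypes built from sparse point annotations, and a prototype-based contrastive objective whose prototypes can be exchanged between the two learning paths. The NumPy sketch below illustrates those ideas in their simplest form. It is not the authors' implementation: the function names, the `(x, y, t, p)` event layout, the polarity flip on reversal, the cosine-similarity InfoNCE-style loss, and the `temperature` value are all assumptions made for illustration.

```python
import numpy as np


def reverse_events(events):
    """Time-reverse an event stream (assumed layout: rows of x, y, t, p).

    Assumption: polarity p is in {-1, +1} and is flipped on reversal,
    because a brightness increase played backwards is a decrease.
    """
    rev = events[::-1].copy()
    rev[:, 2] = events[:, 2].max() - rev[:, 2]  # mirror timestamps
    rev[:, 3] = -rev[:, 3]                      # flip polarity
    return rev


def class_prototypes(features, points, labels, num_classes):
    """Average the features found at annotated points, one mean per class.

    features: (H, W, D) feature map
    points:   (N, 2) integer (row, col) coordinates of annotated pixels
    labels:   (N,) integer class ids
    Returns (num_classes, D) prototypes and a bool mask of classes that
    received at least one annotated point.
    """
    dim = features.shape[-1]
    protos = np.zeros((num_classes, dim), dtype=np.float64)
    counts = np.zeros(num_classes, dtype=np.int64)
    point_feats = features[points[:, 0], points[:, 1]]  # (N, D)
    np.add.at(protos, labels, point_feats)  # unbuffered per-class sum
    np.add.at(counts, labels, 1)
    present = counts > 0
    protos[present] /= counts[present, None]
    return protos, present


def prototype_contrastive_loss(point_feats, labels, protos, present,
                               temperature=0.1):
    """InfoNCE-style loss pulling each feature toward its class prototype.

    Passing prototypes computed on the *other* branch is one hypothetical
    way to realise the prototype exchange mentioned in the abstract.
    """
    def l2norm(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

    logits = l2norm(point_feats) @ l2norm(protos).T / temperature  # (N, C)
    logits = np.where(present[None, :], logits, -np.inf)  # drop empty classes
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())
```

In this sketch, a forward branch and a reversed branch would each call `class_prototypes` on their own feature maps, then evaluate `prototype_contrastive_loss` once with their own prototypes and once with the other branch's prototypes. How the paper actually aggregates prototypes across spatial regions and samples is not specified in the abstract and is left out here.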
Related papers
- LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction [8.163356555241322]
We propose a novel framework, called LaSe-E2V, that can achieve semantic-aware high-quality E2V reconstruction.
We first propose an Event-guided Spatiotemporal Attention (ESA) module to condition the event data to the denoising pipeline effectively.
We then introduce an event-aware mask loss to ensure temporal coherence and a noise strategy to enhance spatial consistency.
(2024-07-08T01:40:32Z)
- Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification [0.5572976467442564]
The work described in this paper uses semantic information obtained from both object detection and semantic segmentation techniques.
A novel approach is proposed that uses a semantic segmentation mask to characterize the shape of segmentation categories through Hu moments, designated Segmentation-based Hu-Moments Features (SHMFs).
A three-main-branch network, designated GOS$^2$F$^2$App, that exploits deep-learning-based global features, object-based features, and semantic segmentation-based features is also proposed.
(2024-04-11T13:37:51Z)
- Weakly Supervised Semantic Segmentation for Driving Scenes [27.0285166404621]
State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS) exhibit severe performance degradation on driving scene datasets.
We develop a new WSSS framework tailored to driving scene datasets.
(2023-12-21T08:16:26Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
(2023-11-19T06:00:39Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
(2023-08-19T09:12:13Z)
- Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
Few-shot semantic segmentation task aims at performing segmentation in query images with a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
(2022-11-20T16:19:47Z)
- In-N-Out Generative Learning for Dense Unsupervised Video Segmentation [89.21483504654282]
In this paper, we focus on the unsupervised Video Object Segmentation (VOS) task, which learns visual correspondence from unlabeled videos.
We propose the In-aNd-Out (INO) generative learning from a purely generative perspective, which captures both high-level and fine-grained semantics.
Our INO outperforms previous state-of-the-art methods by significant margins.
(2022-03-29T07:56:21Z)
- Sim2Real Object-Centric Keypoint Detection and Description [40.58367357980036]
Keypoint detection and description play a central role in computer vision.
We propose the object-centric formulation, which requires further identifying which object each interest point belongs to.
We develop a sim2real contrastive learning mechanism that can generalize the model trained in simulation to real-world applications.
(2022-02-01T15:00:20Z)
- Superevents: Towards Native Semantic Segmentation for Event-based Cameras [13.099264910430986]
Most successful computer vision models transform low-level features, such as Gabor filter responses, into richer representations of intermediate or mid-level complexity for downstream visual tasks.
We present a novel method that employs lifetime augmentation for obtaining an event stream representation that is fed to a fully convolutional network to extract superevents.
(2021-05-13T05:49:41Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
(2020-09-22T14:17:40Z)