Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation
- URL: http://arxiv.org/abs/2109.01801v1
- Date: Sat, 4 Sep 2021 06:49:09 GMT
- Title: Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation
- Authors: Lin Wang, Yujeong Chae, Kuk-Jin Yoon
- Abstract summary: Event cameras perceive per-pixel intensity changes and output asynchronous event streams with high dynamic range and less motion blur.
It has been shown that events alone can be used for end-task learning, e.g., semantic segmentation, based on encoder-decoder-like networks.
We propose a simple yet flexible two-stream framework named Dual Transfer Learning (DTL) to effectively enhance the performance on the end-tasks.
- Score: 33.28163268182018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event cameras are novel sensors that perceive the per-pixel intensity changes
and output asynchronous event streams with high dynamic range and less motion
blur. It has been shown that events alone can be used for end-task learning,
e.g., semantic segmentation, based on encoder-decoder-like networks. However, as
events are sparse and mostly reflect edge information, it is difficult to
recover original details merely relying on the decoder. Moreover, most methods
resort to pixel-wise loss alone for supervision, which might be insufficient to
fully exploit the visual details from sparse events, thus leading to less
optimal performance. In this paper, we propose a simple yet flexible two-stream
framework named Dual Transfer Learning (DTL) to effectively enhance the
performance on the end-tasks without adding extra inference cost. The proposed
approach consists of three parts: event to end-task learning (EEL) branch,
event to image translation (EIT) branch, and transfer learning (TL) module that
simultaneously explores the feature-level affinity information and pixel-level
knowledge from the EIT branch to improve the EEL branch. This simple yet novel
method leads to strong representation learning from events and is evidenced by
the significant performance boost on the end-tasks such as semantic
segmentation and depth estimation.
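
To make the two-stream design concrete, below is a minimal PyTorch-style sketch of the DTL idea, assuming a voxel-grid event tensor as input; the module layouts, channel sizes, affinity definition, and loss weights are illustrative assumptions, not the authors' released implementation. It shows an EEL branch producing end-task predictions, an EIT branch reconstructing an intensity image, and a transfer-learning (TL) term that aligns feature-level affinities from the EIT branch into the EEL branch.

```python
# Minimal sketch of the two-stream DTL idea (illustrative, not the authors' code).
# EEL branch: events -> end-task prediction (e.g., semantic segmentation).
# EIT branch: events -> reconstructed intensity image.
# TL module:  feature-affinity alignment from the EIT branch to the EEL branch,
#             plus pixel-level supervision on the EIT output.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class EELBranch(nn.Module):
    """Event-to-end-task branch: small encoder + prediction head."""
    def __init__(self, in_ch=5, num_classes=6):
        super().__init__()
        self.enc = nn.Sequential(conv_block(in_ch, 32), conv_block(32, 64))
        self.head = nn.Conv2d(64, num_classes, 1)
    def forward(self, x):
        feat = self.enc(x)
        return feat, self.head(feat)          # features + per-pixel logits

class EITBranch(nn.Module):
    """Event-to-image translation branch: reconstructs an intensity image."""
    def __init__(self, in_ch=5):
        super().__init__()
        self.enc = nn.Sequential(conv_block(in_ch, 32), conv_block(32, 64))
        self.head = nn.Conv2d(64, 1, 1)
    def forward(self, x):
        feat = self.enc(x)
        return feat, torch.sigmoid(self.head(feat))

def affinity(feat, size=16):
    """Pairwise cosine affinity between spatial positions (pooled to keep it cheap)."""
    f = F.adaptive_avg_pool2d(feat, size)     # (B, C, size, size)
    f = F.normalize(f.flatten(2), dim=1)      # (B, C, S) with S = size * size
    return torch.bmm(f.transpose(1, 2), f)    # (B, S, S) affinity matrix

def dtl_losses(events, labels, eel, eit, target_image=None):
    """End-task loss + feature-affinity transfer (+ optional EIT reconstruction)."""
    feat_eel, logits = eel(events)
    feat_eit, recon = eit(events)
    loss = F.cross_entropy(logits, labels)
    # Transfer feature-level affinity knowledge from EIT to EEL (weight is illustrative).
    loss = loss + 0.1 * F.l1_loss(affinity(feat_eel), affinity(feat_eit).detach())
    if target_image is not None:              # pixel-level supervision for the EIT branch
        loss = loss + F.l1_loss(recon, target_image)
    return loss

if __name__ == "__main__":
    events = torch.randn(2, 5, 64, 64)        # toy voxel-grid event representation
    labels = torch.randint(0, 6, (2, 64, 64)) # toy segmentation labels
    loss = dtl_losses(events, labels, EELBranch(), EITBranch(), torch.rand(2, 1, 64, 64))
    print(loss.item())
```

Consistent with the abstract, only the EEL branch would be needed at inference time; the EIT branch and the transfer losses serve purely as training-time supervision, so no extra inference cost is added.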
Related papers
- EffiPerception: an Efficient Framework for Various Perception Tasks [6.1522068855729755]
EffiPerception is a framework that explores common learning patterns shared across perception tasks.
It aims to achieve robust accuracy with relatively low memory cost on several perception tasks.
EffiPerception shows overall accuracy-speed-memory improvements across four detection and segmentation tasks.
arXiv Detail & Related papers (2024-03-18T23:22:37Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event segmentation suffer from sub-optimal performance.
We propose HALSIE, a hybrid end-to-end learning framework that reduces inference cost by up to 20x compared to prior art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z)
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance while being 10,800x faster in inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding a new state-of-the-art on the widely used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z)
- In-N-Out Generative Learning for Dense Unsupervised Video Segmentation [89.21483504654282]
In this paper, we focus on the unsupervised Video Object Segmentation (VOS) task, which learns visual correspondence from unlabeled videos.
We propose the In-aNd-Out (INO) generative learning from a purely generative perspective, which captures both high-level and fine-grained semantics.
Our INO outperforms previous state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-03-29T07:56:21Z)
- TRACER: Extreme Attention Guided Salient Object Tracing Network [3.2434811678562676]
We propose TRACER, which detects salient objects with explicit edges by incorporating attention guided tracing modules.
A comparison with 13 existing methods reveals that TRACER achieves state-of-the-art performance on five benchmark datasets.
arXiv Detail & Related papers (2021-12-14T13:20:07Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance gains compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment.
Our proposed framework significantly outperforms state-of-the-art methods without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
- EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation [61.33010904301476]
Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur.
We propose a novel approach, called EvDistill, to learn a student network on the unlabeled and unpaired event data.
We show that EvDistill achieves significantly better results than prior works and than KD using only events and APS frames.
arXiv Detail & Related papers (2021-11-24T08:48:16Z)
- Event-LSTM: An Unsupervised and Asynchronous Learning-based Representation for Event-based Data [8.931153235278831]
Event cameras are activity-driven bio-inspired vision sensors.
We propose Event-LSTM, an unsupervised Auto-Encoder architecture made up of LSTM layers.
We also push state-of-the-art event de-noising forward by introducing memory into the de-noising process.
arXiv Detail & Related papers (2021-05-10T09:18:52Z)