Multi-Model Learning for Real-Time Automotive Semantic Foggy Scene
Understanding via Domain Adaptation
- URL: http://arxiv.org/abs/2012.05320v1
- Date: Wed, 9 Dec 2020 21:04:05 GMT
- Title: Multi-Model Learning for Real-Time Automotive Semantic Foggy Scene
Understanding via Domain Adaptation
- Authors: Naif Alshammari, Samet Akcay, and Toby P. Breckon
- Abstract summary: We propose an efficient end-to-end automotive semantic scene understanding approach that is robust to foggy weather conditions.
Our approach incorporates RGB colour, depth and luminance images via distinct encoders with dense connectivity.
Our model achieves comparable performance to contemporary approaches at a fraction of the overall model complexity.
- Score: 17.530091734327296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust semantic scene segmentation for automotive applications is a
challenging problem in two key aspects: (1) labelling every individual scene
pixel and (2) performing this task under unstable weather and illumination
changes (e.g., foggy weather), which results in poor outdoor scene visibility.
Such visibility limitations lead to non-optimal performance of generalised deep
convolutional neural network-based semantic scene segmentation. In this paper,
we propose an efficient end-to-end automotive semantic scene understanding
approach that is robust to foggy weather conditions. As an end-to-end pipeline,
our proposed approach provides: (1) the transformation of imagery from foggy to
clear weather conditions using a domain transfer approach (correcting for poor
visibility) and (2) semantically segmenting the scene using a competitive
encoder-decoder architecture with low computational complexity (enabling
real-time performance). Our approach incorporates RGB colour, depth and
luminance images via distinct encoders with dense connectivity and feature
fusion to effectively exploit information from different inputs, which
contributes to an optimal feature representation within the overall model.
Using this architectural formulation with dense skip connections, our model
achieves comparable performance to contemporary approaches at a fraction of the
overall model complexity.
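To make the architectural idea concrete, below is a minimal PyTorch sketch of a multi-encoder segmentation network in the spirit described above: separate encoders for the RGB, depth, and luminance inputs, concatenation-based feature fusion, and skip connections from the shallow encoder stages into the decoder. The layer widths, the two-stage encoder depth, the concatenation fusion, and the 19-class (Cityscapes-style) output head are illustrative assumptions for this sketch, not the authors' exact design.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Small convolutional encoder; one instance per input modality."""

    def __init__(self, in_channels: int, width: int = 32):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels, width, 3, stride=2, padding=1),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True))
        self.block2 = nn.Sequential(
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1),
            nn.BatchNorm2d(width * 2), nn.ReLU(inplace=True))

    def forward(self, x):
        f1 = self.block1(x)   # 1/2 resolution, kept for skip connection
        f2 = self.block2(f1)  # 1/4 resolution, deep features
        return f1, f2


class MultiEncoderSegNet(nn.Module):
    """Fuses per-modality features and decodes to per-pixel class scores.

    Illustrative sketch only: widths, depth, and class count are assumptions.
    """

    def __init__(self, num_classes: int = 19, width: int = 32):
        super().__init__()
        self.rgb_enc = Encoder(3, width)
        self.depth_enc = Encoder(1, width)
        self.lum_enc = Encoder(1, width)
        fused = width * 2 * 3  # concatenated deep features of the 3 encoders
        self.fuse = nn.Sequential(
            nn.Conv2d(fused, width * 2, 1), nn.ReLU(inplace=True))
        # Decoder: upsample, concatenate shallow (skip) features, refine.
        self.up1 = nn.ConvTranspose2d(width * 2, width, 2, stride=2)
        self.refine = nn.Sequential(
            nn.Conv2d(width + width * 3, width, 3, padding=1),
            nn.ReLU(inplace=True))
        self.up2 = nn.ConvTranspose2d(width, width, 2, stride=2)
        self.classify = nn.Conv2d(width, num_classes, 1)

    def forward(self, rgb, depth, lum):
        r1, r2 = self.rgb_enc(rgb)
        d1, d2 = self.depth_enc(depth)
        l1, l2 = self.lum_enc(lum)
        x = self.fuse(torch.cat([r2, d2, l2], dim=1))        # cross-modal fusion
        x = self.up1(x)                                      # back to 1/2 resolution
        x = self.refine(torch.cat([x, r1, d1, l1], dim=1))   # fuse skip features
        x = self.up2(x)                                      # back to full resolution
        return self.classify(x)                              # per-pixel class logits


# Example: one 256x512 frame with RGB, depth, and luminance channels.
model = MultiEncoderSegNet(num_classes=19)
logits = model(torch.randn(1, 3, 256, 512),
               torch.randn(1, 1, 256, 512),
               torch.randn(1, 1, 256, 512))
print(logits.shape)  # torch.Size([1, 19, 256, 512])
```

Keeping each modality in its own lightweight encoder and fusing by concatenation is one simple way to realise the "distinct encoders with feature fusion" idea while keeping the parameter count low enough for real-time use.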
Related papers
- EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation [17.0226030258296]
Associating driver attention with the driving scene across two fields of view is a hard cross-domain perception problem.
Previous methods typically focus on a single view or map attention to the scene via estimated gaze.
We propose a novel method for end-to-end scene-associated driver attention estimation, called EraW-Net.
arXiv Detail & Related papers (2024-08-16T07:12:47Z)
- MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training [57.18758272617101]
MaeFuse is a novel autoencoder model designed for infrared and visible image fusion (IVIF)
Our model utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni feature extraction for low-level reconstruction and high-level vision tasks.
MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also stands out with impressive performance across various public datasets.
arXiv Detail & Related papers (2024-04-17T02:47:39Z)
- Homography Guided Temporal Fusion for Road Line and Marking Segmentation [73.47092021519245]
Road lines and markings are frequently occluded in the presence of moving vehicles, shadows, and glare.
We propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues.
We show that exploiting available camera intrinsic data and a ground plane assumption for cross-frame correspondence can lead to a lightweight network with significantly improved speed and accuracy.
arXiv Detail & Related papers (2024-04-11T10:26:40Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art tracking performance on both synthetic and real-world data.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
- Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition [11.116921653535226]
We investigate two frameworks that combine a CNN vision backbone and a Transformer to enhance fine-grained action recognition.
Our experimental results show that both our Transformer encoder frameworks effectively learn latent temporal semantics and cross-modality association.
We achieve new state-of-the-art performance on the FineGym benchmark dataset for both proposed architectures.
arXiv Detail & Related papers (2022-08-03T08:01:55Z)
- Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
arXiv Detail & Related papers (2022-07-04T17:00:51Z)
- Decoupled Spatial-Temporal Transformer for Video Inpainting [77.8621673355983]
Video inpainting aims to fill the given holes with realistic appearance but is still a challenging task even with prosperous deep learning approaches.
Recent works introduce the promising Transformer architecture into deep video inpainting and achieve better performance.
We propose a Decoupled Spatial-Temporal Transformer (DSTT) for improving video inpainting with exceptional efficiency.
arXiv Detail & Related papers (2021-04-14T05:47:46Z)
- Competitive Simplicity for Multi-Task Learning for Real-Time Foggy Scene Understanding via Domain Adaptation [17.530091734327296]
We propose a multi-task learning approach capable of performing real-time semantic scene understanding and monocular depth estimation under foggy weather conditions.
Our model incorporates RGB colour, depth, and luminance images via distinct encoders with dense connectivity and feature fusion.
arXiv Detail & Related papers (2020-12-09T20:38:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.