RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing
- URL: http://arxiv.org/abs/2309.10356v4
- Date: Mon, 1 Jul 2024 06:01:29 GMT
- Title: RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing
- Authors: Jiahang Li, Yikang Zhang, Peng Yun, Guangliang Zhou, Qijun Chen, Rui Fan
- Abstract summary: RoadFormer is a Transformer-based data-fusion network developed for road scene parsing.
RoadFormer outperforms all other state-of-the-art networks for road scene parsing.
- Score: 17.118074007418123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep convolutional neural networks have shown significant promise in the domain of road scene parsing. Nevertheless, existing works focus primarily on freespace detection, with little attention given to hazardous road defects that can compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer utilizes a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are then fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which a Transformer decoder then processes to produce the final semantic prediction. Additionally, we release SYN-UDTIRI, the first large-scale road scene parsing dataset, containing 10,407 RGB images, dense depth images, and the corresponding pixel-level annotations for both freespace and road defects of different shapes and sizes. Extensive experimental evaluations conducted on our SYN-UDTIRI dataset, as well as on three public datasets (KITTI road, CityScapes, and ORFD), demonstrate that RoadFormer outperforms all other state-of-the-art networks for road scene parsing. Specifically, RoadFormer ranks first on the KITTI road benchmark. Our source code, created dataset, and demo video are publicly available at mias.group/RoadFormer.
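To make the pipeline above concrete, here is a minimal PyTorch-style sketch of a duplex encoder with gated fusion; the stage widths and the gating block are illustrative assumptions standing in for RoadFormer's heterogeneous feature synergy block, not the authors' actual implementation.

```python
# Sketch only: two parallel encoders (RGB, surface normals) whose
# per-stage features are fused by a learned gate. All sizes are assumed.
import torch
import torch.nn as nn

class ConvStage(nn.Module):
    """One downsampling stage, used by both streams."""
    def __init__(self, cin, cout):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride=2, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DuplexEncoder(nn.Module):
    """Parallel RGB and surface-normal encoders with gated fusion."""
    def __init__(self, widths=(32, 64, 128)):
        super().__init__()
        self.rgb = nn.ModuleList()
        self.normal = nn.ModuleList()
        self.gates = nn.ModuleList()
        cin = 3
        for cout in widths:
            self.rgb.append(ConvStage(cin, cout))
            self.normal.append(ConvStage(cin, cout))
            # Simple gated recalibration of the two feature streams.
            self.gates.append(nn.Sequential(
                nn.Conv2d(2 * cout, cout, 1), nn.Sigmoid()))
            cin = cout

    def forward(self, rgb, normal):
        pyramid = []
        for f_rgb, f_norm, gate in zip(self.rgb, self.normal, self.gates):
            rgb, normal = f_rgb(rgb), f_norm(normal)
            g = gate(torch.cat([rgb, normal], dim=1))
            pyramid.append(g * rgb + (1 - g) * normal)
        return pyramid  # multi-scale fused features for the decoders

rgb = torch.randn(1, 3, 256, 256)
normal = torch.randn(1, 3, 256, 256)  # per-pixel surface normals
for feat in DuplexEncoder()(rgb, normal):
    print(feat.shape)
```

The fused pyramid would then feed the pixel decoder and Transformer decoder described in the abstract.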
Related papers
- RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion [23.08593450089786]
RoadFormer successfully extracts heterogeneous features from RGB images and surface normal maps.
RoadFormer+ extends this to additional data modalities, such as depth, thermal, surface normal, and polarization.
RoadFormer+ ranks first on the KITTI Road benchmark and achieves state-of-the-art performance in mean intersection over union.
arXiv Detail & Related papers (2024-07-31T14:25:16Z)
- NeRO: Neural Road Surface Reconstruction [15.99050337416157]
This paper introduces a positional-encoding multi-layer perceptron (MLP) framework for reconstructing road surfaces: it takes world coordinates (x, y) as input and outputs height, color, and semantic information.
The method works with a variety of road height sources, including vehicle camera poses, LiDAR point clouds, and SfM point clouds; it is robust to semantic noise in the images, such as sparse labels and noisy semantic predictions, and it trains quickly.
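A minimal sketch of such a positional-encoding MLP, assuming Fourier features and a small shared trunk (widths, frequency count, and output heads are illustrative, not NeRO's actual configuration):

```python
# Sketch: map road-plane coordinates (x, y) to height, color, and
# semantic logits via a Fourier-feature MLP. All sizes are assumed.
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, num_freqs=8):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs))

    def forward(self, xy):                        # (N, 2)
        ang = xy[..., None] * self.freqs          # (N, 2, F)
        enc = torch.cat([ang.sin(), ang.cos()], dim=-1)
        return enc.flatten(start_dim=1)           # (N, 4F)

class RoadSurfaceMLP(nn.Module):
    def __init__(self, num_classes=3, num_freqs=8, width=128):
        super().__init__()
        self.enc = FourierFeatures(num_freqs)
        self.trunk = nn.Sequential(
            nn.Linear(4 * num_freqs, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.height = nn.Linear(width, 1)
        self.color = nn.Linear(width, 3)
        self.semantics = nn.Linear(width, num_classes)

    def forward(self, xy):
        h = self.trunk(self.enc(xy))
        return self.height(h), self.color(h).sigmoid(), self.semantics(h)

model = RoadSurfaceMLP()
xy = torch.rand(1024, 2)                          # sampled road coordinates
height, color, sem_logits = model(xy)
# Height would be supervised by camera poses / LiDAR / SfM points,
# semantics by (possibly noisy) image-space labels.
```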
arXiv Detail & Related papers (2024-05-17T05:41:45Z)
- Homography Guided Temporal Fusion for Road Line and Marking Segmentation [73.47092021519245]
Road lines and markings are frequently occluded by moving vehicles, shadows, and glare.
We propose a Homography Guided Fusion (HomoFusion) module that exploits temporally adjacent video frames for complementary cues.
We show that exploiting available camera intrinsics and a ground-plane assumption for cross-frame correspondence leads to a lightweight network with significant improvements in both speed and accuracy.
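As a sketch of the geometry involved, the plane-induced homography between two frames follows the classical formula H = K (R + t n^T / d) K^{-1} for points on the plane n^T X = d; the intrinsics and ego-motion values below are made up for illustration.

```python
# Sketch: warp road-plane pixels from an adjacent frame into the current
# frame using the ground-plane homography. All numbers are illustrative.
import numpy as np

def ground_plane_homography(K, R, t, n, d):
    """Homography induced by the plane n.X = d between two camera poses.
    K: 3x3 intrinsics; (R, t): relative rotation/translation with
    X2 = R @ X1 + t; n: plane normal in the first camera frame."""
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    return H / H[2, 2]

K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                        # near-pure forward translation
t = np.array([0.0, 0.0, 1.5])        # ego-motion between frames (metres)
n = np.array([0.0, 1.0, 0.0])        # road normal (camera y points down)
d = 1.6                              # camera height: road points obey n.X = d

H = ground_plane_homography(K, R, t, n, d)
p = np.array([700.0, 500.0, 1.0])    # road pixel in the previous frame
q = H @ p
print(q[:2] / q[2])                  # corresponding pixel, current frame
```

Only pixels actually on the road plane warp consistently, which is what makes the ground-plane assumption usable for cross-frame correspondence.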
arXiv Detail & Related papers (2024-04-11T10:26:40Z)
- EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding [21.117919848535422]
EMIE-MAP is a novel method for large-scale road surface reconstruction based on explicit mesh and implicit encoding.
Our method achieves remarkable road surface reconstruction performance in a variety of real-world challenging scenarios.
arXiv Detail & Related papers (2024-03-18T13:46:52Z)
- Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects embed information in a visual form that vision systems can easily recognize.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in real time.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
- Traffic Congestion Prediction using Deep Convolutional Neural Networks: A Color-coding Approach [0.0]
This work proposes a technique for traffic video classification that applies a color-coding scheme to the traffic data before training a deep convolutional neural network.
First, the video data is transformed into an image dataset; vehicle detection is then performed using the You Only Look Once (YOLO) algorithm.
Using the UCSD dataset, we have obtained a classification accuracy of 98.2%.
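A hedged sketch of the summarized pipeline; `detect_vehicles` is a hypothetical stand-in for the YOLO step, and the density-based color rule is an assumption, since the summary does not give the exact coding scheme.

```python
# Sketch: video -> frames -> vehicle detection -> color-coded image ->
# CNN congestion classifier. Detector and color rule are placeholders.
import cv2

def detect_vehicles(frame):
    """Hypothetical stand-in for YOLO; returns [(x, y, w, h), ...]."""
    return [(100, 200, 80, 40)]

def color_code(frame, boxes):
    """Paint each detection; hue encodes overall vehicle density."""
    density = min(len(boxes) / 20.0, 1.0)         # assumed normalizer
    color = (0, int(255 * (1 - density)), int(255 * density))
    coded = frame.copy()
    for (x, y, w, h) in boxes:
        cv2.rectangle(coded, (x, y), (x + w, y + h), color, thickness=-1)
    return coded

cap = cv2.VideoCapture("traffic.mp4")             # hypothetical input video
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(color_code(frame, detect_vehicles(frame)))
cap.release()
# `frames` now form the imagery dataset fed to a CNN classifier
# (e.g., congested vs. free-flowing, per the UCSD labels).
```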
arXiv Detail & Related papers (2022-09-16T14:02:20Z)
- Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed the Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep auto-encoders for modality-specific representation learning and a tailored Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
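A minimal sketch of the two-stream idea, assuming simple 1x1-convolution message functions in place of the paper's Dual Enhancement Module:

```python
# Sketch: modality-specific encoders (aerial image, trajectory map) plus
# a dual enhancement step where each stream is refined by the other's
# features. Encoders and message functions are illustrative stand-ins.
import torch
import torch.nn as nn

class DualEnhancement(nn.Module):
    """Exchange 'messages' between image and trajectory feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.img_msg = nn.Conv2d(channels, channels, 1)
        self.traj_msg = nn.Conv2d(channels, channels, 1)

    def forward(self, f_img, f_traj):
        f_img2 = f_img + torch.sigmoid(self.traj_msg(f_traj)) * f_img
        f_traj2 = f_traj + torch.sigmoid(self.img_msg(f_img)) * f_traj
        return f_img2, f_traj2

enc_img = nn.Conv2d(3, 32, 3, padding=1)    # stand-in aerial-image encoder
enc_traj = nn.Conv2d(1, 32, 3, padding=1)   # stand-in trajectory encoder
enhance = DualEnhancement(32)

aerial = torch.randn(1, 3, 128, 128)
traj_map = torch.randn(1, 1, 128, 128)      # rasterized GPS trajectories
f_img, f_traj = enhance(enc_img(aerial), enc_traj(traj_map))
print(f_img.shape, f_traj.shape)
```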
arXiv Detail & Related papers (2021-11-30T04:30:10Z)
- SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving [64.10636296274168]
Road extraction is an essential step in building autonomous navigation systems.
Using convolutional neural networks (ConvNets) alone is not effective for this problem, as they struggle to capture long-range dependencies between road segments in the image.
We propose a Spatial and Interaction Space Graph Reasoning (SPIN) module which, when plugged into a ConvNet, performs reasoning over graphs constructed on spatial and interaction spaces projected from the feature maps.
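A sketch of interaction-space graph reasoning in this spirit: project the feature map onto a few latent nodes, mix them with a learned adjacency, and project back. Node count and layer shapes are assumptions, not SPIN's exact module.

```python
# Sketch: global reasoning over a latent interaction space, attached to
# a ConvNet feature map with a residual connection. Sizes are assumed.
import torch
import torch.nn as nn

class InteractionSpaceReasoning(nn.Module):
    def __init__(self, channels, num_nodes=16):
        super().__init__()
        self.project = nn.Conv2d(channels, num_nodes, 1)  # pixel -> node
        self.adjacency = nn.Linear(num_nodes, num_nodes, bias=False)
        self.node_update = nn.Linear(channels, channels)

    def forward(self, x):                                  # (B, C, H, W)
        b, c, h, w = x.shape
        assign = self.project(x).flatten(2).softmax(dim=-1)  # (B, N, HW)
        feats = x.flatten(2).transpose(1, 2)                 # (B, HW, C)
        nodes = assign @ feats                               # (B, N, C)
        nodes = torch.relu(self.node_update(
            self.adjacency(nodes.transpose(1, 2)).transpose(1, 2)))
        out = assign.transpose(1, 2) @ nodes                 # (B, HW, C)
        return x + out.transpose(1, 2).reshape(b, c, h, w)   # residual

x = torch.randn(1, 64, 32, 32)                # ConvNet feature map
print(InteractionSpaceReasoning(64)(x).shape)
```

Because all pixels assigned to the same node exchange information in one hop, distant road segments can influence each other without stacking many convolutions.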
arXiv Detail & Related papers (2021-09-16T03:52:17Z)
- Convolutional Recurrent Network for Road Boundary Extraction [99.55522995570063]
We tackle the problem of drivable road boundary extraction from LiDAR and camera imagery.
We design a structured model where a fully convolutional network obtains deep features encoding the location and direction of road boundaries.
We showcase the effectiveness of our method on a large North American city where we obtain perfect topology of road boundaries 99.3% of the time.
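A sketch of the structured decoding step implied here, assuming the network outputs a dense unit-vector direction field that a simple tracer follows; the synthetic field and step size are for illustration only.

```python
# Sketch: trace a road-boundary polyline by stepping along a predicted
# direction field. The field here is synthetic, not a network output.
import numpy as np

def trace_boundary(direction_field, start, steps=50, step_size=2.0):
    """direction_field: (H, W, 2) unit vectors; start: (row, col)."""
    h, w, _ = direction_field.shape
    pt = np.asarray(start, dtype=float)
    polyline = [pt.copy()]
    for _ in range(steps):
        r, c = int(round(pt[0])), int(round(pt[1]))
        if not (0 <= r < h and 0 <= c < w):
            break                        # walked off the image
        pt += step_size * direction_field[r, c]
        polyline.append(pt.copy())
    return np.array(polyline)

# Synthetic field: everywhere pointing down-right, as a boundary might.
field = np.tile(np.array([0.6, 0.8]), (100, 100, 1))
print(trace_boundary(field, start=(10, 10))[:5])
```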
arXiv Detail & Related papers (2020-12-21T18:59:12Z)
- Scribble-based Weakly Supervised Deep Learning for Road Surface Extraction from Remote Sensing Images [7.1577508803778045]
We propose a scribble-based weakly supervised road surface extraction method named ScRoadExtractor.
To propagate semantic information from sparse scribbles to unlabeled pixels, we introduce a road label propagation algorithm.
The proposal masks generated from the road label propagation algorithm are utilized to train a dual-branch encoder-decoder network.
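The summary names the propagation algorithm without detail; as an analogous stand-in, this sketch propagates sparse scribbles to dense proposal masks with scikit-image's random walker, a different but related seeded-propagation technique.

```python
# Sketch: densify sparse scribble annotations into full proposal masks.
# random_walker is a stand-in for the paper's propagation algorithm.
import numpy as np
from skimage.segmentation import random_walker

rng = np.random.default_rng(0)
image = rng.random((64, 64))                # toy remote-sensing tile

scribbles = np.zeros((64, 64), dtype=int)   # 0 = unlabeled pixel
scribbles[32, 5:25] = 1                     # road scribble
scribbles[5, 5:60] = 2                      # background scribble

proposal_mask = random_walker(image, scribbles, beta=50)
print(np.unique(proposal_mask))             # dense labels {1, 2} everywhere
# The dense proposals would then supervise the dual-branch
# encoder-decoder network described above.
```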
arXiv Detail & Related papers (2020-10-25T12:40:30Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
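A sketch of this hierarchy, assuming a shared MLP with max-pooling for the polyline subgraph and single-head attention for the global interaction graph; widths are illustrative.

```python
# Sketch: per-polyline subgraph encoding followed by a global
# interaction step over polyline features. Sizes are assumed.
import torch
import torch.nn as nn

class PolylineSubgraph(nn.Module):
    def __init__(self, in_dim=4, width=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU())

    def forward(self, vectors):             # (num_vectors, in_dim)
        node = self.mlp(vectors)            # encode each vector
        return node.max(dim=0).values       # permutation-invariant pool

subgraph = PolylineSubgraph()
global_graph = nn.MultiheadAttention(embed_dim=64, num_heads=1)

# Each polyline is a sequence of vectors (start_xy, end_xy), e.g. a lane
# boundary from the HD map or an agent's past trajectory.
polylines = [torch.randn(n, 4) for n in (12, 7, 20)]
feats = torch.stack([subgraph(p) for p in polylines])   # (P, 64)
out, _ = global_graph(feats[:, None], feats[:, None], feats[:, None])
print(out.shape)   # (P, 1, 64): context-aware polyline features
```

Operating on vectors rather than rasterized maps is what lets this design skip the lossy rendering and ConvNet encoding steps mentioned above.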
arXiv Detail & Related papers (2020-05-08T19:07:03Z)