Related papers: Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction

URL: http://arxiv.org/abs/2111.15119v1
Date: Tue, 30 Nov 2021 04:30:10 GMT
Title: Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction
Authors: Lingbo Liu and Zewei Yang and Guanbin Li and Kuo Wang and Tianshui Chen and Liang Lin
Abstract summary: We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet). CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement. Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
Score: 110.61383502442598
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Land remote sensing analysis is a crucial research area in earth science. In this work, we focus on a challenging task of land analysis, i.e., automatic extraction of traffic roads from remote sensing data, which has widespread applications in urban development and expansion estimation. Nevertheless, conventional methods either utilized only the limited information of aerial images, or simply fused multimodal information (e.g., vehicle trajectories), and thus cannot reliably recognize unconstrained roads. To address this problem, we introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet), which fully exploits the complementary information of different modalities (i.e., aerial images and crowdsourced trajectories). Specifically, CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement. In particular, the complementary information of each modality is comprehensively extracted and dynamically propagated to enhance the representation of the other modality. Extensive experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction, which benefits from blending different modal data, using either image and trajectory data or image and Lidar data. From the experimental results, we observe that the proposed approach outperforms current state-of-the-art methods by large margins.
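
Illustrative sketch: the abstract describes information from each modality being dynamically propagated to enhance the other. The Python snippet below is a minimal sketch of that general idea only, not the paper's Dual Enhancement Module, whose internals are not given on this page. The function names (`gate`, `dual_enhance`), the scalar sigmoid gate, and the residual update are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate(features, weights, bias=0.0):
    """Hypothetical scalar gate in (0, 1) computed from a feature vector."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def dual_enhance(img_feat, traj_feat, w_img, w_traj):
    """Enhance each modality with a gated message from the other one.

    img_feat and traj_feat are equal-length lists of floats standing in for
    the features of the image branch and the trajectory branch.
    w_img and w_traj are gate weights over the concatenated features
    (assumed parameters; a real model would learn them).
    """
    joint = img_feat + traj_feat          # list concatenation of both modalities
    g_img = gate(joint, w_img)            # how much the image branch accepts
    g_traj = gate(joint, w_traj)          # how much the trajectory branch accepts
    img_out = [a + g_img * b for a, b in zip(img_feat, traj_feat)]
    traj_out = [b + g_traj * a for a, b in zip(img_feat, traj_feat)]
    return img_out, traj_out


if __name__ == "__main__":
    img = [1.0, 0.0]
    traj = [0.0, 2.0]
    zeros = [0.0, 0.0, 0.0, 0.0]          # zero weights give a gate of exactly 0.5
    print(dual_enhance(img, traj, zeros, zeros))
```

With zero gate weights each branch receives half of the other branch's features; larger weights would push the gates toward fully accepting or fully ignoring the other modality.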

Related papers

URoadNet: Dual Sparse Attentive U-Net for Multiscale Road Network Extraction [35.39993205110938]
We introduce a computationally efficient and powerful framework for elegant road-aware segmentation. Our method, called URoadNet, effectively encodes fine-grained local road connectivity and holistic global topological semantics. Our approach represents a significant advancement in the field of road network extraction.
arXiv Detail & Related papers (2024-12-23T13:45:29Z)
MM-Path: Multi-modal, Multi-granularity Path Representation Learning -- Extended Version [12.938987616850389]
We propose a novel Multi-modal, Multi-granularity Path Representation Learning Framework (MM-Path). MM-Path can learn a generic path representation by integrating modalities from both road paths and image paths.
arXiv Detail & Related papers (2024-11-27T15:10:22Z)
Trajectory Representation Learning on Road Networks and Grids with Spatio-Temporal Dynamics [0.8655526882770742]
Trajectory representation learning is a fundamental task for applications in fields such as smart cities and urban planning. We propose TIGR, a novel model designed to integrate grid and road network modalities while incorporating temporal dynamics. We evaluate TIGR on two real-world datasets and demonstrate the effectiveness of combining both modalities.
arXiv Detail & Related papers (2024-11-21T10:56:02Z)
Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models [27.316692263196277]
MVTraj is a novel multi-view modeling method for trajectory representation learning. It integrates diverse contextual knowledge, from GPS to road networks and points-of-interest, to provide a more comprehensive understanding of trajectory data. Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views.
arXiv Detail & Related papers (2024-10-17T03:56:12Z)
Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition [49.20086587208214]
We propose a new strategy called think twice before recognizing to improve fine-grained traffic sign recognition (TSR). Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multimodal models (LMM).
arXiv Detail & Related papers (2024-09-03T02:08:47Z)
More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning [26.630640299709114]
We propose Joint GPS and Route Modelling based on self-supervised technology, namely JGRM. We develop two encoders, each tailored to capture representations of route and GPS trajectories respectively. The representations from the two modalities are fed into a shared transformer for inter-modal information interaction.
arXiv Detail & Related papers (2024-02-25T18:27:25Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
DouFu: A Double Fusion Joint Learning Method For Driving Trajectory Representation [13.321587117066166]
We propose a novel multimodal fusion model, DouFu, for trajectory representation joint learning. We first design movement, route, and global features generated from the trajectory data and urban functional zones. With the global semantic feature, DouFu produces a comprehensive embedding for each trajectory.
arXiv Detail & Related papers (2022-05-05T07:43:35Z)
Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data. We first train a scale-aware disparity network using both monocular real images and stereo virtual data. The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
Road Network Guided Fine-Grained Urban Traffic Flow Inference [108.64631590347352]
Accurate inference of fine-grained traffic flow from coarse-grained data is an emerging yet crucial problem. We propose a novel Road-Aware Traffic Flow Magnifier (RATFM) that exploits the prior knowledge of road networks. Our method can generate high-quality fine-grained traffic flow maps.
arXiv Detail & Related papers (2021-09-29T07:51:49Z)
Scribble-based Weakly Supervised Deep Learning for Road Surface Extraction from Remote Sensing Images [7.1577508803778045]
We propose a scribble-based weakly supervised road surface extraction method named ScRoadExtractor. To propagate semantic information from sparse scribbles to unlabeled pixels, we introduce a road label propagation algorithm. The proposal masks generated from the road label propagation algorithm are utilized to train a dual-branch encoder-decoder network.
arXiv Detail & Related papers (2020-10-25T12:40:30Z)
Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts. We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively. Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively. Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet. X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.