3DTTNet: Multimodal Fusion-Based 3D Traversable Terrain Modeling for Off-Road Environments
- URL: http://arxiv.org/abs/2412.08195v2
- Date: Wed, 06 Aug 2025 14:02:23 GMT
- Title: 3DTTNet: Multimodal Fusion-Based 3D Traversable Terrain Modeling for Off-Road Environments
- Authors: Zitong Chen, Chao Sun, Shida Nie, Chen Min, Changjiu Ning, Haoyu Li, Bo Wang
- Abstract summary: Off-road environments pose significant challenges for autonomous ground vehicles. In this paper, traversable area recognition is achieved through semantic scene completion. A novel multimodal method, 3DTTNet, is proposed to generate dense traversable terrain estimations.
- Score: 10.521569910467072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-road environments remain a significant challenge for autonomous ground vehicles due to the lack of structured roads and the presence of complex obstacles, such as uneven terrain, vegetation, and occlusions. Traditional perception algorithms, designed primarily for structured environments, often fail in unstructured scenarios. In this paper, traversable area recognition is achieved through semantic scene completion. A novel multimodal method, 3DTTNet, is proposed to generate dense traversable terrain estimations by integrating LiDAR point clouds with monocular images from a forward-facing perspective. Fusing the two modalities strengthens environmental feature extraction, which is crucial for accurate terrain modeling in complex environments. Furthermore, RELLIS-OCC, a dataset with 3D traversable annotations, is introduced, incorporating geometric features such as step height, slope, and unevenness. Through a comprehensive analysis of vehicle obstacle-crossing conditions and the incorporation of vehicle body structure constraints, four traversability cost labels are generated: lethal, medium-cost, low-cost, and free. Experimental results demonstrate that 3DTTNet outperforms the compared approaches in 3D traversable area recognition, particularly in off-road environments with irregular geometries and partial occlusions. Specifically, 3DTTNet achieves a 42% improvement in scene completion IoU over other models. The proposed framework is scalable and adaptable to various vehicle platforms, allowing for adjustments to occupancy grid parameters and the integration of advanced dynamic models for traversability cost estimation.
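The four-level cost labeling described in the abstract lends itself to a compact rule-based sketch. A minimal illustration in Python follows; the thresholds and the severity heuristic are assumptions for the example, not the paper's actual limits, which are derived from the vehicle obstacle-crossing analysis and body-structure constraints.

```python
# Illustrative thresholds only; the paper derives its limits from a
# vehicle obstacle-crossing analysis and body-structure constraints.
MAX_STEP_M = 0.30      # assumed maximum climbable step height (m)
MAX_SLOPE_DEG = 25.0   # assumed maximum drivable slope (degrees)
MAX_ROUGHNESS = 0.10   # assumed unevenness limit (e.g., local height std)

# The four cost labels used in RELLIS-OCC.
FREE, LOW_COST, MEDIUM_COST, LETHAL = 0, 1, 2, 3

def traversability_label(step_m: float, slope_deg: float, roughness: float) -> int:
    """Map per-cell geometric features to one of the four cost labels."""
    if step_m > MAX_STEP_M or slope_deg > MAX_SLOPE_DEG or roughness > MAX_ROUGHNESS:
        return LETHAL  # exceeds assumed vehicle capability
    # Worst-case fraction of the allowed range across the three features.
    severity = max(step_m / MAX_STEP_M,
                   slope_deg / MAX_SLOPE_DEG,
                   roughness / MAX_ROUGHNESS)
    if severity > 0.66:
        return MEDIUM_COST  # passable but risky
    if severity > 0.33:
        return LOW_COST     # minor geometric disturbance
    return FREE             # effectively flat and smooth

# Example: a 0.10 m step on a 10-degree slope with mild unevenness.
print(traversability_label(0.10, 10.0, 0.04))  # -> 1 (low-cost)
```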
Related papers
- Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction [10.569056109735735]
This work presents SGCDet, a novel multi-view indoor 3D object detection framework based on adaptive 3D volume construction. We introduce a geometry and context aware aggregation module to integrate geometric and contextual information within adaptive regions in each image. We show that SGCDet achieves state-of-the-art performance on the ScanNet, ScanNet200 and ARKitScenes datasets.
arXiv Detail & Related papers (2025-07-24T11:58:01Z) - Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation [54.04601077224252]
Embodied scene understanding requires not only comprehending visual-spatial information but also determining where to explore next in the 3D physical world. 3D vision-language learning enables embodied agents to effectively explore and understand their environment. The model's versatility enables navigation using diverse input modalities, including categories, language descriptions, and reference images.
arXiv Detail & Related papers (2025-07-05T14:15:52Z) - Implicit 3D scene reconstruction using deep learning towards efficient collision understanding in autonomous driving [0.0]
This study develops a learning-based 3D scene reconstruction methodology that leverages LiDAR data and a deep neural network to build static Signed Distance Function (SDF) maps. Our preliminary results demonstrate that this method can significantly enhance collision detection performance, particularly in congested and dynamic environments.
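As a rough illustration of why an SDF map is convenient for collision checking, the sketch below trilinearly interpolates a signed-distance grid at a query point and compares the clearance against a vehicle's bounding radius; the toy map and radius are invented for the example and are not from the paper.

```python
import numpy as np

def sdf_clearance(sdf: np.ndarray, origin: np.ndarray, voxel: float,
                  point: np.ndarray) -> float:
    """Trilinearly interpolate a signed-distance grid at a world-space point."""
    idx = (point - origin) / voxel
    i0 = np.clip(np.floor(idx).astype(int), 0, np.array(sdf.shape) - 2)
    f = idx - i0  # fractional offset within the cell
    d = 0.0
    for dx in (0, 1):          # blend the 8 surrounding grid corners
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[0] if dx else 1 - f[0]) * \
                    (f[1] if dy else 1 - f[1]) * \
                    (f[2] if dz else 1 - f[2])
                d += w * sdf[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return d

# Toy map: signed distance to a wall at x = 2.0 m in a 4 m cube, 0.1 m voxels.
xs = np.arange(0.0, 4.0, 0.1)
sdf = np.broadcast_to((2.0 - xs)[:, None, None], (40, 40, 40)).copy()

# A query point collides if its clearance is below the vehicle's bounding radius.
clearance = sdf_clearance(sdf, np.zeros(3), 0.1, np.array([1.2, 1.0, 0.5]))
print(clearance, clearance < 0.5)  # ~0.8, False -> no collision
```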
arXiv Detail & Related papers (2025-06-18T18:42:04Z) - GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control [50.67481583744243]
We introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models. We propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles. Our method significantly outperforms existing models in both action accuracy and 3D spatial awareness.
arXiv Detail & Related papers (2025-05-28T14:46:51Z) - Highly Accurate and Diverse Traffic Data: The DeepScenario Open 3D Dataset [25.244956737443527]
We introduce the DeepScenario Open 3D dataset (DSC3D) of 6-degree-of-freedom bounding box trajectories acquired through a novel monocular camera drone tracking pipeline. Our dataset includes more than 175,000 trajectories of 14 types of traffic participants and significantly exceeds existing datasets in terms of diversity and scale. We demonstrate its utility across multiple applications, including motion prediction, motion planning, scenario mining, and generative reactive traffic agents.
arXiv Detail & Related papers (2025-04-24T08:43:48Z) - PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments [73.80718037070773]
We present the multi-modal Pedestrian-Focused Scene dataset (PFSD), rigorously annotated in semi-structured scenes following the format of nuScenes.
We also propose a novel Hybrid Multi-Scale Fusion Network (HMFN) to detect pedestrians in densely populated and occluded scenarios.
arXiv Detail & Related papers (2025-02-21T09:57:53Z) - AdaOcc: Adaptive-Resolution Occupancy Prediction [20.0994984349065]
We introduce AdaOcc, a novel adaptive-resolution, multi-modal prediction approach.
Our method integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework.
In close-range scenarios, we surpass previous baselines by over 13% in IOU, and over 40% in Hausdorff distance.
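For reference, the two metrics quoted above can be computed as follows. This is a standard-definition sketch with synthetic stand-in data, not AdaOcc's evaluation code.

```python
import numpy as np

def voxel_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean occupancy grids."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two (N, 3) point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise
    return max(d.min(axis=1).max(), d.min(axis=0).max())

rng = np.random.default_rng(0)
pred = rng.random((16, 16, 16)) > 0.5   # synthetic predicted occupancy
gt = rng.random((16, 16, 16)) > 0.5     # synthetic ground-truth occupancy
print(f"IoU: {voxel_iou(pred, gt):.3f}")
print(f"Hausdorff: {hausdorff(rng.random((50, 3)), rng.random((60, 3))):.3f}")
```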
arXiv Detail & Related papers (2024-08-24T03:46:25Z) - Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata [70.9375320609781]
We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, abundantly captured by autonomous vehicles (AVs).
We propose hierarchical Generative Cellular Automata (hGCA), a spatially scalable 3D generative model, which grows geometry with local kernels in a coarse-to-fine manner and is equipped with a lightweight planner to induce global consistency.
arXiv Detail & Related papers (2024-06-12T14:56:56Z) - OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
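One common way to cover an unbounded perceptive range, in the spirit of the parameterization described above, is a scene contraction that maps all of 3D space into a bounded volume. The sketch below uses the mip-NeRF-360-style contraction as an assumed illustration; OccNeRF's exact parameterization may differ.

```python
import numpy as np

def contract(x: np.ndarray) -> np.ndarray:
    """Map unbounded 3D points into a ball of radius 2 (mip-NeRF-360 style).

    Points with |x| <= 1 are left untouched; farther points are squashed so
    the camera's entire (infinite) range fits in a bounded occupancy volume.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.maximum(norm, 1.0)  # avoid division by zero near the origin
    return np.where(norm <= 1.0, x, (2.0 - 1.0 / safe) * x / safe)

# A point 100 m away still lands inside the radius-2 volume.
print(contract(np.array([[100.0, 0.0, 0.0]])))  # ~[[1.99, 0., 0.]]
```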
arXiv Detail & Related papers (2023-12-14T18:58:52Z) - MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D Vehicle Reconstruction in Autonomous Driving [25.088617195439344]
We propose a novel framework, dubbed MV-DeepSDF, which estimates the optimal Signed Distance Function (SDF) shape representation from multi-sweep point clouds.
We conduct thorough experiments on two real-world autonomous driving datasets.
arXiv Detail & Related papers (2023-08-21T15:48:15Z) - METAVerse: Meta-Learning Traversability Cost Map for Off-Road Navigation [5.036362492608702]
This paper presents METAVerse, a meta-learning framework for learning a global model that accurately predicts terrain traversability.
We train the traversability prediction network to generate a dense, continuous terrain cost map from a sparse LiDAR point cloud.
Online adaptation is performed to rapidly adapt the network to the local environment by exploiting recent interaction experiences.
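A heavily simplified view of such online adaptation is a few gradient steps on a buffer of recent (feature, measured-cost) pairs. The linear cost model below is a placeholder assumption; METAVerse adapts a full network from meta-learned initial weights, which this sketch does not reproduce.

```python
import numpy as np

def adapt_online(w: np.ndarray, feats: np.ndarray, costs: np.ndarray,
                 lr: float = 0.1, steps: int = 5) -> np.ndarray:
    """Take a few gradient steps on recent (feature, measured-cost) pairs.

    Placeholder linear cost model; METAVerse adapts a full network from
    meta-learned initial weights, which this sketch does not reproduce.
    """
    for _ in range(steps):
        residual = feats @ w - costs                  # prediction error on the buffer
        w = w - lr * feats.T @ residual / len(costs)  # least-squares gradient step
    return w

rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 4))                 # recent terrain features
costs = feats @ np.array([0.5, -0.2, 0.1, 0.3])  # measured traversal costs
print(np.round(adapt_online(np.zeros(4), feats, costs), 2))
# -> weights move toward the local environment's cost model
```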
arXiv Detail & Related papers (2023-07-26T06:58:19Z) - Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z) - Uncertainty-aware Perception Models for Off-road Autonomous Unmanned Ground Vehicles [6.2574402913714575]
Off-road autonomous unmanned ground vehicles (UGVs) are being developed for military and commercial use to deliver crucial supplies in remote locations.
Current datasets used to train perception models for off-road autonomous navigation lack diversity in seasons, locations, semantic classes, and time of day.
We investigate how to combine multiple datasets to train a semantic segmentation-based environment perception model.
We show that training the model to capture uncertainty can improve model performance by a significant margin.
arXiv Detail & Related papers (2022-09-22T15:59:33Z) - Decentralized Vehicle Coordination: The Berkeley DeepDrive Drone Dataset and Consensus-Based Models [76.32775745488073]
We present a novel dataset and modeling framework designed to study motion planning in understructured environments.
We demonstrate that a consensus-based modeling approach can effectively explain the emergence of priority orders observed in our dataset.
arXiv Detail & Related papers (2022-09-19T05:06:57Z) - Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
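A minimal single-head version of attention-based sensor fusion is sketched below: image and LiDAR feature tokens are concatenated and self-attended, so each modality can attend to the other. The shapes and weights are placeholders; TransFuser itself stacks multi-head transformer blocks at several feature resolutions.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_tokens(img_tokens, lidar_tokens, wq, wk, wv):
    """Single-head self-attention over concatenated image + LiDAR tokens.

    Concatenation lets every token attend across modalities, which is the
    core idea behind attention-based sensor fusion.
    """
    x = np.concatenate([img_tokens, lidar_tokens], axis=0)  # (Ni + Nl, D)
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))          # token-to-token weights
    return attn @ v  # fused tokens; split back per modality if needed

rng = np.random.default_rng(0)
d = 8
img, lidar = rng.normal(size=(16, d)), rng.normal(size=(16, d))
wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(fuse_tokens(img, lidar, wq, wk, wv).shape)  # (32, 8)
```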
arXiv Detail & Related papers (2021-04-19T11:48:13Z) - Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z) - PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z) - Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds [76.52448276587707]
We propose Reconfigurable Voxels, a new approach to constructing representations from 3D point clouds.
Specifically, we devise a biased random walk scheme, which adaptively covers each neighborhood with a fixed number of voxels.
We find that this approach effectively improves the stability of voxel features, especially for sparse regions.
arXiv Detail & Related papers (2020-04-06T15:07:16Z)