Solving Scene Understanding for Autonomous Navigation in Unstructured Environments
- URL: http://arxiv.org/abs/2507.20389v1
- Date: Sun, 27 Jul 2025 19:11:21 GMT
- Title: Solving Scene Understanding for Autonomous Navigation in Unstructured Environments
- Authors: Naveen Mathews Renji, Kruthika K, Manasa Keshavamurthy, Pooja Kumari, S. Rajarajeswari
- Abstract summary: This paper performs semantic segmentation on the Indian Driving Dataset. The dataset is more challenging than other datasets such as Cityscapes. Five different models have been trained and their performance has been compared using the Mean Intersection over Union.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous vehicles are the next revolution in the automobile industry and are expected to transform the future of transportation. Understanding the scenario in which an autonomous vehicle operates is critical for its competent functioning, and Deep Learning has played a major role in the progress made to date. Semantic segmentation, the process of annotating every pixel of an image with an object class, is a crucial part of this scene comprehension: autonomous driving requires distinguishing drivable from non-drivable areas, recognizing roadside objects, and the like. In this paper, semantic segmentation is performed on the Indian Driving Dataset, recently compiled from urban and rural roads in Bengaluru and Hyderabad. This dataset is more challenging than datasets such as Cityscapes because it is based on unstructured driving environments. It has a four-level label hierarchy, and segmentation is performed here on the first level. Five models, UNet, UNet+ResNet50, DeepLabV3, PSPNet, and SegNet, have been trained and compared using the Mean Intersection over Union (MIoU); the highest MIoU achieved is 0.6496. The paper discusses the dataset, exploratory data analysis, data preparation, and the implementation of the five models, and studies and compares the results achieved.
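Since the five models are compared with the Mean Intersection over Union, the following is a minimal sketch of how this metric can be computed from predicted and ground-truth label maps. This is illustrative code, not the authors' implementation; the `mean_iou` helper, the ignore label of 255, and the 7-class label space in the toy example are assumptions made for the illustration.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class IoU and Mean IoU for a pair of integer label maps of equal shape."""
    pred = pred.ravel()
    gt = gt.ravel()
    valid = gt != ignore_index            # drop pixels carrying the ignore label
    pred, gt = pred[valid], gt[valid]

    ious = {}
    for c in range(num_classes):
        intersection = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union == 0:                    # class absent in both maps: skip it
            continue
        ious[c] = intersection / union
    return float(np.mean(list(ious.values()))), ious

# Toy usage with a hypothetical 7-class level-1 label space
rng = np.random.default_rng(0)
gt = rng.integers(0, 7, size=(256, 256))
pred = gt.copy()
pred[:64] = rng.integers(0, 7, size=(64, 256))   # corrupt part of the prediction
miou, per_class = mean_iou(pred, gt, num_classes=7)
print(f"Mean IoU: {miou:.4f}")
```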
Related papers
- Segment Any Vehicle: Semantic and Visual Context Driven SAM and A Benchmark [12.231630639022335]
We propose SAV, a novel framework comprising three core components: a SAM-based encoder-decoder, a vehicle part knowledge graph, and a context sample retrieval encoding module. The knowledge graph explicitly models the spatial and geometric relationships among vehicle parts through a structured ontology, effectively encoding prior structural knowledge. We introduce a new large-scale benchmark dataset for vehicle part segmentation, named VehicleSeg10K, which contains 11,665 high-quality pixel-level annotations.
arXiv Detail & Related papers (2025-08-06T09:46:49Z)
- Point Cloud Based Scene Segmentation: A Survey [3.0846824529023387]
We provide an overview of the current state-of-the-art methods in the field of Point Cloud Semantics for autonomous driving. We categorize the approaches into projection-based, 3D-based and hybrid methods. We also emphasize the importance of synthetic data to support research when real-world data is limited.
arXiv Detail & Related papers (2025-03-16T18:02:41Z)
- Enhancing End-to-End Autonomous Driving with Latent World Model [78.22157677787239]
We propose a novel self-supervised learning approach using the LAtent World model (LAW) for end-to-end driving. LAW predicts future scene features based on current features and ego trajectories. This self-supervised task can be seamlessly integrated into perception-free and perception-based frameworks.
arXiv Detail & Related papers (2024-06-12T17:59:21Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z)
- FedDrive: Generalizing Federated Learning to Semantic Segmentation in Autonomous Driving [27.781734303644516]
Federated learning aims to learn a global model while preserving privacy and leveraging data on millions of remote devices.
We propose FedDrive, a new benchmark consisting of three settings and two datasets.
We benchmark state-of-the-art algorithms from the federated learning literature through an in-depth analysis.
arXiv Detail & Related papers (2022-02-28T10:34:31Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- RELLIS-3D Dataset: Data, Benchmarks and Analysis [16.803548871633957]
RELLIS-3D is a multimodal dataset collected in an off-road environment.
The data was collected on the Rellis Campus of Texas A&M University.
arXiv Detail & Related papers (2020-11-17T18:28:01Z)
- IDDA: a large-scale multi-domain dataset for autonomous driving [16.101248613062292]
This paper contributes a new large-scale synthetic dataset for semantic segmentation with more than 100 different source visual domains. The dataset has been created to explicitly address the challenges of domain shift between training and test data in various weather and viewpoint conditions.
arXiv Detail & Related papers (2020-04-17T15:22:38Z)
- VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z)