Detecting 32 Pedestrian Attributes for Autonomous Vehicles
- URL: http://arxiv.org/abs/2012.02647v1
- Date: Fri, 4 Dec 2020 15:10:12 GMT
- Title: Detecting 32 Pedestrian Attributes for Autonomous Vehicles
- Authors: Taylor Mordan, Matthieu Cord, Patrick Pérez and Alexandre Alahi
- Abstract summary: In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
- Score: 103.87351701138554
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pedestrians are arguably one of the most safety-critical road users to
consider for autonomous vehicles in urban areas. In this paper, we address the
problem of jointly detecting pedestrians and recognizing 32 pedestrian
attributes. These encompass visual appearance and behavior, and also include
the forecasting of road crossing, which is a main safety concern. For this, we
introduce a Multi-Task Learning (MTL) model relying on a composite field
framework, which achieves both goals in an efficient way. Each field spatially
locates pedestrian instances and aggregates attribute predictions over them.
This formulation naturally leverages spatial context, making it well suited to
low-resolution scenarios such as autonomous driving. By increasing the number
of attributes jointly learned, we highlight an issue related to the scales of
gradients, which arises in MTL with numerous tasks. We solve it by normalizing
the gradients coming from different objective functions when they join at the
fork in the network architecture during the backward pass, referred to as
fork-normalization. Experimental validation is performed on JAAD, a dataset
providing numerous attributes for pedestrian analysis from autonomous vehicles,
and shows competitive detection and attribute recognition results, as well as a
more stable MTL training.
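To make the fork-normalization idea concrete, here is a minimal PyTorch sketch: each attribute head's gradient is rescaled to unit norm where the heads join the shared backbone (the fork), before the summed gradient is propagated further back. The model structure, the unit-L2 normalization, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, num_attributes=32, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(              # shared trunk
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # one small head per attribute; the "fork" is the backbone output
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, 1) for _ in range(num_attributes)
        )

    def forward(self, x):
        feats = self.backbone(x)
        return feats, [head(feats) for head in self.heads]

def fork_normalized_backward(model, images, targets, criterion):
    """Normalize each task's gradient where it joins the fork, so that
    no single attribute dominates updates to the shared trunk."""
    feats, logits = model(images)
    grad_at_fork = torch.zeros_like(feats)
    for out, tgt in zip(logits, targets):
        loss = criterion(out.squeeze(1), tgt)   # tgt: (B,) float tensor
        # per-task gradient w.r.t. the shared features only
        (g,) = torch.autograd.grad(loss, feats, retain_graph=True)
        grad_at_fork += g / (g.norm() + 1e-8)   # unit norm before summing
        # accumulate this task's gradient into its own head as usual
        loss.backward(retain_graph=True, inputs=list(model.heads.parameters()))
    # push the normalized, summed gradient through the backbone once
    feats.backward(grad_at_fork)
```

With, for instance, criterion = nn.BCEWithLogitsLoss() and targets given as one float tensor per attribute, calling fork_normalized_backward in place of a plain loss.backward() keeps tasks with large losses from overwhelming the shared trunk as the number of attributes grows.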
Related papers
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatially quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
- Learning Pedestrian Actions to Ensure Safe Autonomous Driving [12.440017892152417]
It is critical for autonomous vehicles to be able to predict pedestrians' short-term and immediate actions in real time.
In this work, a novel multi-task sequence-to-sequence Transformer encoder-decoder (TF-ed) architecture is proposed for pedestrian action and trajectory prediction.
The proposed approach is compared against an existing LSTM encoder-decoder (LSTM-ed) architecture for action and trajectory prediction.
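As a rough illustration of such a sequence-to-sequence design, the sketch below encodes an observed track of pedestrian states and decodes a future trajectory plus an action logit with a standard Transformer. All dimensions, the state encoding, and the learned decoder queries are assumptions for illustration; this is not the TF-ed architecture itself.

```python
import torch
import torch.nn as nn

class Seq2SeqPedestrianPredictor(nn.Module):
    def __init__(self, state_dim=4, d_model=64, horizon=10, num_actions=2):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)   # e.g. (x, y, w, h) per frame
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        # learned queries, one per future time step to decode
        self.query = nn.Parameter(torch.randn(horizon, d_model))
        self.traj_head = nn.Linear(d_model, 2)             # future (x, y)
        self.action_head = nn.Linear(d_model, num_actions)  # e.g. cross / wait

    def forward(self, past_states):                  # (B, T_obs, state_dim)
        src = self.embed(past_states)
        tgt = self.query.unsqueeze(0).expand(past_states.size(0), -1, -1)
        dec = self.transformer(src, tgt)             # (B, horizon, d_model)
        return self.traj_head(dec), self.action_head(dec.mean(dim=1))
```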
arXiv Detail & Related papers (2023-05-22T14:03:38Z)
- Local and Global Contextual Features Fusion for Pedestrian Intention Prediction [2.203209457340481]
We analyse the visual features of both the pedestrian and the traffic context.
To understand the global context, we utilise location, motion, and environmental information.
These multi-modality features are intelligently fused for effective intention learning.
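A minimal sketch of this kind of local/global fusion might look as follows, assuming a pedestrian crop for the local branch and a small vector of location/motion features for the global branch; the feature extractors, dimensions, and simple concatenation fusion are placeholder assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn

class IntentionFusionNet(nn.Module):
    def __init__(self, local_dim=256, global_dim=64):
        super().__init__()
        self.local_net = nn.Sequential(              # encodes the pedestrian crop
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, local_dim),
        )
        self.global_net = nn.Sequential(             # encodes location + motion
            nn.Linear(8, global_dim), nn.ReLU(),     # 8-dim context is an assumption
        )
        self.classifier = nn.Sequential(
            nn.Linear(local_dim + global_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),                        # crossing-intention logit
        )

    def forward(self, crop, context):
        fused = torch.cat([self.local_net(crop), self.global_net(context)], dim=1)
        return self.classifier(fused)
```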
arXiv Detail & Related papers (2023-05-01T22:37:31Z)
- Unsupervised Adaptation from Repeated Traversals for Autonomous Driving [54.59577283226982]
Self-driving cars must generalize to the end-user's environment to operate reliably.
One potential solution is to leverage unlabeled data collected from the end-users' environments.
However, unlabeled target-domain data alone provides no reliable signal to supervise the adaptation process.
We show that the simple additional assumption of repeated traversals of the same routes is sufficient to obtain a potent signal that allows us to perform iterative self-training of 3D object detectors on the target domain.
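A hedged sketch of such an iterative self-training loop, with the detector's detect and retrain steps supplied as caller-provided callables (hypothetical stand-ins, not the paper's API):

```python
def self_train(detect, retrain, target_scans, rounds=3, score_thresh=0.7):
    """Iterative self-training: pseudo-label the unlabeled target domain
    with the current detector, then retrain on those labels.

    detect(scan) -> list of (box, score) pairs
    retrain(scans, pseudo_labels) -> updated detect function
    """
    for _ in range(rounds):
        # keep only high-confidence predictions as pseudo ground truth
        pseudo = [[box for box, score in detect(scan) if score > score_thresh]
                  for scan in target_scans]
        detect = retrain(target_scans, pseudo)
    return detect
```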
arXiv Detail & Related papers (2022-10-03T06:49:23Z)
- Multi-Agent Chance-Constrained Stochastic Shortest Path with Application to Risk-Aware Intelligent Intersection [15.149982804527182]
A formidable challenge for existing automated intersections lies in detecting and reasoning about uncertainty from the operating environment and human-driven vehicles.
We propose a risk-aware intelligent intersection system for autonomous vehicles (AVs) as well as human-driven vehicles (HVs).
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2021-08-01T22:16:40Z)
- PSE-Match: A Viewpoint-free Place Recognition Method with Parallel Semantic Embedding [9.265785042748158]
PSE-Match is a viewpoint-free place recognition method based on parallel semantic analysis of isolated semantic attributes from 3D point-cloud models.
PSE-Match incorporates a divergence place learning network to capture different semantic attributes in parallel through the spherical harmonics domain.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
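The sketch below illustrates the general idea of attention-based sensor fusion: tokens from both modalities are tagged with a modality embedding and passed through a shared Transformer encoder so that each modality can attend to the other. Dimensions and structure are illustrative assumptions, not TransFuser's actual architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.modality_embed = nn.Embedding(2, d_model)   # image vs. LiDAR tag

    def forward(self, img_tokens, lidar_tokens):
        # img_tokens: (B, N_img, d_model), lidar_tokens: (B, N_lidar, d_model)
        img = img_tokens + self.modality_embed.weight[0]
        lid = lidar_tokens + self.modality_embed.weight[1]
        fused = self.encoder(torch.cat([img, lid], dim=1))
        # split back into per-modality features after cross-modal attention
        return fused[:, :img_tokens.size(1)], fused[:, img_tokens.size(1):]
```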
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- ROAD: The ROad event Awareness Dataset for Autonomous Driving [16.24547478826027]
ROAD is designed to test an autonomous vehicle's ability to detect road events.
It comprises 22 videos, annotated with bounding boxes showing the location in the image plane of each road event.
We also provide as a baseline a new incremental algorithm for online road event awareness, based on extending RetinaNet along the time dimension.
arXiv Detail & Related papers (2021-02-23T09:48:56Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.