An Image-based Approach of Task-driven Driving Scene Categorization
- URL: http://arxiv.org/abs/2103.05920v1
- Date: Wed, 10 Mar 2021 08:23:36 GMT
- Title: An Image-based Approach of Task-driven Driving Scene Categorization
- Authors: Shaochi Hu, Hanwei Fan, Biao Gao, Xijun Zhao and Huijing Zhao
- Abstract summary: This paper proposes a method of task-driven driving scene categorization using weakly supervised data.
A measure is learned to discriminate the scenes of different semantic attributes via contrastive learning.
The results of semantic scene similarity learning and driving scene categorization are extensively studied.
- Score: 7.291979964739049
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Categorizing driving scenes via visual perception is a key technology for
safe driving and the downstream tasks of autonomous vehicles.
Traditional methods infer scene category by detecting scene-related objects
or using a classifier that is trained on large datasets of fine-labeled scene
images.
However, in cluttered dynamic scenes such as a campus or a park, human activities
are not strongly confined by rules, and the functional attributes of places are
not strongly correlated with objects. How to define, model and infer scene
categories is therefore crucial to making the technique genuinely helpful in
assisting a robot to pass through such scenes.
This paper proposes a method of task-driven driving scene categorization
using weakly supervised data.
Given a front-view video of a driving scene, a set of anchor points is marked
by following the decision making of a human driver, where an anchor point is
not a semantic label but an indicator that the semantic attribute of the scene
differs from that of the previous one.
A measure is learned to discriminate the scenes of different semantic
attributes via contrastive learning, and a driving scene profiling and
categorization method is developed based on that measure.
Experiments are conducted on a front-view video recorded as a vehicle passed
through the cluttered, dynamic campus of Peking University. The scenes are
categorized into straight road, turn road and alerting traffic. The results of
semantic scene similarity learning and driving scene categorization are
extensively studied; scene categorization accuracy is 97.17% on the learning
video and 85.44% on a video of new scenes.
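The weak supervision described above can be illustrated with a minimal sketch: anchor points split the video into segments, frames within a segment form positive pairs, frames across adjacent segments form negative pairs, and a margin-based contrastive loss pulls same-attribute embeddings together. This is hypothetical illustration code, not the authors' implementation; the pair-construction rule and the loss form are standard choices assumed here.

```python
import math

def build_pairs(num_frames, anchors):
    """Assign each frame to a segment delimited by anchor points, then form
    positive pairs (same segment) and negative pairs (adjacent segments).
    An anchor only signals that the scene attribute changed, not what it is,
    which is the weakly supervised setting the paper describes."""
    anchor_set = set(anchors)
    segment, seg_id = [], 0
    for t in range(num_frames):
        if t in anchor_set:
            seg_id += 1          # a new segment starts at each anchor
        segment.append(seg_id)
    positives, negatives = [], []
    for i in range(num_frames):
        for j in range(i + 1, num_frames):
            if segment[i] == segment[j]:
                positives.append((i, j))
            elif abs(segment[i] - segment[j]) == 1:
                negatives.append((i, j))
    return positives, negatives

def contrastive_loss(emb_a, emb_b, is_same, margin=1.0):
    """Standard margin-based contrastive loss on one pair of embeddings:
    squared distance for positive pairs, hinged margin for negative pairs."""
    d = math.dist(emb_a, emb_b)
    if is_same:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

During training, one would sample frame pairs from `build_pairs` and minimize the summed loss over a learned embedding network; at test time the learned distance serves as the semantic similarity measure used for scene profiling.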
Related papers
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- Interpretable End-to-End Driving Model for Implicit Scene Understanding [3.4248756007722987]
We propose an end-to-end Interpretable Implicit Driving Scene Understanding (II-DSU) model to extract implicit high-dimensional scene features.
Our approach achieves the new state-of-the-art and is able to obtain scene features that embody richer scene information relevant to driving.
arXiv Detail & Related papers (2023-08-02T14:43:08Z)
- Synthesizing Physical Character-Scene Interactions [64.26035523518846]
It is necessary to synthesize interactions between virtual characters and their surroundings.
We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters.
Our approach takes physics-based character motion generation a step closer to broad applicability.
arXiv Detail & Related papers (2023-02-02T05:21:32Z)
- Learning Road Scene-level Representations via Semantic Region Prediction [11.518756759576657]
We tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images.
We contend that a scene-level representation must capture higher-level semantic and geometric representations of traffic scenes around ego-vehicle.
We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm.
arXiv Detail & Related papers (2023-01-02T15:13:30Z)
- A Dynamic Data Driven Approach for Explainable Scene Understanding [0.0]
Scene-understanding is an important topic in the area of Computer Vision.
We consider the active explanation-driven understanding and classification of scenes.
Our framework is entitled ACUMEN: Active Classification and Understanding Method by Explanation-driven Networks.
arXiv Detail & Related papers (2022-06-18T02:41:51Z)
- An Active and Contrastive Learning Framework for Fine-Grained Off-Road Semantic Segmentation [7.035838394813961]
Off-road semantic segmentation with fine-grained labels is necessary for autonomous vehicles to understand driving scenes.
Fine-grained semantic segmentation in off-road scenes usually has no unified category definition due to the ambiguity of natural environments.
This research proposes an active and contrastive learning-based method that does not rely on pixel-wise labels.
arXiv Detail & Related papers (2022-02-18T03:16:31Z)
- Fine-Grained Off-Road Semantic Segmentation and Mapping via Contrastive Learning [7.965964259208489]
Road detection or traversability analysis has been a key technique for a mobile robot to traverse complex off-road scenes.
Understanding scenes with fine-grained labels is needed for off-road robots, as such scenes are very diverse.
This research proposes a contrastive learning based method to achieve meaningful scene understanding for a robot traversing off-road terrain.
arXiv Detail & Related papers (2021-03-05T13:23:24Z) - SceneGen: Learning to Generate Realistic Traffic Scenes [92.98412203941912]
We present SceneGen, a neural autoregressive model of traffic scenes that eschews the need for rules and distributions.
We demonstrate SceneGen's ability to faithfully model distributions of real traffic scenes.
arXiv Detail & Related papers (2021-01-16T22:51:43Z) - Studying Person-Specific Pointing and Gaze Behavior for Multimodal
Referencing of Outside Objects from a Moving Vehicle [58.720142291102135]
Hand pointing and eye gaze have been extensively investigated in automotive applications for object selection and referencing.
Existing outside-the-vehicle referencing methods focus on a static situation, whereas the situation in a moving vehicle is highly dynamic and subject to safety-critical constraints.
We investigate the specific characteristics of each modality and the interaction between them when used in the task of referencing outside objects.
arXiv Detail & Related papers (2020-09-23T14:56:19Z) - BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in
Unstructured Driving Environments [54.22535063244038]
We present an unsupervised adaptation approach for visual scene understanding in unstructured traffic environments.
Our method is designed for unstructured real-world scenarios with dense and heterogeneous traffic consisting of cars, trucks, two-and three-wheelers, and pedestrians.
arXiv Detail & Related papers (2020-09-22T08:25:44Z) - Enhancing Unsupervised Video Representation Learning by Decoupling the
Scene and the Motion [86.56202610716504]
Action categories are highly correlated with the scene where the action happens, so the model tends to degenerate to a solution where only scene information is encoded.
We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays better attention to motion information.
arXiv Detail & Related papers (2020-09-12T09:54:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.