Rethinking Top Probability from Multi-view for Distracted Driver Behaviour Localization
- URL: http://arxiv.org/abs/2411.12525v1
- Date: Tue, 19 Nov 2024 14:18:02 GMT
- Title: Rethinking Top Probability from Multi-view for Distracted Driver Behaviour Localization
- Authors: Quang Vinh Nguyen, Vo Hoang Thanh Son, Chau Truong Vinh Hoang, Duc Duy Nguyen, Nhat Huy Nguyen Minh, Soo-Hyung Kim
- Abstract summary: The action localization task aims to recognize and comprehend human behaviors and actions from video data captured during real-world driving scenarios.
Previous studies have achieved strong action localization performance by applying a recognition model followed by probability-based post-processing.
In this work, we adopt an action recognition model based on self-supervised learning to detect distracted activities and give potential action probabilities.
- Score: 6.531367337657802
- Abstract: The naturalistic driving action localization task aims to recognize and comprehend human behaviors and actions from video data captured during real-world driving scenarios. Previous studies have achieved strong action localization performance by applying a recognition model followed by probability-based post-processing. Nevertheless, the probabilities provided by the recognition model frequently contain confusing information, which poses a challenge for post-processing. In this work, we adopt an action recognition model based on self-supervised learning to detect distracted activities and give potential action probabilities. Subsequently, a constraint ensemble strategy takes advantage of multi-camera views to provide robust predictions. Finally, we introduce a conditional post-processing operation to locate distracted behaviours and action temporal boundaries precisely. In experiments on test set A2, our method obtains sixth position on the public leaderboard of track 3 of the 2024 AI City Challenge.
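The abstract names three stages (per-clip action probabilities, a multi-view ensemble, and probability-based post-processing) without giving their rules. The following is only a minimal sketch of that general pattern, not the authors' constraint ensemble or conditional post-processing. It assumes plain weighted averaging across camera views and a simple threshold-and-merge step; the names `ensemble_views` and `localize`, the threshold, the minimum segment length, and the background class index are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Probs = List[List[float]]  # probs[t][c]: probability of class c at clip t


def ensemble_views(views: Sequence[Probs],
                   weights: Optional[Sequence[float]] = None) -> Probs:
    """Weighted average of per-clip class probabilities across camera views.

    Assumption: all views are time-aligned and share the same class order.
    """
    if weights is None:
        weights = [1.0] * len(views)
    total = sum(weights)
    n_clips, n_classes = len(views[0]), len(views[0][0])
    return [
        [sum(w * v[t][c] for w, v in zip(weights, views)) / total
         for c in range(n_classes)]
        for t in range(n_clips)
    ]


def localize(probs: Probs, threshold: float = 0.5, min_len: int = 2,
             background: int = 0) -> List[Tuple[int, int, int]]:
    """Group consecutive clips sharing a confident top-1 class into segments.

    Returns (class_id, start_clip, end_clip) tuples, end_clip inclusive.
    Clips whose top class is background or below threshold get no label.
    """
    segments: List[Tuple[int, int, int]] = []
    current: Optional[int] = None
    start = 0
    # A trailing None sentinel flushes the last open segment.
    for t, row in enumerate(list(probs) + [None]):
        label = None
        if row is not None:
            top = max(range(len(row)), key=row.__getitem__)
            if top != background and row[top] >= threshold:
                label = top
        if label != current:
            # Close the previous segment if it is long enough to keep.
            if current is not None and t - start >= min_len:
                segments.append((current, start, t - 1))
            current, start = label, t
    return segments
```

Calling `localize(ensemble_views([dashboard, rear_view]))` on two time-aligned probability sequences would return one tuple per detected action segment; segments shorter than `min_len` clips are discarded as likely noise.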
Related papers
- About Time: Advances, Challenges, and Outlooks of Action Understanding [57.76390141287026]
This survey comprehensively reviews advances in uni- and multi-modal action understanding across a range of tasks.
We focus on prevalent challenges, overview widely adopted datasets, and survey seminal works with an emphasis on recent advances.
(2024-11-22T18:09:27Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
(2024-08-30T15:39:34Z)
- Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy [12.257725479880458]
Action recognition has become one of the popular research topics in computer vision.
We propose a multi-view attention consistency method that computes the similarity between two attentions from two different views of the action videos.
Our approach applies the idea of Neural Radiance Field to implicitly render the features from novel views when training on single-view datasets.
(2024-05-02T14:43:21Z)
- DeepLocalization: Using change point detection for Temporal Action Localization [2.4502578110136946]
We introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior.
Our strategy employs a dual approach: leveraging Graph-Based Change-Point Detection for pinpointing actions in time alongside a Video Large Language Model (Video-LLM) for precisely categorizing activities.
(2024-04-18T15:25:59Z)
- Versatile Navigation under Partial Observability via Value-guided Diffusion Policy [14.967107015417943]
We propose a versatile diffusion-based approach for both 2D and 3D route planning under partial observability.
Specifically, our value-guided diffusion policy first generates plans to predict actions across various timesteps.
We then employ a differentiable planner with state estimations to derive a value function, directing the agent's exploration and goal-seeking behaviors.
(2024-04-01T19:52:08Z)
- Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception [21.639429724987902]
Active recognition enables robots to explore novel observations, thereby acquiring more information while circumventing undesired viewing conditions.
Most recognition modules are developed under the closed-world assumption, which makes them ill-equipped to handle unexpected inputs, such as the absence of the target object in the current observation.
We propose treating active recognition as a sequential evidence-gathering process, providing step-by-step uncertainty and reliable prediction under the evidence combination theory.
(2023-11-23T03:51:46Z)
- Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding [51.8579160500354]
We propose an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration.
Results show equivalent or even more impressive performance compared to fully-supervised state-of-the-art approaches.
(2023-03-17T00:28:33Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
(2022-10-22T18:39:33Z)
- Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class.
Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues when handling fine-grained actions.
(2021-08-15T02:21:01Z)
- Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures.
We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view.
We design a sequential action sampling strategy to better leverage predicted states on both scene-level and instance-level.
(2021-01-14T22:21:25Z)
- Uncertainty-Aware Vehicle Orientation Estimation for Joint Detection-Prediction Models [12.56249869551208]
Orientation is an important property for downstream modules of an autonomous system.
We present a method that extends the existing models that perform joint object detection and motion prediction.
In addition, the approach is able to quantify prediction uncertainty, outputting the probability that the inferred orientation is flipped.
(2020-11-05T21:59:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.