Related papers: Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection

Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection

URL: http://arxiv.org/abs/2503.03562v3
Date: Wed, 26 Mar 2025 03:58:26 GMT
Title: Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
Authors: Wenqiao Li, Yao Gu, Xintao Chen, Xiaohao Xu, Ming Hu, Xiaonan Huang, Yingna Wu,
Abstract summary: Humans detect real-world object anomalies by perceiving, interacting, and reasoning based on object-conditioned physical knowledge.<n>Phys-AD is the first large-scale, real-world, physics-grounded video dataset for industrial anomaly detection.<n>The dataset includes more than 6400 videos across 22 real-world object categories, interacting with robot arms and motors, and exhibits 47 types of anomalies.
Score: 2.1013864820763755
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Humans detect real-world object anomalies by perceiving, interacting, and reasoning based on object-conditioned physical knowledge. The long-term goal of Industrial Anomaly Detection (IAD) is to enable machines to autonomously replicate this skill. However, current IAD algorithms are largely developed and tested on static, semantically simple datasets, which diverge from real-world scenarios where physical understanding and reasoning are essential. To bridge this gap, we introduce the Physics Anomaly Detection (Phys-AD) dataset, the first large-scale, real-world, physics-grounded video dataset for industrial anomaly detection. Collected using a real robot arm and motor, Phys-AD provides a diverse set of dynamic, semantically rich scenarios. The dataset includes more than 6400 videos across 22 real-world object categories, interacting with robot arms and motors, and exhibits 47 types of anomalies. Anomaly detection in Phys-AD requires visual reasoning, combining both physical knowledge and video content to determine object abnormality. We benchmark state-of-the-art anomaly detection methods under three settings: unsupervised AD, weakly-supervised AD, and video-understanding AD, highlighting their limitations in handling physics-grounded anomalies. Additionally, we introduce the Physics Anomaly Explanation (PAEval) metric, designed to assess the ability of visual-language foundation models to not only detect anomalies but also provide accurate explanations for their underlying physical causes. Our project is available at https://guyao2023.github.io/Phys-AD/.

Related papers

Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation [19.26289173517333]
We introduce two metrics to capture the physical plausibility and stability of predicted 3D poses from any 3D Human Pose Estimation model.<n>Using physics simulation, we discover correlations with existing plausibility metrics and measuring stability during motion.
arXiv Detail & Related papers (2025-02-06T20:15:49Z)
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos [66.09921831504238]
We propose PhysGame as a pioneering benchmark to evaluate physical commonsense violations in gameplay videos. Our findings reveal that the performance of current open-source video LLMs significantly lags behind that of proprietary counterparts. Based on the suite of datasets, we propose PhysVLM as a physical knowledge-enhanced video LLM.
arXiv Detail & Related papers (2024-12-02T18:47:25Z)
The Sound of Water: Inferring Physical Properties from Pouring Liquids [85.30865788636386]
We study the connection between audio-visual observations and the underlying physics of pouring liquids.<n>Our objective is to automatically infer physical properties such as the liquid level, the shape and size of the container, the pouring rate and the time to fill.
arXiv Detail & Related papers (2024-11-18T01:19:37Z)
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense. We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy. We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)
PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection [28.973078719467516]
We develop Multi-pose Anomaly Detection dataset and Pose-agnostic Anomaly Detection benchmark. Specifically, we build MAD using 20 complex-shaped LEGO toys with various poses, and high-quality and diverse 3D anomalies in both simulated and real environments. We also propose a novel method OmniposeAD, trained using MAD, specifically designed for pose-agnostic anomaly detection.
arXiv Detail & Related papers (2023-10-11T17:59:56Z)
Physically Grounded Vision-Language Models for Robotic Manipulation [59.143640049407104]
We propose PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations. We show that fine-tuning a vision-language model on PhysObjects improves its understanding of physical object concepts. We incorporate this physically grounded VLM in an interactive framework with a large language model-based robotic planner.
arXiv Detail & Related papers (2023-09-05T20:21:03Z)
Triggering Dark Showers with Conditional Dual Auto-Encoders [1.5615730862955413]
We present a family of conditional dual auto-encoders (CoDAEs) for generic and model-independent new physics searches at colliders.
arXiv Detail & Related papers (2023-06-22T15:13:18Z)
Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video [31.96672354594643]
We focus on the task of estimating a physically plausible articulated human motion from monocular video. Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts. We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark.
arXiv Detail & Related papers (2022-05-24T18:02:49Z)
SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments [2.105564340986074]
We introduce SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments. Inspired by daily object interactions, the SPACE dataset comprises videos depicting three types of physical events: containment, stability and contact. We show that the SPACE dataset improves the learning of intuitive physics with an approach inspired by curriculum learning.
arXiv Detail & Related papers (2021-08-13T11:49:46Z)
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks. To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame. Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
Living in the Physics and Machine Learning Interplay for Earth Observation [7.669855697331746]
Inferences mean understanding variables relations, deriving models that are physically interpretable. Machine learning models alone are excellent approximators, but very often do not respect the most elementary laws of physics. This is a collective long-term AI agenda towards developing and applying algorithms capable of discovering knowledge in the Earth system.
arXiv Detail & Related papers (2020-10-18T16:58:20Z)
Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
Key ability for artificial systems is to understand physical interactions between objects, and predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention and several methods were proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions. We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors. Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.