Runtime Failure Hunting for Physics Engine Based Software Systems: How Far Can We Go?
- URL: http://arxiv.org/abs/2507.22099v1
- Date: Tue, 29 Jul 2025 17:58:41 GMT
- Title: Runtime Failure Hunting for Physics Engine Based Software Systems: How Far Can We Go?
- Authors: Shuqing Li, Qiang Chen, Xiaoxue Ren, Michael R. Lyu
- Abstract summary: Physics Engines (PEs) are fundamental software frameworks that simulate physical interactions in applications ranging from entertainment to safety-critical systems. PEs suffer from physics failures, deviations from expected physical behaviors that can compromise software reliability, degrade user experience, and potentially cause critical failures in autonomous vehicles or medical robotics. This paper presents the first large-scale empirical study characterizing physics failures in PE-based software.
- Score: 32.20899533556529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Physics Engines (PEs) are fundamental software frameworks that simulate physical interactions in applications ranging from entertainment to safety-critical systems. Despite their importance, PEs suffer from physics failures, deviations from expected physical behaviors that can compromise software reliability, degrade user experience, and potentially cause critical failures in autonomous vehicles or medical robotics. Current testing approaches for PE-based software are inadequate, typically requiring white-box access and focusing on crash detection rather than semantically complex physics failures. This paper presents the first large-scale empirical study characterizing physics failures in PE-based software. We investigate three research questions addressing the manifestations of physics failures, the effectiveness of detection techniques, and developer perceptions of current detection practices. Our contributions include: (1) a taxonomy of physics failure manifestations; (2) a comprehensive evaluation of detection methods including deep learning, prompt-based techniques, and large multimodal models; and (3) actionable insights from developer experiences for improving detection approaches. To support future research, we release PhysiXFails, code, and other materials at https://sites.google.com/view/physics-failure-detection.
Related papers
- PhysicsEval: Inference-Time Techniques to Improve the Reasoning Proficiency of Large Language Models on Physics Problems [3.0901186959880977]
We evaluate the performance of frontier LLMs in solving physics problems, both mathematical and descriptive. We introduce a new evaluation benchmark for physics problems, PhysicsEval, consisting of 19,609 problems sourced from various physics textbooks.
arXiv Detail & Related papers (2025-07-31T18:12:51Z) - PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models [69.73115077227969]
We present PhysUniBench, a large-scale benchmark designed to evaluate and improve the reasoning capabilities of multimodal large language models (MLLMs). PhysUniBench consists of 3,304 physics questions spanning 8 major sub-disciplines of physics, each accompanied by one visual diagram. The benchmark's construction involved a rigorous multi-stage process, including multiple roll-outs, expert-level evaluation, automated filtering of easily solved problems, and a nuanced difficulty grading system with five levels.
arXiv Detail & Related papers (2025-06-21T09:55:42Z) - Can Theoretical Physics Research Benefit from Language Agents? [50.57057488167844]
Large Language Models (LLMs) are rapidly advancing across diverse domains, yet their application in theoretical physics research is not yet mature. This position paper argues that LLM agents can potentially help accelerate theoretical, computational, and applied physics when properly integrated with domain knowledge and toolbox. We envision future physics-specialized LLMs that could handle multimodal data, propose testable hypotheses, and design experiments.
arXiv Detail & Related papers (2025-06-06T16:20:06Z) - Scaling Physical Reasoning with the PHYSICS Dataset [32.956687630330116]
PHYSICS is a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses.
arXiv Detail & Related papers (2025-05-21T17:06:28Z) - Physics simulation capabilities of LLMs [0.0]
Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding.
We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems.
arXiv Detail & Related papers (2023-12-04T18:06:41Z) - DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation [81.11585774044848]
We present DeepSimHO, a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural network.
Our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization.
arXiv Detail & Related papers (2023-10-11T05:34:36Z) - Physics-Based Task Generation through Causal Sequence of Physical Interactions [3.2244944291325996]
Performing tasks in a physical environment is a crucial yet challenging problem for AI systems operating in the real world.
We present a systematic approach for defining a physical scenario using a causal sequence of physical interactions between objects.
We then propose a methodology for generating tasks in a physics-simulating environment using defined scenarios as inputs.
arXiv Detail & Related papers (2023-08-05T10:15:18Z) - Physics-Guided Adversarial Machine Learning for Aircraft Systems Simulation [9.978961706999833]
This work presents a novel approach, physics-guided adversarial machine learning (ML), that improves confidence in the physics consistency of the model.
Empirical evaluation on two aircraft system performance models shows the effectiveness of our adversarial ML approach.
arXiv Detail & Related papers (2022-09-07T19:23:45Z) - Physics Embedded Machine Learning for Electromagnetic Data Imaging [83.27424953663986]
Electromagnetic (EM) imaging is widely applied in sensing for security, biomedicine, geophysics, and various industries.
It is an ill-posed inverse problem whose solution is usually computationally expensive. Machine learning (ML) techniques and especially deep learning (DL) show potential in fast and accurate imaging.
This article surveys various schemes to incorporate physics in learning-based EM imaging.
arXiv Detail & Related papers (2022-07-26T02:10:15Z) - Privacy-preserving machine learning with tensor networks [37.01494003138908]
We show that tensor network architectures have especially promising properties for privacy-preserving machine learning.
First, we describe a new privacy vulnerability that is present in feedforward neural networks, illustrating it in synthetic and real-world datasets.
We rigorously prove that such conditions are satisfied by tensor-network architectures.
arXiv Detail & Related papers (2022-02-24T19:04:35Z) - PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics [89.81550748680245]
We introduce a new differentiable physics benchmark called PlasticineLab.
In each task, the agent uses manipulators to deform the plasticine into the desired configuration.
We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark.
arXiv Detail & Related papers (2021-04-07T17:59:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.