LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models
- URL: http://arxiv.org/abs/2411.08027v1
- Date: Tue, 12 Nov 2024 18:56:58 GMT
- Title: LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models
- Authors: Anoop Cherian, Radu Corcodel, Siddarth Jain, Diego Romeres
- Abstract summary: We propose a new physical reasoning task and a dataset, dubbed TraySim.
Our task involves predicting the dynamics of several objects on a tray that is given an external impact.
We present LLMPhy, a zero-shot black-box optimization framework that leverages the physics knowledge and program synthesis abilities of LLMs.
Our results show that the combination of the LLM and the physics engine leads to state-of-the-art zero-shot physical reasoning performance.
- Score: 35.01842161084472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Physical reasoning is an important skill needed for robotic agents when operating in the real world. However, solving such reasoning problems often involves hypothesizing and reflecting over complex multi-body interactions under the effect of a multitude of physical forces and thus learning all such interactions poses a significant hurdle for state-of-the-art machine learning frameworks, including large language models (LLMs). To study this problem, we propose a new physical reasoning task and a dataset, dubbed TraySim. Our task involves predicting the dynamics of several objects on a tray that is given an external impact -- the domino effect of the ensued object interactions and their dynamics thus offering a challenging yet controlled setup, with the goal of reasoning being to infer the stability of the objects after the impact. To solve this complex physical reasoning task, we present LLMPhy, a zero-shot black-box optimization framework that leverages the physics knowledge and program synthesis abilities of LLMs, and synergizes these abilities with the world models built into modern physics engines. Specifically, LLMPhy uses an LLM to generate code to iteratively estimate the physical hyperparameters of the system (friction, damping, layout, etc.) via an implicit analysis-by-synthesis approach using a (non-differentiable) simulator in the loop and uses the inferred parameters to imagine the dynamics of the scene towards solving the reasoning task. To show the effectiveness of LLMPhy, we present experiments on our TraySim dataset to predict the steady-state poses of the objects. Our results show that the combination of the LLM and the physics engine leads to state-of-the-art zero-shot physical reasoning performance, while demonstrating superior convergence against standard black-box optimization methods and better estimation of the physical parameters.
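The simulator-in-the-loop optimization described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy `simulate` function stands in for a real (non-differentiable) physics engine, and `propose_params`, which here does simple random perturbation, stands in for the LLM's code-synthesized parameter proposals.

```python
import random

def simulate(params):
    # Stand-in for a non-differentiable physics engine in the loop
    # (toy model: squared error against hidden "true" friction and
    # damping values, playing the role of the observed dynamics).
    true = {"friction": 0.4, "damping": 0.1}
    return sum((params[k] - true[k]) ** 2 for k in true)

def propose_params(history):
    # Stand-in for the LLM call: given past (params, loss) pairs, the
    # LLM would reason about physics and synthesize code proposing a
    # better guess. Here a random perturbation of the best guess so
    # far plays that role.
    if not history:
        return {"friction": random.uniform(0, 1),
                "damping": random.uniform(0, 1)}
    best, _ = min(history, key=lambda h: h[1])
    return {k: min(1.0, max(0.0, v + random.gauss(0, 0.05)))
            for k, v in best.items()}

def llmphy_loop(n_iters=50):
    # Implicit analysis-by-synthesis: propose hyperparameters, run
    # the simulator, score the mismatch, and feed the whole trace
    # back into the next proposal.
    history = []
    for _ in range(n_iters):
        params = propose_params(history)
        history.append((params, simulate(params)))
    return min(history, key=lambda h: h[1])

best_params, best_loss = llmphy_loop()
```

Once the loss is low, the inferred parameters would be reused to roll the simulator forward and predict the post-impact steady-state poses.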
Related papers
- D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping [66.22412592525369]
We introduce a real-to-sim-to-real engine that leverages Gaussian Splat representations to build a differentiable engine. We show that our engine achieves accurate and robust mass identification across various object geometries and mass values. These optimized mass values facilitate force-aware policy learning, achieving superior performance in object grasping.
arXiv Detail & Related papers (2026-03-01T15:32:04Z) - PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models [100.65199317765608]
Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation. We introduce a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces. We extend this paradigm to a unified framework, termed Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning.
arXiv Detail & Related papers (2026-01-16T08:40:10Z) - SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models [60.80050275581661]
Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities, but they lack a grounded understanding of physical dynamics. We present SIMPACT, a test-time, SIMulation-enabled ACTion Planning framework. Our method demonstrates state-of-the-art performance on five challenging real-world rigid-body and deformable manipulation tasks.
arXiv Detail & Related papers (2025-12-05T18:51:03Z) - Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects [59.51185639557874]
We introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry.
arXiv Detail & Related papers (2025-11-03T07:21:42Z) - Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning [8.647104927811135]
Causal World Model Induction (CWMI) is a framework designed to embed an explicit model of causal physics within an AI system. CWMI significantly outperforms state-of-the-art AI systems on zero-shot physical reasoning tasks.
arXiv Detail & Related papers (2025-07-26T08:08:26Z) - PhysID: Physics-based Interactive Dynamics from a Single-view Image [1.7214450148288793]
We present PhysID, which streamlines the creation of physics-based interactive dynamics from a single-view image. We integrate an on-device physics engine for physically plausible real-time rendering with user interactions.
arXiv Detail & Related papers (2025-06-21T15:57:58Z) - Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation [89.5123417007126]
We show how to make Large Multimodal Models (LMMs) understand the spatial action space. We also show how to fully exploit the reasoning capacity of LMMs in solving these tasks. Our resulting reasoning model, built upon a 7B backbone and named ReasonManip, demonstrates three notable advantages.
arXiv Detail & Related papers (2025-05-19T06:00:14Z) - Advancing AI-Scientist Understanding: Multi-Agent LLMs with Interpretable Physics Reasoning [0.7499722271664147]
Large Language Models (LLMs) are playing an increasingly important role in physics research by assisting with symbolic manipulation, numerical computation, and scientific reasoning. We introduce a novel multi-agent LLM physicist framework that fosters collaboration between AI and human scientists through three key modules. A case study demonstrates that our approach significantly improves interpretability, enables systematic validation, and enhances human-AI collaboration in physics problem-solving and discovery.
arXiv Detail & Related papers (2025-04-02T17:13:16Z) - EquiNO: A Physics-Informed Neural Operator for Multiscale Simulations [0.8345452787121658]
We propose EquiNO as a complementary physics-informed PDE surrogate for predicting microscale physics.
Our framework, applicable to so-called multiscale FE² computations, introduces the FE-OL approach by integrating the finite element (FE) method with operator learning (OL).
arXiv Detail & Related papers (2025-03-27T08:42:13Z) - Physics-Guided Foundation Model for Scientific Discovery: An Application to Aquatic Science [13.28811382673697]
We propose a Physics-Guided Foundation Model (PGFM) that combines pre-trained ML models and physics-based models.
We demonstrate the effectiveness of this methodology in modeling water temperature and dissolved oxygen dynamics in real-world lakes.
arXiv Detail & Related papers (2025-02-10T00:48:10Z) - MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science [62.96434290874878]
Current Multi-Modal Large Language Models (MLLMs) have shown strong capabilities in general visual reasoning tasks.
We develop a new framework, named Multi-Modal Scientific Reasoning with Physics Perception and Simulation (MAPS), based on an MLLM.
MAPS decomposes expert-level multi-modal reasoning tasks into physical diagram understanding via a Physical Perception Model (PPM) and reasoning with physical knowledge via a simulator.
arXiv Detail & Related papers (2025-01-18T13:54:00Z) - GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels.
We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter.
In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z) - Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models [9.474337395173388]
Physical reasoning remains a significant challenge for Vision-Language Models (VLMs).
Fine-tuning is expensive for large models and impractical to perform repeatedly for every task.
We introduce Physics Context Builders (PCBs), a novel modular framework where specialized VLMs are fine-tuned to generate detailed physical scene descriptions.
arXiv Detail & Related papers (2024-12-11T18:40:16Z) - Differentiable Physics-based System Identification for Robotic Manipulation of Elastoplastic Materials [43.99845081513279]
This work introduces a novel Differentiable Physics-based System Identification (DPSI) framework that enables a robot arm to infer the physics parameters of elastoplastic materials and the environment.
With only a single real-world interaction, the estimated parameters can accurately simulate visually and physically realistic behaviours.
arXiv Detail & Related papers (2024-11-01T13:04:25Z) - Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics [5.497036643694402]
We construct a simple simulated environment and demonstrate examples of where, in a zero-shot setting, both text and multimodal LLMs display atomic world knowledge about various objects but fail to compose this knowledge in correct solutions for an object manipulation and placement task.
We also use BLIP, a vision-language model trained with more sophisticated cross-modal attention, to identify cases involving object physical properties that the model fails to ground.
arXiv Detail & Related papers (2024-02-24T00:01:01Z) - ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z) - DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation [81.11585774044848]
We present DeepSimHO, a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural network.
Our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization.
arXiv Detail & Related papers (2023-10-11T05:34:36Z) - UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling [7.626461564400769]
We propose a novel SLAM backend that unifies ego-motion tracking, rigid object motion tracking, and modeling.
Our system showcases the potential application of object perception in complex dynamic scenes.
arXiv Detail & Related papers (2023-09-29T07:50:09Z) - Physics-Based Task Generation through Causal Sequence of Physical Interactions [3.2244944291325996]
Performing tasks in a physical environment is a crucial yet challenging problem for AI systems operating in the real world.
We present a systematic approach for defining a physical scenario using a causal sequence of physical interactions between objects.
We then propose a methodology for generating tasks in a physics-simulating environment using defined scenarios as inputs.
arXiv Detail & Related papers (2023-08-05T10:15:18Z) - Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z) - PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics [89.81550748680245]
We introduce a new differentiable physics benchmark called PlasticineLab.
In each task, the agent uses manipulators to deform the plasticine into the desired configuration.
We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark.
arXiv Detail & Related papers (2021-04-07T17:59:23Z) - Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling [86.9726984929758]
We focus on the integration of incomplete physics models into deep generative models.
We propose a VAE architecture in which a part of the latent space is grounded by physics.
We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv Detail & Related papers (2021-02-25T20:28:52Z) - Scalable Differentiable Physics for Learning and Control [99.4302215142673]
Differentiable physics is a powerful approach to learning and control problems that involve physical objects and environments.
We develop a scalable framework for differentiable physics that can support a large number of objects and their interactions.
arXiv Detail & Related papers (2020-07-04T19:07:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.