Understanding Physical Dynamics with Counterfactual World Modeling
- URL: http://arxiv.org/abs/2312.06721v3
- Date: Mon, 22 Jul 2024 07:51:25 GMT
- Title: Understanding Physical Dynamics with Counterfactual World Modeling
- Authors: Rahul Venkatesh, Honglin Chen, Kevin Feigelis, Daniel M. Bear, Khaled Jedoui, Klemen Kotar, Felix Binder, Wanhee Lee, Sherry Liu, Kevin A. Smith, Judith E. Fan, Daniel L. K. Yamins
- Abstract summary: We use Counterfactual World Modeling (CWM) to extract vision structures for dynamics understanding.
CWM uses a temporally-factored masking policy for masked prediction of video data without annotations.
We demonstrate that these structures are useful for physical dynamics understanding, allowing CWM to achieve state-of-the-art performance on the Physion benchmark.
- Score: 10.453874628135294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to understand physical dynamics is critical for agents to act in the world. Here, we use Counterfactual World Modeling (CWM) to extract vision structures for dynamics understanding. CWM uses a temporally-factored masking policy for masked prediction of video data without annotations. This policy enables highly effective "counterfactual prompting" of the predictor, allowing a spectrum of visual structures to be extracted from a single pre-trained predictor without finetuning on annotated datasets. We demonstrate that these structures are useful for physical dynamics understanding, allowing CWM to achieve state-of-the-art performance on the Physion benchmark.
Related papers
- Learning Generalizable Visuomotor Policy through Dynamics-Alignment [13.655111993491674]
Recent approaches leveraging video prediction models have shown promising results by learning rich representations from large-scale datasets.
We propose a Dynamics-Aligned Flow Matching Policy (DAP) that integrates dynamics prediction into policy learning.
Our method introduces a novel architecture where policy and dynamics models provide mutual corrective feedback during action generation, enabling self-correction and improved generalization.
arXiv Detail & Related papers (2025-10-31T02:29:33Z) - A Time-Series Foundation Model by Universal Delay Embedding [4.221753069966852]
This study introduces Universal Delay Embedding (UDE), a pretrained foundation model designed to revolutionize time-series forecasting.
UDE as a dynamical representation of observed data constructs two-dimensional subspace patches from Hankel matrices.
In particular, the learned dynamical representations and Koopman operator prediction forms from the patches exhibit exceptional interpretability.
arXiv Detail & Related papers (2025-09-15T16:11:49Z) - VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization [3.131272328696594]
VisionLaw is a bilevel optimization framework that infers interpretable expressions of intrinsic dynamics from visual observations.
It significantly outperforms existing state-of-the-art methods and exhibits strong generalization for interactive simulation in novel scenarios.
arXiv Detail & Related papers (2025-08-19T12:52:16Z) - Foundation Model for Skeleton-Based Human Action Understanding [56.89025287217221]
This paper presents a Unified Skeleton-based Dense Representation Learning (USDRL) framework.
USDRL consists of a Transformer-based Dense Spatio-Temporal Encoder (DSTE), Multi-Grained Feature Decorrelation (MG-FD), and Multi-Perspective Consistency Training (MPCT).
arXiv Detail & Related papers (2025-08-18T02:42:16Z) - MOOSE: Pay Attention to Temporal Dynamics for Video Understanding via Optical Flows [21.969862773424314]
MOOSE is a novel temporally-centric video encoder that integrates optical flow with spatial embeddings to model temporal information efficiently.
Unlike prior models, MOOSE takes advantage of rich, widely available pre-trained visual and optical flow encoders instead of training video models from scratch.
arXiv Detail & Related papers (2025-06-01T18:53:27Z) - Robust Multi-Modal Forecasting: Integrating Static and Dynamic Features [0.0]
Time series forecasting plays a crucial role in various applications, particularly in healthcare.
Ensuring transparency and explainability of the models responsible for these tasks is essential for their adoption in critical settings.
Recent work has explored a top-down approach to bi-level transparency, focusing on understanding trends and properties of predicted time series.
arXiv Detail & Related papers (2025-05-21T04:12:12Z) - Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity [51.40558987254471]
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations.
This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective.
arXiv Detail & Related papers (2024-10-23T14:22:49Z) - Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z) - Finding emergence in data by maximizing effective information [2.1714094454496013]
It's crucial to develop a framework to identify emergent phenomena and capture emergent dynamics at the macro-level using available data.
Inspired by the theory of causal emergence (CE), this paper introduces a machine learning framework to learn macro-dynamics in an emergent latent space.
Experimental results on simulated and real data demonstrate the effectiveness of the proposed framework.
arXiv Detail & Related papers (2023-08-19T09:12:47Z) - Unifying (Machine) Vision via Counterfactual World Modeling [5.001446411351483]
We introduce Counterfactual World Modeling (CWM), a framework for constructing a visual foundation model.
CWM has two key components, which resolve the core issues that have hindered application of the foundation model concept to vision.
We show that CWM generates high-quality readouts on real-world images and videos for a diversity of tasks.
arXiv Detail & Related papers (2023-06-02T17:45:44Z) - EasyDGL: Encode, Train and Interpret for Continuous-time Dynamic Graph Learning [92.71579608528907]
This paper aims to design an easy-to-use pipeline (termed EasyDGL) composed of three key modules with both strong fitting ability and interpretability.
EasyDGL can effectively quantify the predictive power of the frequency content that a model learns from evolving graph data.
arXiv Detail & Related papers (2023-03-22T06:35:08Z) - Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Neural Extended Kalman Filters for Learning and Predicting Dynamics of Structural Systems [5.252966797394752]
We propose a learnable Extended Kalman Filter (EKF) for learning the latent evolution dynamics of complex physical systems.
Neural EKF is a generalized version of the conventional EKF, where the modeling of process dynamics and sensory observations can be parameterized by neural networks.
We show that the structure imposed by the Neural EKF is beneficial to the learning process.
arXiv Detail & Related papers (2022-10-09T04:39:15Z) - Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z) - Meta-learning using privileged information for dynamics [66.32254395574994]
We extend the Neural ODE Process model to use additional information within the Learning Using Privileged Information setting.
We validate our extension with experiments showing improved accuracy and calibration on simulated dynamics tasks.
arXiv Detail & Related papers (2021-04-29T12:18:02Z) - Deep learning of contagion dynamics on complex networks [0.0]
We propose a complementary approach based on deep learning to build effective models of contagion dynamics on networks.
By allowing simulations on arbitrary network structures, our approach makes it possible to explore the properties of the learned dynamics beyond the training data.
Our results demonstrate how deep learning offers a new and complementary perspective to build effective models of contagion dynamics on networks.
arXiv Detail & Related papers (2020-06-09T17:18:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.