Learning Visual Planning Models from Partially Observed Images
- URL: http://arxiv.org/abs/2211.15666v1
- Date: Fri, 25 Nov 2022 07:00:56 GMT
- Title: Learning Visual Planning Models from Partially Observed Images
- Authors: Kebing Jin, Zhanhao Xiao, Hankui Hankz Zhuo, Hai Wan, Jiaran Cai
- Abstract summary: We provide a novel framework, Recplan, for learning a transition model from partially observed raw image traces.
We also propose a neural-network-based approach to learn a heuristic model that estimates the distance toward a given goal observation.
Our approach is more effective than a state-of-the-art approach to learning visual planning models in environments with incomplete observations.
- Score: 17.25694427734666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been increasing attention on planning model learning in classical
planning. Most existing approaches, however, focus on learning planning models
from structured data in symbolic representations. It is often difficult to
obtain such structured data in real-world scenarios. Although a number of
approaches have been developed for learning planning models from fully observed
unstructured data (e.g., images), in many scenarios raw observations are often
incomplete. In this paper, we provide a novel framework, Recplan, for
learning a transition model from partially observed raw image traces. More
specifically, by considering the preceding and subsequent images in a trace, we
learn the latent state representations of raw observations and then build a
transition model based on such representations. Additionally, we propose a
neural-network-based approach to learn a heuristic model that estimates the
distance toward a given goal observation. Based on the learned transition model
and heuristic model, we implement a classical planner for images. We exhibit
empirically that our approach is more effective than a state-of-the-art
approach to learning visual planning models in environments with incomplete
observations.
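The abstract describes a pipeline: encode raw observations into latent states, learn a transition model over those states, learn a heuristic estimating distance to a goal observation, and run a classical planner. The sketch below illustrates only the final planning step on a toy latent space; all names here are illustrative assumptions, not the authors' actual API, and the learned encoder, transition model, and heuristic are stood in by hand-coded functions.

```python
# Hypothetical sketch of heuristic-guided search in a learned latent space,
# in the spirit of the Recplan pipeline described above. The transition and
# heuristic functions stand in for learned neural models.
from typing import Callable, List


def greedy_plan(start, goal,
                transition: Callable,  # latent state x action -> latent state
                heuristic: Callable,   # latent state x goal -> est. distance
                actions: List,
                max_steps: int = 50):
    """Greedy best-first planning using the learned heuristic as a guide."""
    state, plan = start, []
    for _ in range(max_steps):
        if state == goal:
            return plan
        # pick the action whose successor the heuristic scores closest to goal
        best = min(actions, key=lambda a: heuristic(transition(state, a), goal))
        plan.append(best)
        state = transition(state, best)
    return plan


# toy 1-D latent space: states are integers, actions move by -1 or +1
trans = lambda s, a: s + a
heur = lambda s, g: abs(g - s)
print(greedy_plan(0, 3, trans, heur, actions=[-1, 1]))  # -> [1, 1, 1]
```

A real instantiation would replace `trans` and `heur` with the learned transition and heuristic networks, and a stronger search (e.g., A*-style) could reuse the same interfaces.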
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
arXiv Detail & Related papers (2023-12-01T18:59:57Z)
- Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
We propose a foundation model that leverages expert foundation models, trained individually on language, vision, and action data, to solve long-horizon tasks.
We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model.
Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos.
arXiv Detail & Related papers (2023-09-15T17:44:05Z)
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
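The idea of planning by iteratively denoising trajectories can be illustrated with a toy refinement loop: start from a noisy trajectory with fixed start and goal states and repeatedly smooth it. This is purely illustrative; the actual approach uses a learned diffusion probabilistic model, not the hand-coded neighbor-averaging step below.

```python
# Toy illustration of planning-as-iterative-refinement, in the spirit of
# diffusion-based planners: begin with noisy waypoints and repeatedly
# "denoise" them toward a smooth path between fixed endpoints.
import random


def denoise_trajectory(start, goal, n_points=8, n_iters=200, seed=0):
    rng = random.Random(seed)
    lo, hi = min(start, goal), max(start, goal)
    # initial "noise": random interior waypoints between start and goal
    traj = [start] + [rng.uniform(lo, hi) for _ in range(n_points - 2)] + [goal]
    for _ in range(n_iters):
        # one refinement step: pull each interior waypoint toward its neighbors
        traj = ([traj[0]]
                + [(traj[i - 1] + traj[i + 1]) / 2
                   for i in range(1, n_points - 1)]
                + [traj[-1]])
    return traj


path = denoise_trajectory(0.0, 1.0)
print([round(x, 2) for x in path])  # converges to evenly spaced waypoints
```

With the endpoints pinned, repeated averaging converges to a straight-line interpolation; a learned denoiser would instead pull trajectories toward the data distribution of feasible, high-reward behavior.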
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
- Visual Learning-based Planning for Continuous High-Dimensional POMDPs [81.16442127503517]
Visual Tree Search (VTS) is a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning.
VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner.
We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train.
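The bridging idea here, using generative observation models to score how likely an observation is under each hypothesized hidden state, can be sketched in miniature. The Gaussian likelihood below is a hand-coded stand-in (an assumption for illustration) for a deep generative observation model; VTS plugs such likelihoods into a Monte Carlo tree search planner, which this sketch does not implement.

```python
# Minimal sketch of belief reweighting with an observation model, in the
# spirit of VTS: weight candidate hidden states by the likelihood of the
# current image observation under a generative model p(obs | state).
import math


def obs_likelihood(obs, state, sigma=1.0):
    # Gaussian stand-in for a learned deep generative observation model
    return math.exp(-((obs - state) ** 2) / (2 * sigma ** 2))


def reweight(particles, obs):
    """Normalize observation likelihoods into a belief over candidate states."""
    w = [obs_likelihood(obs, s) for s in particles]
    total = sum(w)
    return [x / total for x in w]


particles = [0.0, 1.0, 2.0, 3.0]
weights = reweight(particles, obs=2.1)
print(max(range(len(weights)), key=lambda i: weights[i]))  # -> 2
```

Because the observation model is a separate, swappable component, the reward structure of the planner can change without retraining it, which is the adaptability property claimed above.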
arXiv Detail & Related papers (2021-12-17T11:53:31Z)
- Recommending Metamodel Concepts during Modeling Activities with Pre-Trained Language Models [0.0]
We propose an approach to assist a modeler in the design of a metamodel by recommending relevant domain concepts in several modeling scenarios.
Our approach does not require extracting knowledge from the domain or hand-designing completion rules.
We evaluate our approach on a test set containing 166 metamodels, unseen during model training, with more than 5000 test samples.
arXiv Detail & Related papers (2021-04-04T16:29:10Z)
- Forethought and Hindsight in Credit Assignment [62.05690959741223]
We study the gains and peculiarities of planning used as forethought, via forward models, or as hindsight, via backward models.
We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated.
arXiv Detail & Related papers (2020-10-26T16:00:47Z)