SimStep: Chain-of-Abstractions for Incremental Specification and Debugging of AI-Generated Interactive Simulations
- URL: http://arxiv.org/abs/2507.09664v1
- Date: Sun, 13 Jul 2025 14:54:17 GMT
- Title: SimStep: Chain-of-Abstractions for Incremental Specification and Debugging of AI-Generated Interactive Simulations
- Authors: Zoe Kaputa, Anika Rajaram, Vryan Almanon Feliciano, Zhuoyue Lyu, Maneesh Agrawala, Hari Subramonyam
- Abstract summary: Chain-of-Abstractions (CoA) is a way to recover programming's core affordances. CoA decomposes the synthesis process into a sequence of cognitively meaningful, task-aligned representations. SimStep is an authoring environment for teachers that scaffolds simulation creation through four intermediate abstractions.
- Score: 16.00479720281197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Programming-by-prompting with generative AI offers a new paradigm for end-user programming, shifting the focus from syntactic fluency to semantic intent. This shift holds particular promise for non-programmers such as educators, who can describe instructional goals in natural language to generate interactive learning content. Yet in bypassing direct code authoring, many of programming's core affordances - such as traceability, stepwise refinement, and behavioral testing - are lost. We propose the Chain-of-Abstractions (CoA) framework as a way to recover these affordances while preserving the expressive flexibility of natural language. CoA decomposes the synthesis process into a sequence of cognitively meaningful, task-aligned representations that function as checkpoints for specification, inspection, and refinement. We instantiate this approach in SimStep, an authoring environment for teachers that scaffolds simulation creation through four intermediate abstractions: Concept Graph, Scenario Graph, Learning Goal Graph, and UI Interaction Graph. To address ambiguities and misalignments, SimStep includes an inverse correction process that surfaces in-filled model assumptions and enables targeted revision without requiring users to manipulate code. Evaluations with educators show that CoA enables greater authoring control and interpretability in programming-by-prompting workflows.
Related papers
- CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity [23.77040677368575]
We introduce a novel robotic manipulation framework that can accomplish tasks specified by potentially ambiguous natural language. This framework employs a Vision-Language Model (VLM) to interpret abstract concepts in natural language instructions. We show that our approach excels across challenging manipulation tasks involving language ambiguity, contact-rich manipulation, and multi-object interactions.
(2025-06-19T23:42:03Z)
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
(2025-02-08T00:16:44Z)
- Large Language Models as Realistic Microservice Trace Generators [54.85489678342595]
Workload traces are essential to understand complex computer systems' behavior and manage processing and memory resources. This paper proposes a first-of-a-kind approach that relies on training a large language model to generate synthetic workload traces. Our model adapts to downstream trace-related tasks, such as predicting key trace features and infilling missing data.
(2024-12-16T12:48:04Z)
- Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
(2024-11-26T18:38:38Z)
- Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
(2024-10-16T19:59:31Z)
- SGC-VQGAN: Towards Complex Scene Representation via Semantic Guided Clustering Codebook [9.993066868670283]
We introduce SGC-VQGAN, which uses a Semantic Online Clustering method to enhance token semantics through Consistent Semantic Learning.
Our approach constructs a temporospatially consistent semantic codebook, addressing issues of codebook collapse and imbalanced token semantics.
(2024-09-09T23:12:43Z)
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information.
(2024-06-24T03:36:29Z)
- SymbolicAI: A framework for logic-based approaches combining generative models and solvers [9.841285581456722]
We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes.
We treat large language models (LLMs) as semantic solvers that execute tasks based on both natural and formal language instructions.
(2024-02-01T18:50:50Z)
- AI Chain on Large Language Model for Unsupervised Control Flow Graph Generation for Statically-Typed Partial Code [21.423928174875844]
Control Flow Graphs (CFGs) are essential for visualizing, understanding and analyzing program behavior.
We propose a novel approach that leverages the error-tolerant and understanding ability of pre-trained Large Language Models (LLMs) to generate CFGs.
(2023-06-01T14:52:59Z)
- Contrastive Language, Action, and State Pre-training for Robot Learning [1.1000499414131326]
We introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning.
Our method, Contrastive Language, Action, and State Pre-training (CLASP), extends the CLIP formulation by incorporating distributional learning, capturing the inherent complexities and one-to-many relationships in behaviour-text alignment.
We demonstrate the utility of our method for the following downstream tasks: zero-shot text-behaviour retrieval, captioning unseen robot behaviours, and learning a behaviour prior for language-conditioned reinforcement learning.
(2023-04-21T07:19:33Z)
- Shepherd Pre-trained Language Models to Develop a Train of Thought: An Iterative Prompting Approach [30.117038793151004]
Pre-trained Language Models (PLMs) have been shown to be incapable of recalling knowledge to solve tasks requiring complex, multi-step inference procedures.
Similar to how humans develop a "train of thought" for these tasks, how can we equip PLMs with such abilities?
We propose an iterative context-aware prompter, which addresses these limitations by learning to dynamically synthesize conditioned prompts on the current step's contexts.
(2022-03-16T04:12:20Z)
- Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions.
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
(2021-06-30T08:44:19Z)