Agentic Design of Compositional Machines
- URL: http://arxiv.org/abs/2510.14980v2
- Date: Sun, 19 Oct 2025 05:35:03 GMT
- Title: Agentic Design of Compositional Machines
- Authors: Wenqian Zhang, Weiyang Liu, Zhen Liu,
- Abstract summary: We investigate whether large language models (LLMs) can learn to create machines.<n>We use BesiegeField, a testbed built on the machine-building game Besiege.<n>We benchmark state-of-the-art RLs with agentic and identify key capabilities required for success.
- Score: 26.167638081496914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The design of complex machines stands as both a marker of human intelligence and a foundation of engineering practice. Given recent advances in large language models (LLMs), we ask whether they, too, can learn to create. We approach this question through the lens of compositional machine design: a task in which machines are assembled from standardized components to meet functional demands like locomotion or manipulation in a simulated physical environment. With this simplification, machine design is expressed as writing XML-like code that explicitly specifies pairwise part connections. To support this investigation, we introduce BesiegeField, a testbed built on the machine-building game Besiege, which enables part-based construction, physical simulation and reward-driven evaluation. Using BesiegeField, we benchmark state-of-the-art LLMs with agentic workflows and identify key capabilities required for success, including spatial reasoning, strategic assembly, and instruction-following. As current open-source models fall short, we explore reinforcement learning (RL) as a path to improvement: we curate a cold-start dataset, conduct RL finetuning experiments, and highlight open challenges at the intersection of language, machine design, and physical reasoning.
Related papers
- Towards Agentic Intelligence for Materials Science [73.4576385477731]
This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining to goal-conditioned agents interfacing with simulation and experimental platforms.<n>To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science.
arXiv Detail & Related papers (2026-01-29T23:48:43Z) - Multi-Actor Generative Artificial Intelligence as a Game Engine [49.360775442760314]
Generative AI can be used in multi-actor environments with purposes ranging from social science modeling to interactive narrative and AI evaluation.<n>We argue here that a good approach is to take inspiration from tabletop role-playing games (TTRPGs), where a Game Master (GM) is responsible for the environment and generates all parts of the story not directly determined by the voluntary actions of player characters.
arXiv Detail & Related papers (2025-07-10T22:31:09Z) - ML For Hardware Design Interpretability: Challenges and Opportunities [3.3540424603831323]
We examine how design interpretability, particularly in RTL-to-NL tasks, influences the efficiency of the hardware design process.<n>We aim to guide future research in leveraging ML to automate RTL-to-NL tasks and improve hardware design interpretability.
arXiv Detail & Related papers (2025-04-11T03:47:51Z) - Large Language Models as Natural Selector for Embodied Soft Robot Design [5.023206838671049]
This paper introduces RoboCrafter-QA, a novel benchmark to evaluate whether Large Language Models can learn representations of soft robot designs.<n>Our experiments reveal that while these models exhibit promising capabilities in learning design representations, they struggle with fine-grained distinctions between designs with subtle performance differences.
arXiv Detail & Related papers (2025-03-04T03:55:10Z) - Specifications: The missing link to making the development of LLM systems an engineering discipline [65.10077876035417]
We discuss the progress the field has made so far-through advances like structured outputs, process supervision, and test-time compute.<n>We outline several future directions for research to enable the development of modular and reliable LLM-based systems.
arXiv Detail & Related papers (2024-11-25T07:48:31Z) - On the Exploration of LM-Based Soft Modular Robot Design [26.847859137653487]
Large language models (LLMs) have demonstrated promising capabilities in modeling real-world knowledge.
In this paper, we explore the potential of using LLMs to aid in the design of soft modular robots.
Our model performs well in evaluations for designing soft modular robots with uni- and bi-directional and stair-descending capabilities.
arXiv Detail & Related papers (2024-11-01T04:03:05Z) - DrEureka: Language Model Guided Sim-To-Real Transfer [64.14314476811806]
Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale.
In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design.
Our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball.
arXiv Detail & Related papers (2024-06-04T04:53:05Z) - Using the Abstract Computer Architecture Description Language to Model
AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z) - DEAP: Design Space Exploration for DNN Accelerator Parallelism [0.0]
Large Language Models (LLMs) are becoming increasingly complex and powerful to train and serve.
This paper showcases how hardware and software co-design can come together and allow us to create customized hardware systems.
arXiv Detail & Related papers (2023-12-24T02:43:01Z) - Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis [82.59451639072073]
General-purpose robots operate seamlessly in any environment, with any object, and utilize various skills to complete diverse tasks.
As a community, we have been constraining most robotic systems by designing them for specific tasks, training them on specific datasets, and deploying them within specific environments.
Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models, we devote this survey to exploring how foundation models can be applied to general-purpose robotics.
arXiv Detail & Related papers (2023-12-14T10:02:55Z) - Octopus: Embodied Vision-Language Programmer from Environmental Feedback [58.04529328728999]
Embodied vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning.
To bridge this gap, we introduce Octopus, an embodied vision-language programmer that uses executable code generation as a medium to connect planning and manipulation.
Octopus is designed to 1) proficiently comprehend an agent's visual and textual task objectives, 2) formulate intricate action sequences, and 3) generate executable code.
arXiv Detail & Related papers (2023-10-12T17:59:58Z) - Designing Machine Learning Toolboxes: Concepts, Principles and Patterns [0.0]
We provide an overview of key patterns in the design of AI modeling toolboxes.
Our analysis can not only explain the design of existing toolboxes, but also guide the development of new ones.
arXiv Detail & Related papers (2021-01-13T08:55:15Z) - CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and
Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z) - Integrated Benchmarking and Design for Reproducible and Accessible
Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.