Sci-VLA: Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments
- URL: http://arxiv.org/abs/2602.09430v1
- Date: Tue, 10 Feb 2026 05:50:19 GMT
- Title: Sci-VLA: Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments
- Authors: Yiwen Pang, Bo Zhou, Changjin Li, Xuanhao Wang, Shengxiang Xu, Deng-Bao Wang, Min-Ling Zhang, Shimin Di
- Abstract summary: Recent vision-language-action models offer a promising foundation for robotic laboratories. Experiments typically involve long-horizon tasks composed of multiple atomic tasks. While VLA models fine-tuned for scientific tasks can reliably execute atomic experimental actions, they often fail to perform composite tasks formed by reordering and composing these known atomic actions.
- Score: 49.02509634515056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robotic laboratories play a critical role in autonomous scientific discovery by enabling scalable, continuous experimental execution. Recent vision-language-action (VLA) models offer a promising foundation for robotic laboratories. However, scientific experiments typically involve long-horizon tasks composed of multiple atomic tasks, posing a fundamental challenge to existing VLA models. While VLA models fine-tuned for scientific tasks can reliably execute atomic experimental actions seen during training, they often fail to perform composite tasks formed by reordering and composing these known atomic actions. This limitation arises from a distributional mismatch between training-time atomic tasks and inference-time composite tasks, which prevents VLA models from executing necessary transitional operations between atomic tasks. To address this challenge, we propose an Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments. It introduces an LLM-based agentic inference mechanism that intervenes when executing sequential manipulation tasks. By performing explicit transition inference and generating transitional robotic action code, the proposed plugin guides VLA models through missing transitional steps, enabling reliable execution of composite scientific workflows without any additional training. This inference-only intervention makes our method computationally efficient, data-efficient, and well-suited for open-ended and long-horizon robotic laboratory tasks. We build 3D assets of scientific instruments and common scientific operating scenes within an existing simulation environment. In these scenes, we have verified that our method increases the average success rate per atomic task by 42% during inference. Furthermore, we show that our method can be easily transferred from the simulation to real scientific laboratories.
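The abstract describes an inference-only intervention: between atomic tasks, an LLM inspects the scene, infers whether a transitional operation is missing, and emits robotic action code to bridge the gap before the frozen VLA policy resumes. A minimal sketch of that control loop is given below; the interfaces (`policy.rollout`, `env.describe_scene`, `env.execute_action_code`, and the `llm_complete` callable) are assumptions for illustration, not the paper's actual API.

```python
# Minimal sketch (assumed interfaces, not the paper's API) of an
# LLM-based transition plugin wrapped around a frozen VLA policy.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AtomicTask:
    instruction: str  # e.g. "pour the solution into the beaker"


def infer_transition(scene: str, next_task: AtomicTask,
                     llm_complete: Callable[[str], str]) -> Optional[str]:
    """Ask the LLM whether a transitional operation is needed before the
    next atomic task; return generated action code, or None."""
    prompt = (
        f"Scene: {scene}\n"
        f"Next atomic task: {next_task.instruction}\n"
        "If a transitional operation (re-grasp, reposition, release) is "
        "required first, emit robot action code; otherwise reply NONE."
    )
    reply = llm_complete(prompt).strip()
    return None if reply == "NONE" else reply


def run_composite_task(tasks: List[AtomicTask], policy, env,
                       llm_complete: Callable[[str], str]) -> None:
    """Execute a long-horizon task as a sequence of atomic tasks,
    inserting LLM-generated transitional code between them."""
    for task in tasks:
        code = infer_transition(env.describe_scene(), task, llm_complete)
        if code is not None:
            env.execute_action_code(code)      # bridge the distribution gap
        policy.rollout(env, task.instruction)  # frozen VLA runs as trained
```

Keeping the intervention outside the policy is what makes the approach training-free: the VLA weights are untouched, and only the inter-task transitions are synthesized at inference time.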
Related papers
- Grounding LLMs in Scientific Discovery via Embodied Actions [84.11877211907647]
Large Language Models (LLMs) have shown significant potential in scientific discovery but struggle to bridge the gap between theoretical reasoning and physical simulation. We propose EmbodiedAct, a framework that transforms established scientific software into active embodied agents by grounding them in embodied actions with a tight perception-execution loop.
arXiv Detail & Related papers (2026-02-24T07:37:18Z) - Bohrium + SciMaster: Building the Infrastructure and Ecosystem for Agentic Science at Scale [82.20980951765891]
We argue that scaling agentic science requires an infrastructure-and-ecosystem approach, instantiated as Bohrium+SciMaster. Bohrium acts as a managed, traceable hub for AI4S assets that turns diverse scientific data, software, compute, and laboratory systems into agent-ready capabilities. SciMaster orchestrates these capabilities into long-horizon scientific workflows, on which scientific agents can be composed and executed.
arXiv Detail & Related papers (2025-12-23T16:04:41Z) - ExpVid: A Benchmark for Experiment Video Understanding & Reasoning [65.17173232816818]
We introduce ExpVid, the first benchmark designed to systematically evaluate MLLMs on scientific experiment videos. We evaluate 19 leading MLLMs on ExpVid and find that while they excel at coarse-grained recognition, they struggle with disambiguating fine details, tracking state changes over time, and linking experimental procedures to scientific outcomes. Our results reveal a notable performance gap between proprietary and open-source models, particularly in high-order reasoning.
arXiv Detail & Related papers (2025-10-13T16:45:28Z) - LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents [103.65422553044816]
LabUtopia is a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents. It supports 30 distinct tasks and includes more than 200 scene and instrument assets. We demonstrate that LabUtopia offers a powerful platform for advancing the integration of perception, planning, and control in scientific-purpose agents.
arXiv Detail & Related papers (2025-05-28T17:50:53Z) - VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities [0.19736111241221438]
Generative AI presents an opportunity to bridge this knowledge gap. We present a modular architecture for the Virtual Scientific Companion (VISION). With VISION, we performed LLM-based operation on the beamline workstation with low latency and demonstrated the first voice-controlled experiment at an X-ray scattering beamline.
arXiv Detail & Related papers (2024-12-24T04:37:07Z) - Autonomous Microscopy Experiments through Large Language Model Agents [4.241267255764773]
Large language models (LLMs) are revolutionizing self-driving laboratories (SDLs) for materials research. We introduce the Artificially Intelligent Lab Assistant (AILA), a framework automating atomic force microscopy through LLM-driven agents. We find that state-of-the-art models struggle with basic tasks and coordination scenarios.
arXiv Detail & Related papers (2024-12-18T09:35:28Z) - Agents for self-driving laboratories applied to quantum computing [2.840384720502993]
This paper introduces the k-agents framework, designed to support experimentalists in organizing laboratory knowledge and automating experiments with agents. Our framework employs large language model-based agents to encapsulate laboratory knowledge including available laboratory operations and methods for analyzing experiment results. To automate experiments, we introduce execution agents that break multi-step experimental procedures into agent-based state machines, interact with other agents to execute each step, and analyze the experiment results (see the sketch after this list).
arXiv Detail & Related papers (2024-12-10T23:30:44Z) - Large Language Models for Orchestrating Bimanual Robots [19.60907949776435]
We present LAnguage-model-based Bimanual ORchestration (LABOR) to analyze task configurations and devise coordination control policies.
We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot.
arXiv Detail & Related papers (2024-04-02T15:08:35Z) - Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
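As referenced in the k-agents entry above, execution agents can cast a multi-step experimental procedure as a state machine whose transitions are chosen by analyzing each step's result. The sketch below is a generic illustration of that idea, not the k-agents implementation; `run_step` and `analyze_result` are hypothetical stand-ins for the agent calls.

```python
# Illustrative sketch (not the k-agents implementation) of driving a
# multi-step experimental procedure as an agent-based state machine.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class State:
    name: str
    run_step: Callable[[], dict]           # perform one laboratory operation
    analyze_result: Callable[[dict], str]  # agent picks the next state name


def run_experiment(states: Dict[str, State], start: str,
                   terminal: str = "done", max_steps: int = 50) -> None:
    """Advance the state machine until a terminal state or step budget."""
    current = start
    for _ in range(max_steps):
        if current == terminal:
            return
        state = states[current]
        result = state.run_step()               # e.g. acquire data
        current = state.analyze_result(result)  # e.g. retry, proceed, abort
    raise RuntimeError("step budget exceeded before reaching terminal state")
```

Bounding the loop with a step budget keeps an agent that never reaches a terminal state from running the laboratory indefinitely.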