Related papers: The Station: An Open-World Environment for AI-Driven Discovery

URL: http://arxiv.org/abs/2511.06309v1
Date: Sun, 09 Nov 2025 10:13:00 GMT
Title: The Station: An Open-World Environment for AI-Driven Discovery
Authors: Stephen Chung, Wenyu Du,
Abstract summary: We introduce the STATION, an open-world multi-agent environment that models a miniature scientific ecosystem. Agents in the Station can engage in long scientific journeys that include reading papers from peers, formulating hypotheses, submitting code, performing analyses, and publishing results. Experiments demonstrate that AI agents in the Station achieve new state-of-the-art performance on a wide range of benchmarks, spanning from mathematics to computational biology to machine learning.
Score: 14.556758955830796
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We introduce the STATION, an open-world multi-agent environment that models a miniature scientific ecosystem. Leveraging their extended context windows, agents in the Station can engage in long scientific journeys that include reading papers from peers, formulating hypotheses, submitting code, performing analyses, and publishing results. Importantly, there is no centralized system coordinating their activities - agents are free to choose their own actions and develop their own narratives within the Station. Experiments demonstrate that AI agents in the Station achieve new state-of-the-art performance on a wide range of benchmarks, spanning from mathematics to computational biology to machine learning, notably surpassing AlphaEvolve in circle packing. A rich tapestry of narratives emerges as agents pursue independent research, interact with peers, and build upon a cumulative history. From these emergent narratives, novel methods arise organically, such as a new density-adaptive algorithm for scRNA-seq batch integration. The Station marks a first step towards autonomous scientific discovery driven by emergent behavior in an open-world environment, representing a new paradigm that moves beyond rigid optimization.

Related papers

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions [66.84396313837765]
We introduce OdysseyArena, which re-centers agent evaluation on long-horizon, active, and inductive interactions. We provide a set of 120 tasks to measure an agent's inductive efficiency and long-horizon discovery. We also introduce OdysseyArena-Challenge to stress-test agent stability across extreme interaction horizons.
arXiv Detail & Related papers (2026-02-05T16:31:43Z)
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [105.15622072347811]
Large language models (LLMs) have opened new avenues for accelerating scientific research. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models.
arXiv Detail & Related papers (2026-02-03T18:56:17Z)
AI4X Roadmap: Artificial Intelligence for the advancement of scientific pursuit and its future directions [65.44445343399126]
We look at AI-enabled science across biology, chemistry, climate science, mathematics, materials science, physics, self-driving laboratories and unconventional computing. Several shared themes emerge: the need for diverse and trustworthy data, transferable electronic-structure and interatomic models, AI systems integrated into end-to-end scientific synthesis. Across domains, we highlight how large foundation models, active learning and self-driving laboratories can close loops between prediction and validation.
arXiv Detail & Related papers (2025-11-26T02:10:28Z)
Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop [0.0]
SciLink is an open-source, multi-agent artificial intelligence framework designed to operationalize serendipity in materials research. It creates a direct, automated link between experimental observation, novelty assessment, and theoretical simulations. We show its application to atomic-resolution and hyperspectral data, its capacity to integrate real-time human expert guidance, and its ability to close the research loop.
arXiv Detail & Related papers (2025-08-07T04:59:17Z)
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking [48.90371827091671]
AutoExperiment is a benchmark that evaluates AI agents' ability to implement and run machine learning experiments. We evaluate state-of-the-art agents and find that performance degrades rapidly as $n$ increases. Our findings highlight critical challenges in long-horizon code generation, context retrieval, and autonomous experiment execution.
arXiv Detail & Related papers (2025-06-24T15:39:20Z)
Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains [0.0]
Spore.fun is a real-world AI evolution experiment that enables autonomous breeding and evolution of new on-chain agents. This paper presents a detailed case study of Spore.fun, examining agent behaviors and their evolutionary trajectories through digital ethology.
arXiv Detail & Related papers (2025-05-24T14:42:36Z)
From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery [67.07598263346591]
Large Language Models (LLMs) are catalyzing a paradigm shift in scientific discovery. This survey systematically charts this burgeoning field, placing a central focus on the changing roles and escalating capabilities of LLMs in science.
arXiv Detail & Related papers (2025-05-19T15:41:32Z)
CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation [48.12054700748627]
We introduce CodeScientist, a novel ASD system that jointly frames ideation and experiment construction as a form of genetic search. We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments.
arXiv Detail & Related papers (2025-03-20T22:37:17Z)
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery [14.465756130099091]
This paper presents the first comprehensive framework for fully automatic scientific discovery. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, and describes its findings. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion, acting like the human scientific community.
arXiv Detail & Related papers (2024-08-12T16:58:11Z)
Discovering Sensorimotor Agency in Cellular Automata using Diversity Search [17.898087201326483]
In cellular automata (CA), a key open question has been whether it is possible to find environment rules that self-organize. We show that this approach makes it possible to systematically find environmental conditions in CA that lead to self-organization. We show that the discovered agents have surprisingly robust capabilities to move, maintain their body integrity and navigate among various obstacles.
arXiv Detail & Related papers (2024-02-14T14:30:42Z)
DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence [77.78795329701367]
We present DARLEI, a framework that combines evolutionary algorithms with parallelized reinforcement learning. We characterize DARLEI's performance under various conditions, revealing factors impacting diversity of evolved morphologies. We hope to extend DARLEI in future work to include interactions between diverse morphologies in richer environments.
arXiv Detail & Related papers (2023-12-08T16:51:10Z)
Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream. The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations. Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
Deep active inference agents using Monte-Carlo methods [3.8233569758620054]
We present a neural architecture for building deep active inference agents in continuous state-spaces using Monte-Carlo sampling. Our approach enables agents to learn environmental dynamics efficiently, while maintaining task performance. Results show that deep active inference provides a flexible framework to develop biologically-inspired intelligent agents.
arXiv Detail & Related papers (2020-06-07T15:10:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.