CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation
- URL: http://arxiv.org/abs/2503.22708v1
- Date: Thu, 20 Mar 2025 22:37:17 GMT
- Title: CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation
- Authors: Peter Jansen, Oyvind Tafjord, Marissa Radensky, Pao Siangliulue, Tom Hope, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Daniel S. Weld, Peter Clark,
- Abstract summary: We introduce CodeScientist, a novel ASD system that frames ideation and experiment construction as a form of genetic search jointly.<n>We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments.
- Score: 48.12054700748627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore variants of existing codebases or similarly constrained design spaces, and (2) they produce large volumes of research artifacts (such as automatically generated papers and code) that are typically evaluated using conference-style paper review with limited evaluation of code. In this work we introduce CodeScientist, a novel ASD system that frames ideation and experiment construction as a form of genetic search jointly over combinations of research articles and codeblocks defining common actions in a domain (like prompting a language model). We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments, with the system returning 19 discoveries, 6 of which were judged as being both at least minimally sound and incrementally novel after a multi-faceted evaluation beyond that typically conducted in prior work, including external (conference-style) review, code review, and replication attempts. Moreover, the discoveries span new tasks, agents, metrics, and data, suggesting a qualitative shift from benchmark optimization to broader discoveries.
Related papers
- ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies [16.90884865239373]
We introduce ResearchCodeAgent, a novel multi-agent system to automate the codification of research methodologies.
The system bridges the gap between high-level research concepts and their practical implementation.
ResearchCodeAgent represents a significant step towards the research implementation process, potentially accelerating the pace of machine learning research.
arXiv Detail & Related papers (2025-04-28T07:18:45Z) - IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery [27.218896203253987]
IRIS is an open-source platform designed for researchers to leverage large language models (LLMs)-assisted scientific ideation.
IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), fine-grained feedback mechanism, and query-based literature synthesis.
We conduct a user study with researchers across diverse disciplines, validating the effectiveness of our system in enhancing ideation.
arXiv Detail & Related papers (2025-04-23T14:01:36Z) - The AI Cosmologist I: An Agentic System for Automated Data Analysis [0.0]
The AI Cosmologist implements a complete pipeline from idea generation to experimental evaluation and research dissemination.
Unlike traditional auto machine-learning systems, the AI Cosmologist generates diverse implementation strategies.
Results indicate that agentic systems can automate portions of the research process, potentially accelerating scientific discovery.
arXiv Detail & Related papers (2025-04-04T13:12:08Z) - Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field [0.0]
This paper offers an analysis of the ability of large models to identify semantic relationships between different research topics.<n>We developed a gold standard based on the IEEE Thesaurus to evaluate the task.<n>Several models have achieved outstanding results, including Mixtral-8x7B, Dolphin-Mistral, and Claude 3-7B.
arXiv Detail & Related papers (2024-12-11T10:11:41Z) - Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System [62.832818186789545]
Virtual Scientists (VirSci) is a multi-agent system designed to mimic the teamwork inherent in scientific research.
VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas.
We show that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas.
arXiv Detail & Related papers (2024-10-12T07:16:22Z) - Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding [27.004817441034795]
Collaborative decoding between large and small language models (SLMs) presents a promising strategy to mitigate these issues.
Inspired by dual-process cognitive theory, we propose a unified framework, termed Fast and Slow Generating (FS-GEN)
Within this framework, LLMs are categorized as System 2 (slow and deliberate), while independent SLMs are designated as System 1.
arXiv Detail & Related papers (2024-06-18T05:59:28Z) - DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics each with three levels of difficulty and several parametric variations.
We find that strong baseline agents, that perform well in prior published environments, struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z) - BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems [1.204357447396532]
Novelty search (NS) refers to a class of exploration algorithms that automatically uncover diverse system behaviors through simulations or experiments.<n>We propose a sample-efficient NS method inspired by Bayesian optimization principles.<n>We show that BEACON comprehensively outperforms existing baselines by finding substantially larger sets of diverse behaviors under limited sampling budgets.
arXiv Detail & Related papers (2024-06-05T20:23:52Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.
ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [50.40483334131271]
This work proposes the first dataset for social science academic hypotheses discovery.
Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity.
A multi- module framework is developed for the task, including three different feedback mechanisms to boost performance.
arXiv Detail & Related papers (2023-09-06T05:19:41Z) - SIERRA: A Modular Framework for Research Automation and Reproducibility [6.1678491628787455]
We present SIERRA, a novel framework for accelerating research development and improving results.
SIERRA accelerates research by automating the process of generating executable experiments from queries over independent variables.
It employs a modular architecture enabling easy customization and extension for the needs of individual researchers.
arXiv Detail & Related papers (2022-08-16T15:36:34Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Memetic Search for Vehicle Routing with Simultaneous Pickup-Delivery and
Time Windows [31.512563458410963]
We propose a novel Memetic Algorithm with efficient local search and extended neighborhood, dubbed MATE, to solve this problem.
MATE outperforms all the state-of-the-art algorithms, and notably, finds new best-known solutions on 12 instances (65 instances in total)
arXiv Detail & Related papers (2020-11-12T12:06:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.