Re-opening open-source science through AI assisted development
- URL: http://arxiv.org/abs/2512.11993v1
- Date: Fri, 12 Dec 2025 19:16:53 GMT
- Title: Re-opening open-source science through AI assisted development
- Authors: Ling-Hong Hung, Ka Yee Yeung
- Abstract summary: Open-source scientific software is effectively closed to modification by its complexity. We demonstrate this with a case study, STAR-Flex, an open-source fork of STAR that adds 16,000 lines of C++ code to process 10x Flex data. This is the first open-source processing software for Flex data and was written as part of the NIH-funded MorPHiC consortium.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-source scientific software is effectively closed to modification by its complexity. With recent advances in technology, an agentic AI team led by a single human can now rapidly and robustly modify large codebases and re-open science to the community, which can then review and vet the AI-generated code. We demonstrate this with a case study, STAR-Flex, an open-source fork of STAR that adds 16,000 lines of C++ code to process 10x Flex data while maintaining all of the original functionality. This is the first open-source processing software for Flex data and was written as part of the NIH-funded MorPHiC consortium.
Related papers
- ThetaEvolve: Test-time Learning on Open Problems [110.5756538358217] (2025-11-28T18:58:14Z)
  We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time. We find that ThetaEvolve with RL at test time consistently outperforms inference-only baselines.
- The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering [10.252332355171237] (2025-07-20T15:15:58Z)
  This paper introduces AIDev, the first large-scale dataset capturing how autonomous coding agents operate in the wild. Spanning over 456,000 pull requests by five leading agents, AIDev provides an unprecedented empirical foundation for studying autonomous teammates in software development. The dataset includes rich data on PRs, authorship, review timelines, code changes, and integration outcomes.
- SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? [51.112225746095746] (2025-07-07T17:50:52Z)
  We introduce X-Master, a tool-augmented reasoning agent designed to emulate human researchers. X-Masters sets a new state-of-the-art record on Humanity's Last Exam with a score of 32.1%.
- AlphaEvolve: A coding agent for scientific and algorithmic discovery [63.13852052551106] (2025-06-16T06:37:18Z)
  We present AlphaEvolve, an evolutionary coding agent that substantially enhances the capabilities of state-of-the-art LLMs. AlphaEvolve orchestrates an autonomous pipeline of LLMs whose task is to improve an algorithm by making direct changes to the code. We demonstrate the broad applicability of this approach by applying it to a number of important computational problems.
- Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents [32.42616663576657] (2025-05-29T00:26:15Z)
  We introduce the Darwin Gödel Machine (DGM), a self-improving system that iteratively modifies its own code and empirically validates each change using coding benchmarks. Inspired by Darwinian evolution and open-endedness research, the DGM maintains an archive of generated coding agents. It grows the archive by sampling an agent from it and using a foundation model to create a new, interesting version of the sampled agent.
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models [61.14336781917986] (2024-10-12T23:42:16Z)
  We introduce OpenR, an open-source framework for enhancing the reasoning capabilities of large language models (LLMs). OpenR unifies data acquisition, reinforcement learning training, and non-autoregressive decoding into a cohesive software platform. Our work is the first to provide an open-source framework that explores the core techniques of OpenAI's o1 model with reinforcement learning.
- OpenHands: An Open Platform for AI Software Developers as Generalist Agents [109.8507367518992] (2024-07-23T17:50:43Z)
  We introduce OpenHands, a platform for the development of AI agents that interact with the world in ways similar to a human developer. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks.
- h2oGPT: Democratizing Large Language Models [1.8043055303852882] (2023-06-13T22:19:53Z)
  We introduce h2oGPT, a suite of open-source code repositories for the creation and use of Large Language Models. The goal of this project is to create the world's best truly open-source alternative to closed-source approaches.
- StarCoder: may the source be with you! [79.93915935620798] (2023-05-09T08:16:42Z)
  The BigCode community introduces StarCoder and StarCoderBase: 15.5B-parameter models with 8K context length. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories.
- Data Engineering for Everyone [1.2585165426919136] (2021-02-23T01:24:37Z)
  Data engineering is one of the fastest-growing fields within machine learning (ML). ML requires more data than individual teams of data engineers can readily produce. This article shows that open-source data sets are the rocket fuel for research and innovation at even some of the largest AI organizations.