Related papers: Re-opening open-source science through AI assisted development

URL: http://arxiv.org/abs/2512.11993v1
Date: Fri, 12 Dec 2025 19:16:53 GMT
Title: Re-opening open-source science through AI assisted development
Authors: Ling-Hong Hung, Ka Yee Yeung
Abstract summary: Open-source scientific software is effectively closed to modification by its complexity. We demonstrate this with a case study, STAR-Flex, which is an open-source fork of STAR, adding 16,000 lines of C++ code to process 10x Flex data. This is the first open-source processing software for Flex data and was written as part of the NIH-funded MorPHiC consortium.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-source scientific software is effectively closed to modification by its complexity. With recent advances in technology, an agentic AI team led by a single human can now rapidly and robustly modify large codebases and re-open science to the community which can review and vet the AI generated code. We demonstrate this with a case study, STAR-Flex, which is an open source fork of STAR, adding 16,000 lines of C++ code to add the ability to process 10x Flex data, while maintaining full original function. This is the first open-source processing software for Flex data and was written as part of the NIH funded MorPHiC consortium.

Related papers

ThetaEvolve: Test-time Learning on Open Problems [110.5756538358217]
We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time. We find that ThetaEvolve with RL at test time consistently outperforms inference-only baselines.
arXiv Detail & Related papers (2025-11-28T18:58:14Z)
The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering [10.252332355171237]
This paper introduces AIDev, the first large-scale dataset capturing how such agents operate in the wild. Spanning over 456,000 pull requests by five leading agents, AIDev provides an unprecedented empirical foundation for studying autonomous teammates in software development. The dataset includes rich metadata on PRs, authorship, review timelines, code changes, and integration outcomes.
arXiv Detail & Related papers (2025-07-20T15:15:58Z)
SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? [51.112225746095746]
We introduce X-Master, a tool-augmented reasoning agent designed to emulate human researchers. X-Masters sets a new state-of-the-art record on Humanity's Last Exam with a score of 32.1%.
arXiv Detail & Related papers (2025-07-07T17:50:52Z)
AlphaEvolve: A coding agent for scientific and algorithmic discovery [63.13852052551106]
We present AlphaEvolve, an evolutionary coding agent that substantially enhances capabilities of state-of-the-art LLMs. AlphaEvolve orchestrates an autonomous pipeline of LLMs, whose task is to improve an algorithm by making direct changes to the code. We demonstrate the broad applicability of this approach by applying it to a number of important computational problems.
arXiv Detail & Related papers (2025-06-16T06:37:18Z)
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents [32.42616663576657]
We introduce the Darwin Gödel Machine (DGM), a self-improving AI that repeatedly modifies itself in a provably beneficial manner. Inspired by Darwinian evolution and open-endedness research, the DGM maintains an archive of generated coding agents. It grows the archive by sampling an agent from it and using a foundation model to create a new, interesting version of the sampled agent.
arXiv Detail & Related papers (2025-05-29T00:26:15Z)
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models [61.14336781917986]
We introduce OpenR, an open-source framework for enhancing the reasoning capabilities of large language models (LLMs). OpenR unifies data acquisition, reinforcement learning training, and non-autoregressive decoding into a cohesive software platform. Our work is the first to provide an open-source framework that explores the core techniques of OpenAI's o1 model with reinforcement learning.
arXiv Detail & Related papers (2024-10-12T23:42:16Z)
OpenHands: An Open Platform for AI Software Developers as Generalist Agents [109.8507367518992]
We introduce OpenHands, a platform for the development of AI agents that interact with the world in similar ways to a human developer. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks.
arXiv Detail & Related papers (2024-07-23T17:50:43Z)
h2oGPT: Democratizing Large Language Models [1.8043055303852882]
We introduce h2oGPT, a suite of open-source code repositories for the creation and use of Large Language Models. The goal of this project is to create the world's best truly open-source alternative to closed-source approaches.
arXiv Detail & Related papers (2023-06-13T22:19:53Z)
StarCoder: may the source be with you! [79.93915935620798]
The BigCode community introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories.
arXiv Detail & Related papers (2023-05-09T08:16:42Z)
Data Engineering for Everyone [1.2585165426919136]
Data engineering is one of the fastest-growing fields within machine learning (ML). ML requires more data than individual teams of data engineers can readily produce. This article shows that open-source data sets are the rocket fuel for research and innovation at even some of the largest AI organizations.
arXiv Detail & Related papers (2021-02-23T01:24:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.