From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications
- URL: http://arxiv.org/abs/2507.22324v1
- Date: Wed, 30 Jul 2025 01:52:01 GMT
- Title: From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications
- Authors: Cameron S. Movassaghi, Amanda Momenzadeh, Jesse G. Meyer
- Abstract summary: We show that rich method descriptions in scientific publications can serve as standalone specifications for modern large language models. We benchmark state-of-the-art models by tasking them with implementing a diverse set of core algorithms drawn from original publications.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Maintaining software packages imposes significant costs due to dependency management, bug fixes, and versioning. We show that rich method descriptions in scientific publications can serve as standalone specifications for modern large language models (LLMs), enabling on-demand code generation that could supplant human-maintained libraries. We benchmark state-of-the-art models (GPT-o4-mini-high, Gemini Pro 2.5, Claude Sonnet 4) by tasking them with implementing a diverse set of core algorithms drawn from original publications. Our results demonstrate that current LLMs can reliably reproduce package functionality with performance indistinguishable from conventional libraries. These findings foreshadow a paradigm shift toward flexible, on-demand code generation and away from static, human-maintained packages, which will result in reduced maintenance overhead by leveraging published articles as sufficient context for the automated implementation of analytical workflows.
Related papers
- Code Fingerprints: Disentangled Attribution of LLM-Generated Code [7.515488307576106]
We study the problem of model-level code attribution, which aims to determine the source LLM responsible for generated code. We propose the Disentangled Code Attribution Network (DCAN), which separates Source-Agnostic semantic information from Source-Specific stylistic representations. We construct the first large-scale benchmark dataset comprising code generated by four widely used Large Language Models (LLMs) across four programming languages.
arXiv Detail & Related papers (2026-03-04T15:58:36Z) - Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study [0.6821122205224714]
Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions. Can LLMs reliably implement agent-based models from standardized specifications in a way that supports replication, verification, and validation? We evaluate 17 contemporary LLMs on a controlled ODD-to-code translation task.
arXiv Detail & Related papers (2026-02-08T19:56:20Z) - Cost-Aware Model Selection for Text Classification: Multi-Objective Trade-offs Between Fine-Tuned Encoders and LLM Prompting in Production [0.0]
Large language models (LLMs) have demonstrated strong capabilities in open-ended reasoning and generative language tasks. For structured text classification problems with fixed label spaces, model selection is often driven by predictive performance alone. We show that fine-tuned encoder-based models from the BERT family achieve competitive, and often superior, classification performance.
arXiv Detail & Related papers (2026-02-06T03:54:28Z) - CURP: Codebook-based Continuous User Representation for Personalized Generation with LLMs [60.867541073274715]
We propose a novel framework CURP, which employs a bidirectional user encoder and a discrete prototype codebook to extract multi-dimensional user traits. This design enables plug-and-play personalization with a small number of trainable parameters. We show that CURP achieves superior performance and generalization compared to strong baselines.
arXiv Detail & Related papers (2026-01-31T14:13:06Z) - GPT-4.1 Sets the Standard in Automated Experiment Design Using Novel Python Libraries [0.7905066238005297]
Large Language Models (LLMs) have advanced rapidly as tools for automating code generation in scientific research. This study systematically benchmarks a selection of state-of-the-art LLMs in generating functional Python code for two increasingly challenging scenarios.
arXiv Detail & Related papers (2025-07-30T13:11:29Z) - Evaluating Large Language Models on Non-Code Software Engineering Tasks [4.381476817430934]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code understanding and generation. We present the first comprehensive benchmark, which we name Software Engineering Language Understanding (SELU). SELU covers classification, regression, Named Entity Recognition (NER) and Masked Language Modeling (MLM) targets, with data drawn from diverse sources.
arXiv Detail & Related papers (2025-06-12T15:52:32Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks [42.79558714652442]
Large Language Models (LLMs) have shown promise in function-level code generation, yet repository-level software engineering tasks remain challenging. This paper investigates whether open-source LLMs can effectively address repository-level tasks without requiring agent-based approaches. We introduce Code Graph Models (CGMs), which integrate repository code graph structures into the LLM's attention mechanism.
arXiv Detail & Related papers (2025-05-22T17:00:55Z) - Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo [90.78001821963008]
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints. We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC). Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
arXiv Detail & Related papers (2025-04-17T17:49:40Z) - Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications. One core challenge of evaluation in the large language model (LLM) era is the generalization issue. We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus [0.0]
We present Citegeist: An application pipeline using dynamic Retrieval Augmented Generation (RAG) on the arXiv Corpus. For this purpose, we employ a mixture of embedding-based similarity matching, summarization, and multi-stage filtering. To adapt to the continuous growth of the document base, we also present an optimized way of incorporating new and modified papers.
arXiv Detail & Related papers (2025-03-29T21:19:43Z) - PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset [4.826802034066811]
Large Language Models (LLMs) offer remarkable capabilities in code generation, natural language processing, and domain-specific reasoning. We introduce a novel, high-quality dataset comprising 3,347 PennyLane-specific quantum code samples and contextual descriptions.
arXiv Detail & Related papers (2025-03-04T11:04:35Z) - OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [76.59316249991657]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z) - AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs [20.772266479533776]
AXOLOTL is a novel post-processing framework that operates agnostically across tasks and models.
It identifies biases, proposes resolutions, and guides the model to self-debias its outputs.
This approach minimizes computational costs and preserves model performance.
arXiv Detail & Related papers (2024-03-01T00:02:37Z) - Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
LLMs are extremely computationally expensive, even at inference time.
We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.