Exploring Code Comprehension in Scientific Programming: Preliminary Insights from Research Scientists
- URL: http://arxiv.org/abs/2501.10037v1
- Date: Fri, 17 Jan 2025 08:47:29 GMT
- Title: Exploring Code Comprehension in Scientific Programming: Preliminary Insights from Research Scientists
- Authors: Alyssia Chen, Carol Wong, Bonita Sharif, Anthony Peruma
- Abstract summary: This study surveys 57 research scientists from various disciplines to explore their programming backgrounds, practices, and challenges they face regarding code readability.
Scientists mainly use Python and R, relying on documentation for readability.
Our findings show low adoption of code quality tools and a trend towards utilizing large language models to improve code quality.
- Score: 6.2329239454115415
- License:
- Abstract: Scientific software, defined as computer programs, scripts, or code used in scientific research, data analysis, modeling, or simulation, has become central to modern research. However, there is limited research on the readability and understandability of scientific code, both of which are vital for effective collaboration and reproducibility in scientific research. This study surveys 57 research scientists from various disciplines to explore their programming backgrounds, practices, and the challenges they face regarding code readability. Our findings reveal that most participants learn programming through self-study or on-the-job training, with 57.9% lacking formal instruction in writing readable code. Scientists mainly use Python and R, relying on comments and documentation for readability. While most consider code readability essential for scientific reproducibility, they often face inadequate documentation and poor naming, including cryptic variable names and inconsistent naming conventions. Our findings also show low adoption of code quality tools and a trend towards utilizing large language models to improve code quality. These findings offer practical insights into enhancing coding practices and supporting sustainable development in scientific software.
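To make the reported naming and documentation problems concrete, here is a small before-and-after sketch in Python; both functions are invented for illustration and do not come from the study:

```python
# Hypothetical examples of the naming and documentation issues the
# survey describes; not code collected from participants.

# Before: cryptic names, no documentation.
def calc(d, t):
    return sum(d) / t


# After: descriptive names and a docstring make the intent recoverable.
def mean_concentration(measurements: list[float], num_samples: int) -> float:
    """Return the average concentration across all samples."""
    return sum(measurements) / num_samples
```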
Related papers
- DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery [61.02102713094486]
Good interpretation is important in scientific reasoning, as it allows for better decision-making.
This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks.
We propose DiSciPLE, an evolutionary algorithm that leverages common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data.
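The abstract only outlines the method, but the general shape of such an LLM-guided evolutionary search can be sketched as follows; `propose_variants` is a stub standing in for the LLM proposal step and the fitness function is a toy, so this is illustrative rather than the paper's actual algorithm:

```python
# Minimal sketch of an LLM-guided evolutionary search over candidate
# programs. propose_variants stands in for the LLM proposal step and
# fitness is a toy objective; neither is DiSciPLE's actual method.

def propose_variants(program: str) -> list[str]:
    """Stand-in for the LLM mutation step: return plausible edits."""
    return [program, program + " + 1", program + " * 2"]

def fitness(program: str, x: float, target: float) -> float:
    """Score how well a candidate expression explains one data point."""
    try:
        return abs(eval(program, {"x": x}) - target)
    except Exception:
        return float("inf")

population = ["x", "x + x", "x * x"]
for _ in range(10):
    # Expand with proposed variants, then keep the fittest candidates.
    candidates = {v for p in population for v in propose_variants(p)}
    population = sorted(candidates, key=lambda p: fitness(p, 3.0, 10.0))[:3]

print(population[0])  # best interpretable expression found, e.g. "x * x + 1"
```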
arXiv Detail & Related papers (2025-02-14T10:26:14Z)
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System [62.832818186789545]
Virtual Scientists (VirSci) is a multi-agent system designed to mimic the teamwork inherent in scientific research.
VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas.
We show that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas.
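As a rough, hedged illustration of the generate-evaluate-refine loop such a system implies (the agents below are trivial stand-ins for LLM calls, not VirSci's architecture):

```python
# Toy sketch of a generate-evaluate-refine loop among cooperating
# agents; the agent behaviors are placeholders, not VirSci's design.

def generator_agent(topic: str) -> str:
    return f"Investigate {topic} in scientific code."

def critic_agent(idea: str) -> str:
    return "too broad" if "Investigate" in idea else "looks good"

def refiner_agent(idea: str) -> str:
    return idea.replace("Investigate", "Measure the effect of")

idea = generator_agent("naming conventions")
for _ in range(3):  # a few rounds of collaborative refinement
    if critic_agent(idea) == "looks good":
        break
    idea = refiner_agent(idea)

print(idea)  # "Measure the effect of naming conventions in scientific code."
```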
arXiv Detail & Related papers (2024-10-12T07:16:22Z)
- SciCode: A Research Coding Benchmark Curated by Scientists [37.900374175754465]
Since language models (LMs) now outperform average humans on many tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations.
We created a scientist-curated coding benchmark, SciCode, which includes problems in mathematics, physics, chemistry, biology, and materials science.
Claude3.5-Sonnet, the best-performing model among those tested, can solve only 4.6% of the problems in the most realistic setting.
arXiv Detail & Related papers (2024-07-18T05:15:24Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
- A Review of Neuroscience-Inspired Machine Learning [58.72729525961739]
Bio-plausible credit assignment is compatible with practically any learning condition and is energy-efficient.
In this paper, we survey several vital algorithms that model bio-plausible rules of credit assignment in artificial neural networks.
We conclude by discussing the future challenges that will need to be addressed in order to make such algorithms more useful in practical applications.
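One classic example of a bio-plausible, purely local credit-assignment rule of the kind such surveys cover is Hebbian learning; the sketch below is a standard textbook rule, not necessarily one of the paper's algorithms:

```python
import numpy as np

# Hebbian learning: each weight changes using only locally available
# signals (presynaptic input x and postsynaptic output y), with no
# globally backpropagated error signal.

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=4)   # synaptic weights
eta = 0.01                          # learning rate

for _ in range(100):
    x = rng.normal(size=4)          # presynaptic activity
    y = float(np.tanh(w @ x))       # postsynaptic activity
    w += eta * y * x                # local Hebbian update: dw = eta * y * x

print(w)
```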
arXiv Detail & Related papers (2024-02-16T18:05:09Z)
- Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit [63.82016263181941]
Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora.
There is already a thriving research community focusing on code intelligence.
arXiv Detail & Related papers (2023-12-30T17:48:37Z)
- Framework and Methodology for Verification of a Complex Scientific Simulation Software, Flash-X [0.8437187555622163]
Computational science relies on scientific software as its primary instrument for scientific discovery.
Scientific software verification can be especially difficult, as users typically need to modify the software as part of a scientific study.
Here, we describe a methodology that we have developed for Flash-X, a community simulation software for multiple scientific domains.
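The abstract does not spell out the methodology; a common ingredient of simulation verification, comparing a modified code's output against a trusted baseline within a tolerance, can be sketched as follows (synthetic data, not Flash-X's actual test harness):

```python
import numpy as np

# Generic regression-style verification check: after modifying a
# simulation, its output field is compared against a trusted baseline.
# The data here is synthetic; this is not Flash-X's test suite.

def verify(current: np.ndarray, baseline: np.ndarray,
           rel_tol: float = 1e-8) -> bool:
    """Pass if the modified run matches the baseline within tolerance."""
    return bool(np.allclose(current, baseline, rtol=rel_tol))

baseline = np.linspace(0.0, 1.0, 100) ** 2   # stored reference result
current = baseline + 1e-12                   # output after a code change
print("verification passed:", verify(current, baseline))
```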
arXiv Detail & Related papers (2023-08-30T17:57:37Z)
- CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science [0.0]
CLAIMED is a framework for building reusable operators and scalable scientific workflows, supporting scientists in drawing from previous work by re-composing existing coarse-grained scientific operators.
CLAIMED is programming language, scientific library, and execution environment agnostic.
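To make the idea of a re-composable coarse-grained operator concrete, here is a hedged sketch; the decorator and registry are invented for illustration and are not CLAIMED's actual API:

```python
# Sketch of the coarse-grained-operator idea: each step is a
# self-contained unit with declared inputs and outputs, so scientists
# can recompose steps without reading their internals. The decorator
# and registry below are illustrative, not CLAIMED's API.

OPERATORS = {}

def operator(name: str, inputs: list[str], outputs: list[str]):
    def register(fn):
        OPERATORS[name] = {"fn": fn, "inputs": inputs, "outputs": outputs}
        return fn
    return register

@operator("normalize", inputs=["values"], outputs=["normalized"])
def normalize(values: list[float]) -> list[float]:
    top = max(values)
    return [v / top for v in values]

@operator("threshold", inputs=["normalized"], outputs=["hits"])
def threshold(normalized: list[float]) -> list[float]:
    return [v for v in normalized if v > 0.5]

# Recomposing operators into a small workflow:
print(threshold(normalize([2.0, 8.0, 10.0])))  # [0.8, 1.0]
```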
arXiv Detail & Related papers (2023-07-12T11:54:39Z)
- Many bioinformatics programming tasks can be automated with ChatGPT [3.2698789104455677]
Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code.
We evaluated the extent to which one such model -- OpenAI's ChatGPT -- can successfully complete basic- to moderate-level programming tasks.
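For a sense of what a basic-level task might look like in this setting, here is a representative example of our own (GC-content calculation), not a prompt taken from the paper:

```python
# A representative basic bioinformatics task of the kind such studies
# ask models to write: compute the GC content of a DNA sequence.
# This example is illustrative and is not from the paper.

def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    gc = sum(1 for base in sequence if base in "GC")
    return gc / len(sequence)

print(f"{gc_content('ATGCGCTA'):.2f}")  # 0.50
```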
arXiv Detail & Related papers (2023-03-07T23:32:17Z)
- Automated Creation and Human-assisted Curation of Computable Scientific Models from Code and Text [2.3746609573239756]
Domain experts cannot gain a complete understanding of the implementation of a scientific model if they are not familiar with the code.
We develop a system for the automated creation and human-assisted curation of scientific models.
We present experimental results obtained using a dataset of code and associated text derived from NASA's Hypersonic Aerodynamics website.
arXiv Detail & Related papers (2022-01-28T17:31:38Z)