Related papers: How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions

URL: http://arxiv.org/abs/2403.09547v1
Date: Thu, 14 Mar 2024 16:35:39 GMT
Title: How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions
Authors: João Helis Bernardo, Daniel Alencar da Costa, Sérgio Queiroz de Medeiros, Uirá Kulesza
Abstract summary: We conduct a comprehensive analysis of 185 open-source projects on GitHub (93 ML and 92 non-ML projects). Our investigation comprises both quantitative and qualitative dimensions, aiming to uncover differences in CI adoption between ML and non-ML projects. Our findings indicate that ML projects often require longer build durations, and medium-sized ML projects exhibit lower test coverage compared to non-ML projects.
Score: 1.5197353881052764
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Continuous Integration (CI) is a well-established practice in traditional software development, but its nuances in the domain of Machine Learning (ML) projects remain relatively unexplored. Given the distinctive nature of ML development, understanding how CI practices are adopted in this context is crucial for tailoring effective approaches. In this study, we conduct a comprehensive analysis of 185 open-source projects on GitHub (93 ML and 92 non-ML projects). Our investigation comprises both quantitative and qualitative dimensions, aiming to uncover differences in CI adoption between ML and non-ML projects. Our findings indicate that ML projects often require longer build durations, and medium-sized ML projects exhibit lower test coverage compared to non-ML projects. Moreover, small and medium-sized ML projects show a higher prevalence of increasing build duration trends compared to their non-ML counterparts. Additionally, our qualitative analysis illuminates the discussions around CI in both ML and non-ML projects, encompassing themes like CI Build Execution and Status, CI Testing, and CI Infrastructure. These insights shed light on the unique challenges faced by ML projects in adopting CI practices effectively.

Related papers

Benchmarking Chinese Commonsense Reasoning with a Multi-hop Reasoning Perspective [53.594353527056775]
We propose Chinese Commonsense Multi-hop Reasoning (CCMOR) to evaluate Large Language Models (LLMs). CCMOR is designed to evaluate LLMs' ability to integrate Chinese-specific factual knowledge with multi-step logical reasoning. We implement a human-in-the-loop verification system, where domain experts systematically validate and refine the generated questions.
arXiv Detail & Related papers (2025-10-09T20:29:00Z)
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints [100.02131897927484]
This paper focuses on the native training of Multimodal Large Language Models (MLLMs) in an end-to-end manner. We propose a native MLLM called NaViL, combined with a simple and cost-effective recipe. Experimental results on 14 multimodal benchmarks confirm the competitive performance of NaViL against existing MLLMs.
arXiv Detail & Related papers (2025-10-09T17:59:37Z)
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey [61.66069828956139]
Multimodal Large Language Models (MLLMs) have achieved great success in various multimodal tasks, pointing toward a promising pathway to artificial general intelligence. Lack of multimodal data, poor performance on many complex downstream tasks, and inadequate evaluation protocols hinder the reliability and broader applicability of MLLMs. Inspired by the human ability to leverage external tools for enhanced reasoning and problem-solving, augmenting MLLMs with external tools offers a promising strategy to overcome these challenges.
arXiv Detail & Related papers (2025-08-14T07:25:45Z)
Evaluating Large Language Models for Real-World Engineering Tasks [75.97299249823972]
This paper introduces a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios. Using this dataset, we evaluate four state-of-the-art Large Language Models (LLMs). Our results show that LLMs demonstrate strengths in basic temporal and structural reasoning but struggle significantly with abstract reasoning, formal modeling, and context-sensitive engineering logic.
arXiv Detail & Related papers (2025-05-12T14:05:23Z)
BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models [50.17907898478795]
We introduce BinMetric, a benchmark designed to evaluate the performance of large language models on binary analysis tasks. BinMetric comprises 1,000 questions derived from 20 real-world open-source projects across 6 practical binary analysis tasks. Our empirical study on this benchmark investigates the binary analysis capabilities of various state-of-the-art LLMs, revealing their strengths and limitations in this field.
arXiv Detail & Related papers (2025-05-12T08:54:07Z)
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? [64.62421656031128]
MLRC-Bench is a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions. Unlike prior work, MLRC-Bench measures the key steps of proposing and implementing novel research methods. Even the best-performing tested agent closes only 9.3% of the gap between baseline and top human participant scores.
arXiv Detail & Related papers (2025-04-13T19:35:43Z)
Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models [52.569132872560814]
Multimodal large language models (MLLMs) have achieved significant breakthroughs, enhancing understanding across text and vision. However, current MLLMs still face challenges in effectively integrating knowledge across these modalities during multimodal knowledge reasoning. We analyze and compare the extent of consistency degradation in multimodal knowledge reasoning within MLLMs.
arXiv Detail & Related papers (2025-03-03T09:01:51Z)
Continuous Integration Practices in Machine Learning Projects: The Practitioners' Perspective [1.4165457606269516]
This study surveys 155 practitioners from 47 Machine Learning (ML) projects. Practitioners highlighted eight key differences, including test complexity, infrastructure requirements, and build duration and stability. Common challenges mentioned by practitioners include higher project complexity, model training demands, extensive data handling, increased computational resource needs, and dependency management.
arXiv Detail & Related papers (2025-02-24T18:01:50Z)
Benchmarking Large and Small MLLMs [71.78055760441256]
Large multimodal language models (MLLMs) have achieved remarkable advancements in understanding and generating multimodal content. However, their deployment faces significant challenges, including slow inference, high computational cost, and impracticality for on-device applications. Small MLLMs, exemplified by the LLava-series models and Phi-3-Vision, offer promising alternatives with faster inference, reduced deployment costs, and the ability to handle domain-specific scenarios.
arXiv Detail & Related papers (2025-01-04T07:44:49Z)
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems. This paper systematically sorts out the applications of MLLM in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z)
Large Language Models as Reliable Knowledge Bases? [60.25969380388974]
Large Language Models (LLMs) can be viewed as potential knowledge bases (KBs). This study defines criteria that a reliable LLM-as-KB should meet, focusing on factuality and consistency. Strategies like ICL and fine-tuning are unsuccessful at making LLMs better KBs.
arXiv Detail & Related papers (2024-07-18T15:20:18Z)
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective [53.48484062444108]
We find that the development of models and data is not two separate paths but rather interconnected. On the one hand, vaster and higher-quality data contribute to better performance of MLLMs; on the other hand, MLLMs can facilitate the development of data. To promote data-model co-development for the MLLM community, we systematically review existing works related to MLLMs from the data-model co-development perspective.
arXiv Detail & Related papers (2024-07-11T15:08:11Z)
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step [69.64348626180623]
Large language models (LLMs) have achieved remarkable performance on various NLP tasks. How to evaluate and analyze the tool-utilization capability of LLMs is still under-explored. We introduce T-Eval to evaluate the tool utilization capability step by step.
arXiv Detail & Related papers (2023-12-21T17:02:06Z)
An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software [17.999512016809945]
Self-admitted technical debt (SATD) can have a significant impact on the quality of machine learning-based software. This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects.
arXiv Detail & Related papers (2023-11-20T18:56:36Z)
Fairness of ChatGPT and the Role Of Explainable-Guided Prompts [6.079011829257036]
Our research investigates the potential of Large-scale Language Models (LLMs), specifically OpenAI's GPT, in credit risk assessment. Our findings suggest that LLMs, when directed by judiciously designed prompts and supplemented with domain-specific knowledge, can parallel the performance of traditional Machine Learning (ML) models.
arXiv Detail & Related papers (2023-07-14T09:20:16Z)
A Survey on Multimodal Large Language Models [71.63375558033364]
Multimodal Large Language Models (MLLMs), represented by GPT-4V, have become a rising research hotspot. This paper aims to trace and summarize the recent progress of MLLMs.
arXiv Detail & Related papers (2023-06-23T15:21:52Z)
MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks [31.733088105662876]
We aim to bridge the gap between machine intelligence and human knowledge by introducing a novel framework. We showcase the possibility of extending the capability of LLMs to comprehend structured inputs and perform thorough reasoning for solving novel ML tasks.
arXiv Detail & Related papers (2023-04-28T17:03:57Z)
Reasonable Scale Machine Learning with Open-Source Metaflow [2.637746074346334]
We argue that re-purposing existing tools won't solve the current productivity issues. We introduce Metaflow, an open-source framework for ML projects explicitly designed to boost the productivity of data practitioners.
arXiv Detail & Related papers (2023-03-21T11:28:09Z)
"Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint [6.0141405230309335]
We introduce the novel concept of project smells, which consider deficits in project management as a more holistic perspective on software quality. An open-source static analysis tool, mllint, was also implemented to help detect and mitigate these. Our findings indicate a need for context-aware static analysis tools that fit the needs of the project at its current stage of development.
arXiv Detail & Related papers (2022-01-20T15:52:24Z)
Understanding the Usability Challenges of Machine Learning In High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains. In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions. We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.