Related papers: Reusability Challenges of Scientific Workflows: A Case Study for Galaxy

URL: http://arxiv.org/abs/2309.07291v1
Date: Wed, 13 Sep 2023 20:17:43 GMT
Title: Reusability Challenges of Scientific Workflows: A Case Study for Galaxy
Authors: Khairul Alam, Banani Roy, Alexander Serebrenik
Abstract summary: This study examined the reusability of existing workflows and exposed several challenges. The challenges preventing reusability include tool upgrading, tool support unavailability, design flaws, incomplete workflows, failure to load a workflow, etc.
Score: 56.78572674167333
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Scientific workflow has become essential in software engineering because it provides a structured approach to designing, executing, and analyzing scientific experiments. Software developers and researchers have developed hundreds of scientific workflow management systems so scientists in various domains can benefit from them by automating repetitive tasks, enhancing collaboration, and ensuring the reproducibility of their results. However, even for expert users, workflow creation is a complex task due to the dramatic growth of tools and data heterogeneity. Thus, scientists attempt to reuse existing workflows shared in workflow repositories. Unfortunately, several challenges prevent scientists from reusing those workflows. In this study, we thus first attempted to identify those reusability challenges. We also offered an action list and evidence-based guidelines to promote the reusability of scientific workflows. Our intensive manual investigation examined the reusability of existing workflows and exposed several challenges. The challenges preventing reusability include tool upgrading, tool support unavailability, design flaws, incomplete workflows, failure to load a workflow, etc. Such challenges and our action list offered guidelines to future workflow composers to create better workflows with enhanced reusability. In the future, we plan to develop a recommender system using reusable workflows that can assist scientists in creating effective and error-free workflows.

Related papers

Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices [4.6498278084317715]
Unsupervised machine learning is widely used to make data-driven discoveries in critical domains such as climate science, biomedicine, astronomy, chemistry, and more. Despite its widespread utilization, there is a lack of standardization in unsupervised learning for making reliable and reproducible scientific discoveries. We highlight and discuss best practices starting with formulating validatable scientific questions, conducting robust data preparation and exploration, using a range of modeling techniques, performing rigorous validation by evaluating the stability and generalizability of unsupervised learning conclusions, and promoting effective communication and documentation of results to ensure reproducible scientific discoveries.
arXiv Detail & Related papers (2025-06-05T01:58:45Z)
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows [82.07367406991678]
Large Language Models (LLMs) have extended their impact beyond Natural Language Processing. Among these, computer-using agents are capable of interacting with operating systems as humans do. We introduce ScienceBoard, which encompasses a realistic, multi-domain environment featuring dynamic and visually rich scientific software.
arXiv Detail & Related papers (2025-05-26T12:27:27Z)
WorkTeam: Constructing Workflows from Natural Language with Multi-Agents [6.656951366751657]
Hand-crafted workflow construction requires expert knowledge, presenting significant technical barriers. We propose WorkTeam, a multi-agent NL2Workflow framework comprising a supervisor, orchestrator, and filler agent. Our approach significantly increases the success rate of workflow construction, providing a novel and effective solution for enterprise NL2Workflow services.
arXiv Detail & Related papers (2025-03-28T14:33:29Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
WorkflowHub: a registry for computational workflows [0.34864924310198164]
As both combined records of analysis and descriptions of processing steps, workflows should be reusable and available. Workflow sharing presents opportunities to reduce unnecessary reinvention, promote reuse, increase access to best practice analyses for non-experts, and increase productivity. WorkflowHub provides a unified registry for all computational workflows that links to community repositories. The registry has a global reach, with hundreds of research organisations involved, and more than 700 workflows registered.
arXiv Detail & Related papers (2024-10-09T14:36:27Z)
The Hidden Costs of Automation: An Empirical Study on GitHub Actions Workflow Maintenance [45.53834452021771]
GitHub Actions (GA) is an orchestration platform that streamlines the automatic execution of engineering tasks. Human intervention is necessary to correct defects, update dependencies, or refactor existing workflow files.
arXiv Detail & Related papers (2024-09-04T01:33:16Z)
Employing Artificial Intelligence to Steer Exascale Workflows with Colmena [37.42013214123005]
Colmena allows scientists to define how their application should respond to events as a series of cooperative agents. We describe the challenges we overcame while deploying applications on exascale systems, and the science we have enhanced through AI. Our vision is that Colmena will spur creative solutions that harness AI across many domains of scientific computing.
arXiv Detail & Related papers (2024-08-26T17:21:19Z)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
An Empirical Study of Developers' Challenges in Implementing Workflows as Code: A Case Study on Apache Airflow [9.189463227291377]
We study Stack Overflow posts derived from 9,591 Airflow-related questions to understand developers' challenges and root causes. We find that the most significant obstacles arise when defining and executing their workflow. Our analysis identifies 10 root causes behind the challenges, including incorrect configuration, complex environmental setup, and a lack of basic knowledge about Airflow and the external systems that it interacts with.
arXiv Detail & Related papers (2024-05-31T20:16:03Z)
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language model workflows. DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT [11.410608233274942]
Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. We investigate the efficiency of Large Language Models, specifically ChatGPT, to support users when dealing with scientific workflows.
arXiv Detail & Related papers (2023-11-03T10:28:53Z)
Multi-Fidelity Active Learning with GFlowNets [65.91555804996203]
We propose a multi-fidelity active learning algorithm with GFlowNets as a sampler, to efficiently discover diverse, high-scoring candidates. Our evaluation on molecular discovery tasks shows that multi-fidelity active learning with GFlowNets can discover high-scoring candidates at a fraction of the budget of its single-fidelity counterpart.
arXiv Detail & Related papers (2023-06-20T17:43:42Z)
GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets. GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop. We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)
