Reusability Challenges of Scientific Workflows: A Case Study for Galaxy
- URL: http://arxiv.org/abs/2309.07291v1
- Date: Wed, 13 Sep 2023 20:17:43 GMT
- Authors: Khairul Alam, Banani Roy, Alexander Serebrenik
- Abstract summary: This study examined the reusability of existing workflows and exposed several challenges.
The challenges preventing reusability include tool upgrading, tool support unavailability, design flaws, incomplete workflows, failure to load a workflow, etc.
- Score: 56.78572674167333
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Scientific workflows have become essential in software engineering because they
provide a structured approach to designing, executing, and analyzing
scientific experiments. Software developers and researchers have developed
hundreds of scientific workflow management systems so scientists in various
domains can benefit from them by automating repetitive tasks, enhancing
collaboration, and ensuring the reproducibility of their results. However, even
for expert users, workflow creation is a complex task due to the dramatic
growth of tools and data heterogeneity. Thus, scientists attempt to reuse
existing workflows shared in workflow repositories. Unfortunately, several
challenges prevent scientists from reusing those workflows. In this study, we
thus first attempted to identify those reusability challenges. We also offered
an action list and evidence-based guidelines to promote the reusability of
scientific workflows. Our intensive manual investigation examined the
reusability of existing workflows and exposed several challenges. The
challenges preventing reusability include tool upgrading, tool support
unavailability, design flaws, incomplete workflows, failure to load a workflow,
etc. Together, these challenges and our action list offer guidelines that can help future workflow
composers create better workflows with enhanced reusability. In the future,
we plan to develop a recommender system using reusable workflows that can
assist scientists in creating effective and error-free workflows.
Related papers
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
- WorkflowHub: a registry for computational workflows [0.34864924310198164]
As both combined records of analysis and descriptions of processing steps, workflows should be reusable and available.
Workflow sharing presents opportunities to reduce unnecessary reinvention, promote reuse, increase access to best practice analyses for non-experts, and increase productivity.
WorkflowHub provides a unified registry for all computational workflows that links to community repositories.
The registry has a global reach, with hundreds of research organisations involved, and more than 700 workflows registered.
arXiv Detail & Related papers (2024-10-09T14:36:27Z)
- The Hidden Costs of Automation: An Empirical Study on GitHub Actions Workflow Maintenance [45.53834452021771]
GitHub Actions (GA) is an orchestration platform that streamlines the automatic execution of engineering tasks.
Human intervention is necessary to correct defects, update dependencies, or refactor existing workflow files.
arXiv Detail & Related papers (2024-09-04T01:33:16Z)
- Employing Artificial Intelligence to Steer Exascale Workflows with Colmena [37.42013214123005]
Colmena allows scientists to define how their application should respond to events as a series of cooperative agents.
We describe the challenges we overcame while deploying applications on exascale systems, and the science we have enhanced through AI.
Our vision is that Colmena will spur creative solutions that harness AI across many domains of scientific computing.
arXiv Detail & Related papers (2024-08-26T17:21:19Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- An Empirical Study of Developers' Challenges in Implementing Workflows as Code: A Case Study on Apache Airflow [9.189463227291377]
We study Stack Overflow posts derived from 9,591 Airflow-related questions to understand developers' challenges and root causes.
We find that the most significant obstacles arise when developers define and execute their workflows.
Our analysis identifies 10 root causes behind the challenges, including incorrect configuration, complex environmental setup, and a lack of basic knowledge about Airflow and the external systems that it interacts with.
arXiv Detail & Related papers (2024-05-31T20:16:03Z)
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language model (LLM) workflows.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
- Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT [11.410608233274942]
Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets.
However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution.
We investigate the efficiency of Large Language Models, specifically ChatGPT, to support users when dealing with scientific workflows.
arXiv Detail & Related papers (2023-11-03T10:28:53Z)
- Multi-Fidelity Active Learning with GFlowNets [65.91555804996203]
We propose a multi-fidelity active learning algorithm that uses GFlowNets as a sampler to efficiently discover diverse, high-scoring candidates.
Our evaluation on molecular discovery tasks shows that multi-fidelity active learning with GFlowNets can discover high-scoring candidates at a fraction of the budget of its single-fidelity counterpart.
arXiv Detail & Related papers (2023-06-20T17:43:42Z)
- GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)