Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT
- URL: http://arxiv.org/abs/2311.01825v2
- Date: Mon, 6 Nov 2023 11:43:33 GMT
- Title: Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT
- Authors: Mario Sänger, Ninon De Mecquenem, Katarzyna Ewa Lewińska, Vasilis Bountris, Fabian Lehmann, Ulf Leser, Thomas Kosch
- Abstract summary: Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets.
However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution.
We investigate the efficiency of Large Language Models, specifically ChatGPT, to support users when dealing with scientific workflows.
- Score: 11.410608233274942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Scientific workflow systems are increasingly popular for expressing and
executing complex data analysis pipelines over large datasets, as they offer
reproducibility, dependability, and scalability of analyses by automatic
parallelization on large compute clusters. However, implementing workflows is
difficult due to the involvement of many black-box tools and the deep
infrastructure stack necessary for their execution. Simultaneously,
user-supporting tools are rare, and the number of available examples is much
lower than in classical programming languages. To address these challenges, we
investigate the efficiency of Large Language Models (LLMs), specifically
ChatGPT, to support users when dealing with scientific workflows. We performed
three user studies in two scientific domains to evaluate ChatGPT for
comprehending, adapting, and extending workflows. Our results indicate that
LLMs efficiently interpret workflows but achieve lower performance for
exchanging components or purposeful workflow extensions. We characterize their
limitations in these challenging scenarios and suggest future research
directions.

Related papers
- From Prompt to Pipeline: Large Language Models for Scientific Workflow Development in Bioinformatics [2.2160604288512324]
 This study investigates whether modern Large Language Models (LLMs) can support the generation of accurate, complete, and usable bioinformatics workflows.
We evaluate these models using diverse SNP analysis, RNA-seq, DNA methylation, and data retrieval platforms.
The results show that Gemini 2.5 Flash excels in generating Galaxy workflows, while DeepSeek-V3 performs strongly in Nextflow.
 arXiv (2025-07-27T04:08:11Z)
- EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models [65.48902212293903]
 We present the Extremely Complex Instruction Following Benchmark (EIFBENCH) for evaluating large language models (LLMs).
EIFBENCH includes multi-task scenarios that enable comprehensive assessment across diverse task types concurrently.
We also propose the Segment Policy Optimization (SegPO) algorithm to enhance the LLM's ability to accurately fulfill multi-task workflows.
 arXiv (2025-06-10T02:39:55Z)
- FamilyTool: A Multi-hop Personalized Tool Use Benchmark [94.1158032740113]
 We introduce FamilyTool, a novel benchmark grounded in a family-based knowledge graph (KG).
FamilyTool challenges Large Language Models with queries spanning 1 to 3 relational hops.
Experiments reveal significant performance gaps in state-of-the-art LLMs.
 arXiv (2025-04-09T10:42:36Z)
- WorkTeam: Constructing Workflows from Natural Language with Multi-Agents [6.656951366751657]
 Hand-crafted workflow construction requires expert knowledge, presenting significant technical barriers.
We propose WorkTeam, a multi-agent NL2Workflow framework comprising a supervisor, orchestrator, and filler agent.
Our approach significantly increases the success rate of workflow construction, providing a novel and effective solution for enterprise NL2Workflow services.
 arXiv (2025-03-28T14:33:29Z)
- GNNs as Predictors of Agentic Workflow Performances [48.34485750450876]
 Agentic workflows invoked by Large Language Models (LLMs) have achieved remarkable success in handling complex tasks.
This paper formulates agentic workflows as computational graphs and advocates Graph Neural Networks (GNNs) as efficient predictors of agentic workflow performance.
We construct FLORA-Bench, a unified platform for benchmarking GNNs for predicting agentic workflow performances.
 arXiv (2025-03-14T11:11:00Z)
- Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks [0.8425561594225592]
 This study introduces a novel framework for training smaller language models in function calling.
It focuses on specific logical and mathematical reasoning tasks.
The approach aims to improve the performance of small-scale models on these tasks using function calling.
 arXiv (2024-10-24T16:27:35Z)
- Benchmarking Agentic Workflow Generation [80.74757493266057]
 We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
 arXiv (2024-10-10T12:41:19Z)
- Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance [0.32985979395737774]
 We study the application of large language models (LLMs) in domain-specific contexts, including finance.
We find that fine-tuning exclusively on the target task is not always the most effective strategy.
Instead, multi-task fine-tuning can significantly enhance performance.
 arXiv (2024-10-01T22:35:56Z)
- FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
 We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while obtaining over a 30% increase in inference speed.
 arXiv (2024-08-15T16:45:16Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
 Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
 arXiv (2024-06-19T00:28:58Z)
- Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
 Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also the collaborative information of tools.
 arXiv (2024-05-25T06:41:23Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
 Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
 arXiv (2024-03-12T13:31:14Z)
- Reusability Challenges of Scientific Workflows: A Case Study for Galaxy [56.78572674167333]
 This study examined the reusability of existing workflows and exposed several challenges.
The challenges preventing reusability include tool upgrading, tool support, design flaws, incomplete, failure to load a workflow, etc.
 arXiv (2023-09-13T20:17:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.