Related papers: Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT

Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT

URL: http://arxiv.org/abs/2311.01825v2
Date: Mon, 6 Nov 2023 11:43:33 GMT
Title: Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT
Authors: Mario S\"anger, Ninon De Mecquenem, Katarzyna Ewa Lewi\'nska, Vasilis Bountris, Fabian Lehmann, Ulf Leser, Thomas Kosch
Abstract summary: Scientific systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets. However, implementing is difficult due to the involvement of many blackbox tools and the deep infrastructure stack necessary for their execution. We investigate the efficiency of Large Language Models, specifically ChatGPT, to support users when dealing with scientific domains.
Score: 11.410608233274942
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages. To address these challenges, we investigate the efficiency of Large Language Models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed three user studies in two scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs efficiently interpret workflows but achieve lower performance for exchanging components or purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions.

Related papers

FamilyTool: A Multi-hop Personalized Tool Use Benchmark [94.1158032740113]
We introduce FamilyTool, a novel benchmark grounded in a family-based knowledge graph (KG) FamilyTool challenges Large Language Models with queries spanning 1 to 3 relational hops. Experiments reveal significant performance gaps in state-of-the-art LLMs.
arXiv Detail & Related papers (2025-04-09T10:42:36Z)
WorkTeam: Constructing Workflows from Natural Language with Multi-Agents [6.656951366751657]
Hand-crafted workflow construction requires expert knowledge, presenting significant technical barriers. We propose WorkTeam, a multi-agent NL2Workflow framework comprising a supervisor, orchestrator, and filler agent. Our approach significantly increases the success rate of workflow construction, providing a novel and effective solution for enterprise NL2Workflow services.
arXiv Detail & Related papers (2025-03-28T14:33:29Z)
GNNs as Predictors of Agentic Workflow Performances [48.34485750450876]
Agentic invoked by Large Language Models (LLMs) have achieved remarkable success in handling complex tasks. This paper formulates agentic as computational graphs and advocates Graph Neural Networks (GNNs) as efficient predictors of agentic performances. We construct FLORA-Bench, a unified platform for benchmarking GNNs for predicting agentic workflow performances.
arXiv Detail & Related papers (2025-03-14T11:11:00Z)
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks [0.8425561594225592]
This study introduces a novel framework for training smaller language models in function calling. It focuses on specific logical and mathematical reasoning tasks. The approach aims to improve performances of small-scale models for these tasks using function calling.
arXiv Detail & Related papers (2024-10-24T16:27:35Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance [0.32985979395737774]
We study the application of large language models (LLMs) in domain-specific contexts, including finance. We find that fine-tuning exclusively on the target task is not always the most effective strategy. Instead, multi-task fine-tuning can significantly enhance performance.
arXiv Detail & Related papers (2024-10-01T22:35:56Z)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications. FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models. Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions. We propose a novel modelagnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z)
Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs. We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
Reusability Challenges of Scientific Workflows: A Case Study for Galaxy [56.78572674167333]
This study examined the reusability of existing and exposed several challenges. The challenges preventing reusability include tool upgrading, tool support, design flaws, incomplete, failure to load a workflow, etc.
arXiv Detail & Related papers (2023-09-13T20:17:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.