Towards a Benchmark for Causal Business Process Reasoning with LLMs
- URL: http://arxiv.org/abs/2406.05506v2
- Date: Tue, 16 Jul 2024 15:48:32 GMT
- Title: Towards a Benchmark for Causal Business Process Reasoning with LLMs
- Authors: Fabiana Fournier, Lior Limonad, Inna Skarbovsky,
- Abstract summary: Large Language Models (LLMs) are increasingly used for boosting organizational efficiency and automating tasks.
Recent efforts have further extended to employ LLMs in activities such as reasoning, planning, and decision-making.
In this work, we plant the seeds for the development of a benchmark to assess the ability of LLMs to reason about causal and process perspectives of business operations.
- Score: 2.273531916003657
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly used for boosting organizational efficiency and automating tasks. While not originally designed for complex cognitive processes, recent efforts have further extended to employ LLMs in activities such as reasoning, planning, and decision-making. In business processes, such abilities could be invaluable for leveraging on the massive corpora LLMs have been trained on for gaining deep understanding of such processes. In this work, we plant the seeds for the development of a benchmark to assess the ability of LLMs to reason about causal and process perspectives of business operations. We refer to this view as Causally-augmented Business Processes (BP^C). The core of the benchmark comprises a set of BP^C related situations, a set of questions about these situations, and a set of deductive rules employed to systematically resolve the ground truth answers to these questions. Also with the power of LLMs, the seed is then instantiated into a larger-scale set of domain-specific situations and questions. Reasoning on BP^C is of crucial importance for process interventions and process improvement. Our benchmark, accessible at https://huggingface.co/datasets/ibm/BPC, can be used in one of two possible modalities: testing the performance of any target LLM and training an LLM to advance its capability to reason about BP^C.
Related papers
- Language Agents Meet Causality -- Bridging LLMs and Causal World Models [50.79984529172807]
We propose a framework that integrates causal representation learning with large language models.
This framework learns a causal world model, with causal variables linked to natural language expressions.
We evaluate the framework on causal inference and planning tasks across temporal scales and environmental complexities.
arXiv Detail & Related papers (2024-10-25T18:36:37Z) - Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z) - Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z) - PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks [45.129578769739]
Large Language Models (LLMs) have the potential to semi-automate some process mining (PM) analyses.
We propose PM-LLM-Benchmark, the first comprehensive benchmark for PM focusing on domain knowledge.
We observe that most of the considered LLMs can perform some process mining tasks at a satisfactory level, but tiny models that would run on edge devices are still inadequate.
arXiv Detail & Related papers (2024-07-18T07:57:31Z) - Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks [3.9273545629281252]
Large language models (LLMs) could be used to tackle process mining tasks that benefit from an understanding of process behavior.
In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks.
arXiv Detail & Related papers (2024-07-02T14:44:49Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - A Survey of Useful LLM Evaluation [20.048914787813263]
Two-stage framework: from core ability'' to agent''
In the "core ability" stage, we discussed the reasoning ability, societal impact, and domain knowledge of LLMs.
In the agent'' stage, we demonstrated embodied action, planning, and tool learning of LLMs agent applications.
arXiv Detail & Related papers (2024-06-03T02:20:03Z) - When does In-context Learning Fall Short and Why? A Study on
Specification-Heavy Tasks [54.71034943526973]
In-context learning (ICL) has become the default method for using large language models (LLMs)
We find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications.
We identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability.
arXiv Detail & Related papers (2023-11-15T14:26:30Z) - Causal Reasoning and Large Language Models: Opening a New Frontier for Causality [29.433401785920065]
Large language models (LLMs) can generate causal arguments with high probability.
LLMs may be used by human domain experts to save effort in setting up a causal analysis.
arXiv Detail & Related papers (2023-04-28T19:00:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.