Using GPT-4 to guide causal machine learning
- URL: http://arxiv.org/abs/2407.18607v1
- Date: Fri, 26 Jul 2024 08:59:26 GMT
- Title: Using GPT-4 to guide causal machine learning
- Authors: Anthony C. Constantinou, Neville K. Kitson, Alessio Zanga,
- Abstract summary: We focus on the well-established GPT-4 (Turbo) and evaluate its performance under the most restrictive conditions.
We show that questionnaire participants judge the GPT-4 graphs as the most accurate in the evaluated categories.
We show that pairing GPT-4 with causal ML overcomes the tendency of causal ML to produce graphs that violate common sense, resulting in graphical structures learnt from real data that align more closely with those identified by domain experts.
- Score: 5.953513005270839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since its introduction to the public, ChatGPT has had an unprecedented impact. While some experts have praised AI advancements and highlighted their potential risks, others have been critical of the accuracy and usefulness of Large Language Models (LLMs). In this paper, we are interested in the ability of LLMs to identify causal relationships. We focus on the well-established GPT-4 (Turbo) and evaluate its performance under the most restrictive conditions, by isolating its ability to infer causal relationships based solely on the variable labels without being given any context, demonstrating the minimum level of effectiveness one can expect when it is provided with label-only information. We show that questionnaire participants judge the GPT-4 graphs as the most accurate in the evaluated categories, closely followed by knowledge graphs constructed by domain experts, with causal Machine Learning (ML) far behind. We use these results to highlight an important limitation of causal ML: it often produces causal graphs that violate common sense, which undermines trust in them. However, we show that pairing GPT-4 with causal ML overcomes this limitation, resulting in graphical structures learnt from real data that align more closely with those identified by domain experts, compared to structures learnt by causal ML alone. Overall, our findings suggest that despite GPT-4 not being explicitly designed to reason causally, it can still be a valuable tool for causal representation, as it improves the causal discovery process of causal ML algorithms that are designed to do just that.
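The pairing described in the abstract can be read as a two-step recipe: first elicit label-only causal judgements from GPT-4 for each pair of variables, then hand those judgements to a causal structure-learning algorithm as background knowledge. The sketch below illustrates that reading only: it is not the authors' implementation; the `ask_llm` helper and the prompt wording are hypothetical stand-ins for a GPT-4 (Turbo) call; and the answers would serve as background knowledge for the structure-learning step rather than as definitive edges.

```python
from itertools import permutations

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4 (Turbo) call.
    Replace with a real chat-completion request; it always answers "no" here
    so that the sketch runs without network access."""
    return "no"

def elicit_label_only_edges(labels):
    """Query every ordered pair of variable labels, with no other context,
    and keep the pairs the model judges to be causal."""
    edges = []
    for cause, effect in permutations(labels, 2):
        prompt = (
            f"Given only the variable names '{cause}' and '{effect}', "
            f"is it plausible that {cause} directly causes {effect}? "
            "Answer yes or no."
        )
        if ask_llm(prompt).strip().lower().startswith("yes"):
            edges.append((cause, effect))
    return edges

if __name__ == "__main__":
    # Illustrative labels only. The elicited edges would then be passed to a
    # causal structure-learning algorithm as background knowledge, which is
    # the pairing step the abstract refers to.
    labels = ["smoking", "tar deposits", "lung cancer"]
    print(elicit_label_only_edges(labels))
```

Because label-only answers can be noisy, treating them as soft priors or tie-breakers for the data-driven search, rather than as fixed edges, is the more cautious design.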
Related papers
- CausalGraph2LLM: Evaluating LLMs for Causal Queries [49.337170619608145]
Causality is essential in scientific research, enabling researchers to interpret true relationships between variables.
With the recent advancements in Large Language Models (LLMs), there is an increasing interest in exploring their capabilities in causal reasoning.
arXiv Detail & Related papers (2024-10-21T12:12:21Z) - KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs [5.798411590796167]
This paper proposes a framework that systematically evaluates the robustness of large language models under adversarial attack scenarios.
Our framework generates original prompts from the triplets of knowledge graphs and creates adversarial prompts by poisoning.
Experiments show that the adversarial robustness of the ChatGPT family ranks as GPT-4-turbo > GPT-4o > GPT-3.5-turbo, and that the robustness of large language models is influenced by the professional domains in which they operate.
arXiv Detail & Related papers (2024-06-16T04:48:43Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) can estimate causal effects under interventions on different parts of a system.
We conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models [0.0]
This paper argues that many apparent hallucinations are better described as attention misdirection than as true hallucinations.
It highlights best practices of the PGI (Persona, Grouping, and Intelligence) method.
arXiv Detail & Related papers (2024-02-21T18:40:24Z) - Zero-shot Causal Graph Extrapolation from Text via LLMs [50.596179963913045]
We evaluate the ability of large language models (LLMs) to infer causal relations from natural language.
LLMs show competitive performance in a benchmark of pairwise relations without needing (explicit) training samples.
We extend our approach to extrapolating causal graphs through iterated pairwise queries.
arXiv Detail & Related papers (2023-12-22T13:14:38Z) - Causal Inference Using LLM-Guided Discovery [34.040996887499425]
We show that the topological order over graph variables (causal order) alone suffices for causal effect inference.
We propose a robust technique for obtaining a causal order from Large Language Models (LLMs).
Our approach significantly improves causal ordering accuracy compared to discovery algorithms; a minimal sketch of why a causal order suffices for adjustment appears after this list.
arXiv Detail & Related papers (2023-10-23T17:23:56Z) - Is GPT4 a Good Trader? [12.057320450155835]
Large language models (LLMs) have demonstrated significant capabilities in various planning and reasoning tasks.
This study aims to examine the fidelity of GPT-4's comprehension of classic trading theories and its proficiency in applying its code interpreter abilities to real-world trading data analysis.
arXiv Detail & Related papers (2023-09-20T00:47:52Z) - Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis [7.099257763803159]
We evaluate the capabilities of four Large Language Models (LLMs) in addressing several analytical problems with graph data.
We employ four distinct evaluation metrics, among them Correctness, Fidelity, and Rectification.
GPT models can generate logical and coherent results, outperforming alternatives in correctness.
arXiv Detail & Related papers (2023-08-22T06:32:07Z) - Is GPT-4 a Good Data Analyst? [67.35956981748699]
We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve performance comparable to that of humans.
arXiv Detail & Related papers (2023-05-24T11:26:59Z) - LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities [66.36633042421387]
We evaluate Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning.
We propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning.
arXiv Detail & Related papers (2023-05-22T15:56:44Z) - Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z)
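As a brief addendum to the LLM-guided discovery entry above (the sketch referenced there): if a causal order is valid and there are no unobserved confounders, every variable preceding the treatment in that order is a non-descendant of the treatment, so the set of predecessors satisfies the backdoor criterion and can serve as an adjustment set. The snippet below is a minimal linear-model illustration of that point under those assumptions; the data, variable names, and ordering are synthetic and purely illustrative, not taken from the paper.

```python
import numpy as np

def effect_via_causal_order(data, order, treatment, outcome):
    """Estimate the effect of `treatment` on `outcome` by adjusting for every
    variable that precedes `treatment` in the given causal order.
    Assumes a linear data-generating process and no unobserved confounders,
    so the predecessors form a valid backdoor adjustment set."""
    predecessors = order[: order.index(treatment)]
    n = len(data[treatment])
    X = np.column_stack([data[treatment]] + [data[v] for v in predecessors] + [np.ones(n)])
    coef, *_ = np.linalg.lstsq(X, data[outcome], rcond=None)
    return coef[0]  # coefficient on the treatment column

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    z = rng.normal(size=n)                       # confounder
    x = 0.8 * z + rng.normal(size=n)             # treatment
    y = 2.0 * x + 1.5 * z + rng.normal(size=n)   # outcome; true effect of x on y is 2.0
    data = {"z": z, "x": x, "y": y}
    # Only the ordering matters, e.g. one elicited from an LLM: z before x before y.
    print(effect_via_causal_order(data, ["z", "x", "y"], "x", "y"))  # approx. 2.0
```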