Related papers: Evaluating Large Language Models for Causal Modeling

Evaluating Large Language Models for Causal Modeling

URL: http://arxiv.org/abs/2411.15888v1
Date: Sun, 24 Nov 2024 15:51:56 GMT
Title: Evaluating Large Language Models for Causal Modeling
Authors: Houssam Razouk, Leonie Benischke, Georg Niess, Roman Kern,
Abstract summary: We consider the process of transforming causal domain knowledge into a representation that aligns more closely with guidelines from causal data science. We introduce two novel tasks related to distilling causal domain knowledge into causal variables and detecting interaction entities using LLMs.
Score: 1.5468177185307304
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we consider the process of transforming causal domain knowledge into a representation that aligns more closely with guidelines from causal data science. To this end, we introduce two novel tasks related to distilling causal domain knowledge into causal variables and detecting interaction entities using LLMs. We have determined that contemporary LLMs are helpful tools for conducting causal modeling tasks in collaboration with human experts, as they can provide a wider perspective. Specifically, LLMs, such as GPT-4-turbo and Llama3-70b, perform better in distilling causal domain knowledge into causal variables compared to sparse expert models, such as Mixtral-8x22b. On the contrary, sparse expert models such as Mixtral-8x22b stand out as the most effective in identifying interaction entities. Finally, we highlight the dependency between the domain where the entities are generated and the performance of the chosen LLM for causal modeling.

Related papers

Correlated Errors in Large Language Models [0.6856888934092934]
We find substantial correlation in model errors on a leaderboard dataset.<n>We identify factors driving model correlation, including shared architectures and providers.<n>We show the effects of correlation in two downstream tasks: LLM-as-judge evaluation and hiring.
arXiv Detail & Related papers (2025-06-09T17:37:18Z)
Enhancing Multi-Hop Fact Verification with Structured Knowledge-Augmented Large Language Models [26.023148371263012]
We propose a novel Structured Knowledge-Augmented LLM-based Network (LLM-SKAN) for multi-hop fact verification. Specifically, we utilize an LLM-driven Knowledge Extractor to capture fine-grained information, including entities and their complicated relations. The experimental results on four common-used datasets demonstrate the effectiveness and superiority of our model.
arXiv Detail & Related papers (2025-03-11T14:47:24Z)
Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z)
Boosting LLM-based Relevance Modeling with Distribution-Aware Robust Learning [14.224921308101624]
We propose a novel Distribution-Aware Robust Learning framework (DaRL) for relevance modeling. DaRL has been deployed online to serve the Alipay's insurance product search.
arXiv Detail & Related papers (2024-12-17T03:10:47Z)
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs [64.83462841029089]
We introduce an efficient merging-based alignment method called textscMergeAlign that interpolates the domain and alignment vectors, creating safer domain-specific models. We apply textscMergeAlign on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks.
arXiv Detail & Related papers (2024-11-11T09:32:20Z)
Investigating LLM Applications in E-Commerce [17.854070801235217]
Large Language Models (LLMs) have revolutionized natural language processing in various applications especially in e-commerce. This paper explored the efficacy of LLMs in the e-commerce domain, focusing on instruction-tuning an open source LLM model with public e-commerce datasets of varying sizes. We examined the effectiveness of the current niche industrial application of very large LLM, using in-context learning, in e-commerce specific tasks.
arXiv Detail & Related papers (2024-08-23T00:57:37Z)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications. FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
CogBench: a large language model walks into a psychology lab [12.981407327149679]
This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. We apply CogBench to 35 large language models (LLMs) and analyze this data using statistical multilevel modeling techniques. We find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior.
arXiv Detail & Related papers (2024-02-28T10:43:54Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs) We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE. This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective. We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models. Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z)
Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains. In this paper, we introduce how to fine-tune a LLM model that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.