Do large language models and humans have similar behaviors in causal
inference with script knowledge?
- URL: http://arxiv.org/abs/2311.07311v1
- Date: Mon, 13 Nov 2023 13:05:15 GMT
- Title: Do large language models and humans have similar behaviors in causal
inference with script knowledge?
- Authors: Xudong Hong, Margarita Ryzhova, Daniel Adrian Biondi and Vera Demberg
- Abstract summary: We study the processing of an event $B$ in a script-based story.
In our manipulation, event $A$ is stated, negated, or omitted in an earlier section of the text.
- Score: 13.140513796801915
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, large pre-trained language models (LLMs) have demonstrated superior
language understanding abilities, including zero-shot causal reasoning.
However, it is unclear to what extent their capabilities are similar to human
ones. We here study the processing of an event $B$ in a script-based story,
which causally depends on a previous event $A$. In our manipulation, event $A$
is stated, negated, or omitted in an earlier section of the text. We first
conducted a self-paced reading experiment, which showed that humans exhibit
significantly longer reading times when causal conflicts exist ($\neg A
\rightarrow B$) than under logical conditions ($A \rightarrow B$). However,
reading times remain similar when cause $A$ is not explicitly mentioned,
indicating that humans can easily infer event $B$ from their script knowledge. We
then tested a variety of LLMs on the same data to check to what extent the
models replicate human behavior. Our experiments show that 1) only recent LLMs,
like GPT-3 or Vicuna, correlate with human behavior in the $\neg A \rightarrow
B$ condition. 2) Despite this correlation, all models still fail to predict
that $nil \rightarrow B$ is less surprising than $\neg A \rightarrow B$,
indicating that LLMs still have difficulties integrating script knowledge. Our
code and collected data set are available at
https://github.com/tony-hong/causal-script.
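A natural way to relate the models to the reading-time data is to measure the surprisal an LLM assigns to the target event $B$ under each context condition. Below is a minimal sketch of such a measurement, assuming a HuggingFace causal LM (GPT-2 as a stand-in for the models actually evaluated) and invented restaurant-script stimuli; the authors' real stimuli, models, and analysis are in the linked repository.

```python
# Minimal sketch: surprisal of a target event B under three context conditions.
# Assumptions: GPT-2 as a stand-in model and invented stimuli; the paper's real
# stimuli, models (e.g. GPT-3, Vicuna), and statistics live in the linked repo.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surprisal_bits(context: str, target: str) -> float:
    """Sum of -log2 p(token | prefix) over the tokens of `target` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + target, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(ctx_ids.size(1), full_ids.size(1)):
        token_id = full_ids[0, pos]
        # logits at position pos - 1 predict the token at position pos
        total += -log_probs[0, pos - 1, token_id].item() / math.log(2)
    return total

# Hypothetical restaurant-script items: A = ordering food, B = eating the meal.
contexts = {
    "A -> B":    "Tom went to a restaurant. He ordered a pizza.",
    "negA -> B": "Tom went to a restaurant. He did not order anything.",
    "nil -> B":  "Tom went to a restaurant. He sat down at a table.",
}
target = " A few minutes later, he ate his meal."
for condition, context in contexts.items():
    print(f"{condition}: surprisal of B = {surprisal_bits(context, target):.2f} bits")
```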
Related papers
- Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities [13.657259851747126]
We show that type I and type II errors for our tests decrease exponentially in the text length.
We show that if the string is generated by a model $A$, the log-perplexity of the string under $A$ converges to the average entropy of the string under $A$, except with an exponentially small probability in string length.
arXiv Detail & Related papers (2025-01-04T23:51:43Z) - Great Memory, Shallow Reasoning: Limits of $k$NN-LMs [71.73611113995143]
$k$NN-LMs, which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling.
We ask whether this improved ability to recall information really translates into downstream abilities.
arXiv Detail & Related papers (2024-08-21T17:59:05Z) - Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training [42.89066583603415]
This work identifies three critical $\textit{O}$bstacles: ($\textit{O}$1) lack of comprehensive evaluation, ($\textit{O}$2) untested viability for scaling, and ($\textit{O}$3) lack of empirical guidelines.
We show that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance.
arXiv Detail & Related papers (2024-05-24T08:00:00Z) - Many-Shot Regurgitation (MSR) Prompting [26.9991760335222]
We introduce Many-Shot Regurgitation (MSR) prompting, a new black-box membership inference attack framework for examining verbatim content reproduction in large language models (LLMs). A rough prompt-construction sketch is given after this related-papers list.
MSR prompting involves dividing the input text into multiple segments and creating a single prompt that includes a series of faux conversation rounds between a user and a language model to elicit verbatim regurgitation.
We apply MSR prompting to diverse text sources, including Wikipedia articles and open educational resources (OER) textbooks, which provide high-quality, factual content.
arXiv Detail & Related papers (2024-05-13T19:22:40Z) - Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics [45.69328374321502]
Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks.
When trained on '$A \rightarrow B$', LLMs fail to conclude '$B \leftarrow A$' during inference even though the two sentences are semantically identical.
We theoretically analyze the reversal curse via the training dynamics of gradient descent for two auto-regressive models.
arXiv Detail & Related papers (2024-05-07T21:03:51Z) - Language models scale reliably with over-training and on downstream tasks [121.69867718185125]
Scaling laws are useful guides for derisking expensive training runs.
However, there remain gaps between current studies and how language models are trained.
For instance, scaling laws mostly predict next-token prediction loss, but models are usually compared on downstream task performance.
arXiv Detail & Related papers (2024-03-13T13:54:00Z) - Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs).
We formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z) - Mediated Uncoupled Learning: Learning Functions without Direct
Input-output Correspondences [80.95776331769899]
We consider the task of predicting $Y$ from $X$ when we have no paired data of them, only two separate datasets: $S_X$, which pairs $X$ with an intermediate variable $U$, and $S_Y$, which pairs $U$ with $Y$.
A naive approach is to predict $U$ from $X$ using $S_X$ and then $Y$ from $U$ using $S_Y$.
We propose a new method that avoids predicting $U$ but directly learns $Y = f(X)$ by training $f(X)$ with $S_X$ to predict $h(U)$; a minimal numerical sketch is given after this related-papers list.
arXiv Detail & Related papers (2021-07-16T22:13:29Z) - proScript: Partially Ordered Scripts Generation via Pre-trained Language
Models [49.03193243699244]
We demonstrate for the first time that pre-trained neural language models (LMs) can be finetuned to generate high-quality scripts.
We collected a large (6.4k) crowdsourced dataset of partially ordered scripts (named proScript).
Our experiments show that our models perform well (e.g., F1=75.7 in task (i)), illustrating a new approach to overcoming previous barriers to script collection.
arXiv Detail & Related papers (2021-04-16T17:35:10Z) - Faster Uncertainty Quantification for Inverse Problems with Conditional
Normalizing Flows [0.9176056742068814]
In inverse problems, we often have data consisting of paired samples $(x, y) \sim p_{X,Y}(x, y)$ where $y$ are partial observations of a physical system.
We propose a two-step scheme, which makes use of normalizing flows and joint data to train a conditional generator $q_\theta(x|y)$.
arXiv Detail & Related papers (2020-07-15T20:36:30Z)
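For the Many-Shot Regurgitation (MSR) entry above, the summary describes splitting a source text into segments and packaging them as faux user/assistant rounds whose final assistant turn is left open so the model continues verbatim. The snippet below is a rough guess at such a prompt builder; the segment count, turn wording, and the idea of scoring the completion against the held-out final segment are assumptions, not the authors' exact protocol.

```python
# Rough sketch of an MSR-style prompt: the source text is split into segments
# and framed as faux user/assistant rounds, ending with an open assistant turn
# that invites verbatim continuation. Segment count and wording are assumptions.
def build_msr_prompt(document: str, num_segments: int = 5) -> tuple[str, str]:
    words = document.split()
    seg_len = max(1, len(words) // num_segments)
    segments = [" ".join(words[i:i + seg_len]) for i in range(0, len(words), seg_len)]

    turns = []
    for segment in segments[:-1]:
        turns.append("User: Please continue the passage.")
        turns.append(f"Assistant: {segment}")
    # The final round is left open; a regurgitating model should complete it
    # with the held-out last segment, which is returned for comparison.
    turns.append("User: Please continue the passage.")
    turns.append("Assistant:")
    return "\n".join(turns), segments[-1]

prompt, held_out = build_msr_prompt(
    "First sentence of a Wikipedia article. "
    "Second sentence with more detail. "
    "Third sentence concluding the paragraph."
)
# `prompt` is sent to the target LLM; its completion is compared against
# `held_out` (e.g. by exact or fuzzy string overlap) to measure regurgitation.
```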
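For the mediated uncoupled learning entry, one reading of the summary is: fit $h$ so that $h(U)$ approximates $Y$ on $S_Y$, then fit $f$ so that $f(X)$ approximates $h(U)$ on $S_X$, so $U$ never needs to be predicted at test time. The minimal numerical sketch below follows that reading with linear models, squared losses, and synthetic data; the paper's actual objective and estimators may differ.

```python
# Minimal sketch of the mediated uncoupled learning idea as read from the
# summary above: h(U) is fit to Y on S_Y, then f(X) is fit to h(U) on S_X,
# so Y = f(X) is learned without ever predicting U at test time.
# Linear models, squared losses, and synthetic data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth: U depends on X, Y depends on U (datasets are unpaired).
x_sx = rng.normal(size=(200, 3))                      # S_X: (X, U) pairs
u_sx = x_sx @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
u_sy = rng.normal(size=300)                           # S_Y: (U, Y) pairs
y_sy = 3.0 * u_sy + 0.1 * rng.normal(size=300)

# Step 1: fit h so that h(U) approximates Y, using S_Y only.
h_coef = np.polyfit(u_sy, y_sy, deg=1)

def h(u):
    return np.polyval(h_coef, u)

# Step 2: fit f so that f(X) approximates h(U), using S_X only.
X_design = np.hstack([x_sx, np.ones((len(x_sx), 1))])  # add a bias column
f_coef, *_ = np.linalg.lstsq(X_design, h(u_sx), rcond=None)

def f(x):
    return np.hstack([x, np.ones((len(x), 1))]) @ f_coef

# At test time Y is predicted directly from X; U is never estimated explicitly.
x_test = rng.normal(size=(5, 3))
print(f(x_test))
```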