Do large language models and humans have similar behaviors in causal
inference with script knowledge?
- URL: http://arxiv.org/abs/2311.07311v1
- Date: Mon, 13 Nov 2023 13:05:15 GMT
- Title: Do large language models and humans have similar behaviors in causal
inference with script knowledge?
- Authors: Xudong Hong, Margarita Ryzhova, Daniel Adrian Biondi and Vera Demberg
- Abstract summary: We study the processing of an event $B$ in a script-based story.
In our manipulation, event $A$ is stated, negated, or omitted in an earlier section of the text.
- Score: 13.140513796801915
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, large pre-trained language models (LLMs) have demonstrated superior
language understanding abilities, including zero-shot causal reasoning.
However, it is unclear to what extent their capabilities are similar to human
ones. We here study the processing of an event $B$ in a script-based story,
which causally depends on a previous event $A$. In our manipulation, event $A$
is stated, negated, or omitted in an earlier section of the text. We first
conducted a self-paced reading experiment, which showed that humans exhibit
significantly longer reading times when causal conflicts exist ($\neg A
\rightarrow B$) than under logical conditions ($A \rightarrow B$). However,
reading times remain similar when cause $A$ is not explicitly mentioned,
indicating that humans can easily infer event $B$ from their script knowledge. We
then tested a variety of LLMs on the same data to check to what extent the
models replicate human behavior. Our experiments show that 1) only recent LLMs,
like GPT-3 or Vicuna, correlate with human behavior in the $\neg A \rightarrow
B$ condition. 2) Despite this correlation, all models still fail to predict
that $nil \rightarrow B$ is less surprising than $\neg A \rightarrow B$,
indicating that LLMs still have difficulties integrating script knowledge. Our
code and collected data set are available at
https://github.com/tony-hong/causal-script.
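A natural way to relate the models to the reading-time data is to measure the surprisal an LLM assigns to the target event $B$ under each context condition. Below is a minimal sketch of such a measurement, assuming a HuggingFace causal LM (GPT-2 as a stand-in for the models actually evaluated) and invented restaurant-script stimuli; the authors' real stimuli, models, and analysis are in the linked repository.

```python
# Minimal sketch: surprisal of a target event B under three context conditions.
# Assumptions: GPT-2 as a stand-in model and invented stimuli; the paper's real
# stimuli, models (e.g. GPT-3, Vicuna), and statistics live in the linked repo.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surprisal_bits(context: str, target: str) -> float:
    """Sum of -log2 p(token | prefix) over the tokens of `target` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + target, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(ctx_ids.size(1), full_ids.size(1)):
        token_id = full_ids[0, pos]
        # logits at position pos - 1 predict the token at position pos
        total += -log_probs[0, pos - 1, token_id].item() / math.log(2)
    return total

# Hypothetical restaurant-script items: A = ordering food, B = eating the meal.
contexts = {
    "A -> B":    "Tom went to a restaurant. He ordered a pizza.",
    "negA -> B": "Tom went to a restaurant. He did not order anything.",
    "nil -> B":  "Tom went to a restaurant. He sat down at a table.",
}
target = " A few minutes later, he ate his meal."
for condition, context in contexts.items():
    print(f"{condition}: surprisal of B = {surprisal_bits(context, target):.2f} bits")
```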
Related papers
- Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities [13.657259851747126]
We show that type I and type II errors for our tests decrease exponentially in the text length.
We show that if the string is generated by a model $A$, the log-perplexity of the string under $A$ converges to the average entropy of the string under $A$, except with an exponentially small probability in string length.
arXiv Detail & Related papers (2025-01-04T23:51:43Z) - Great Memory, Shallow Reasoning: Limits of $k$NN-LMs [71.73611113995143]
$k$NN-LMs, which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling.
We ask whether this improved ability to recall information really translates into downstream abilities.
arXiv Detail & Related papers (2024-08-21T17:59:05Z) - Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training [42.89066583603415]
This work identifies three critical $\textit{O}$bstacles: ($\textit{O}$1) lack of comprehensive evaluation, ($\textit{O}$2) untested viability for scaling, and ($\textit{O}$3) lack of empirical guidelines.
We show that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance.
arXiv Detail & Related papers (2024-05-24T08:00:00Z) - Many-Shot Regurgitation (MSR) Prompting [26.9991760335222]
We introduce Many-Shot Regurgitation (MSR) prompting, a new black-box membership inference attack framework for examining verbatim content reproduction in large language models (LLMs). A rough prompt-construction sketch is given after this related-papers list.
MSR prompting involves dividing the input text into multiple segments and creating a single prompt that includes a series of faux conversation rounds between a user and a language model to elicit verbatim regurgitation.
We apply MSR prompting to diverse text sources, including Wikipedia articles and open educational resources (OER) textbooks, which provide high-quality, factual content.
arXiv Detail & Related papers (2024-05-13T19:22:40Z) - Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics [45.69328374321502]
Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks.
When trained on '$A \rightarrow B$', LLMs fail to conclude '$B \leftarrow A$' during inference even though the two sentences are semantically identical.
We theoretically analyze the reversal curse via the training dynamics of gradient descent for two auto-regressive models.
arXiv Detail & Related papers (2024-05-07T21:03:51Z) - Language models scale reliably with over-training and on downstream tasks [121.69867718185125]
Scaling laws are useful guides for derisking expensive training runs.
However, there remain gaps between current studies and how language models are trained.
For instance, scaling laws mostly predict next-token prediction loss, but models are usually compared on downstream task performance.
arXiv Detail & Related papers (2024-03-13T13:54:00Z) - Can Large Language Models Infer Causation from Correlation? [104.96351414570239]
We test the pure causal inference skills of large language models (LLMs).
We formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z) - Mediated Uncoupled Learning: Learning Functions without Direct
Input-output Correspondences [80.95776331769899]
We consider the task of predicting $Y$ from $X$ when we have no paired data of them, only two separate datasets: $S_X$, which pairs $X$ with an intermediate variable $U$, and $S_Y$, which pairs $U$ with $Y$.
A naive approach is to predict $U$ from $X$ using $S_X$ and then $Y$ from $U$ using $S_Y$.
We propose a new method that avoids predicting $U$ but directly learns $Y = f(X)$ by training $f(X)$ with $S_X$ to predict $h(U)$; a minimal numerical sketch is given after this related-papers list.
arXiv Detail & Related papers (2021-07-16T22:13:29Z) - proScript: Partially Ordered Scripts Generation via Pre-trained Language
Models [49.03193243699244]
We demonstrate for the first time that pre-trained neural language models (LMs) can be finetuned to generate high-quality scripts.
We collected a large (6.4k) crowdsourced dataset of partially ordered scripts (named proScript).
Our experiments show that our models perform well (e.g., F1=75.7 in task (i)), illustrating a new approach to overcoming previous barriers to script collection.
arXiv Detail & Related papers (2021-04-16T17:35:10Z) - Faster Uncertainty Quantification for Inverse Problems with Conditional
Normalizing Flows [0.9176056742068814]
In inverse problems, we often have data consisting of paired samples $(x, y) \sim p_{X,Y}(x, y)$ where $y$ are partial observations of a physical system.
We propose a two-step scheme, which makes use of normalizing flows and joint data to train a conditional generator $q_\theta(x|y)$.
arXiv Detail & Related papers (2020-07-15T20:36:30Z)
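For the Many-Shot Regurgitation (MSR) entry above, the summary describes splitting a source text into segments and packaging them as faux user/assistant rounds whose final assistant turn is left open so the model continues verbatim. The snippet below is a rough guess at such a prompt builder; the segment count, turn wording, and the idea of scoring the completion against the held-out final segment are assumptions, not the authors' exact protocol.

```python
# Rough sketch of an MSR-style prompt: the source text is split into segments
# and framed as faux user/assistant rounds, ending with an open assistant turn
# that invites verbatim continuation. Segment count and wording are assumptions.
def build_msr_prompt(document: str, num_segments: int = 5) -> tuple[str, str]:
    words = document.split()
    seg_len = max(1, len(words) // num_segments)
    segments = [" ".join(words[i:i + seg_len]) for i in range(0, len(words), seg_len)]

    turns = []
    for segment in segments[:-1]:
        turns.append("User: Please continue the passage.")
        turns.append(f"Assistant: {segment}")
    # The final round is left open; a regurgitating model should complete it
    # with the held-out last segment, which is returned for comparison.
    turns.append("User: Please continue the passage.")
    turns.append("Assistant:")
    return "\n".join(turns), segments[-1]

prompt, held_out = build_msr_prompt(
    "First sentence of a Wikipedia article. "
    "Second sentence with more detail. "
    "Third sentence concluding the paragraph."
)
# `prompt` is sent to the target LLM; its completion is compared against
# `held_out` (e.g. by exact or fuzzy string overlap) to measure regurgitation.
```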
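For the mediated uncoupled learning entry, one reading of the summary is: fit $h$ so that $h(U)$ approximates $Y$ on $S_Y$, then fit $f$ so that $f(X)$ approximates $h(U)$ on $S_X$, so $U$ never needs to be predicted at test time. The minimal numerical sketch below follows that reading with linear models, squared losses, and synthetic data; the paper's actual objective and estimators may differ.

```python
# Minimal sketch of the mediated uncoupled learning idea as read from the
# summary above: h(U) is fit to Y on S_Y, then f(X) is fit to h(U) on S_X,
# so Y = f(X) is learned without ever predicting U at test time.
# Linear models, squared losses, and synthetic data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth: U depends on X, Y depends on U (datasets are unpaired).
x_sx = rng.normal(size=(200, 3))                      # S_X: (X, U) pairs
u_sx = x_sx @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
u_sy = rng.normal(size=300)                           # S_Y: (U, Y) pairs
y_sy = 3.0 * u_sy + 0.1 * rng.normal(size=300)

# Step 1: fit h so that h(U) approximates Y, using S_Y only.
h_coef = np.polyfit(u_sy, y_sy, deg=1)

def h(u):
    return np.polyval(h_coef, u)

# Step 2: fit f so that f(X) approximates h(U), using S_X only.
X_design = np.hstack([x_sx, np.ones((len(x_sx), 1))])  # add a bias column
f_coef, *_ = np.linalg.lstsq(X_design, h(u_sx), rcond=None)

def f(x):
    return np.hstack([x, np.ones((len(x), 1))]) @ f_coef

# At test time Y is predicted directly from X; U is never estimated explicitly.
x_test = rng.normal(size=(5, 3))
print(f(x_test))
```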