Lower Perplexity is Not Always Human-Like
- URL: http://arxiv.org/abs/2106.01229v1
- Date: Wed, 2 Jun 2021 15:27:29 GMT
- Title: Lower Perplexity is Not Always Human-Like
- Authors: Tatsuki Kuribayashi, Yohei Oseki, Takumi Ito, Ryo Yoshida, Masayuki
Asahara, Kentaro Inui
- Abstract summary: We re-examine an established generalization -- the lower perplexity a language model has, the more human-like the language model is -- in Japanese.
Our experiments demonstrate that this established generalization exhibits a surprising lack of universality.
Our results suggest that a cross-lingual evaluation will be necessary to construct human-like computational models.
- Score: 25.187238589433385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In computational psycholinguistics, various language models have been
evaluated against human reading behavior (e.g., eye movement) to build
human-like computational models. However, most previous efforts have focused
almost exclusively on English, despite the recent trend toward linguistic
universals in the broader community. To fill this gap, this paper
investigates whether the established results in computational psycholinguistics
can be generalized across languages. Specifically, we re-examine an established
generalization -- the lower perplexity a language model has, the more
human-like the language model is -- in Japanese with typologically different
structures from English. Our experiments demonstrate that this established
generalization exhibits a surprising lack of universality; namely, lower
perplexity is not always human-like. Moreover, this discrepancy between English
and Japanese is further explored from the perspective of (non-)uniform
information density. Overall, our results suggest that a cross-lingual
evaluation will be necessary to construct human-like computational models.
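The generalization re-examined here ties a language model's perplexity to how well its surprisal predicts human reading behavior, and the discrepancy is discussed in terms of (non-)uniform information density. As a minimal sketch of the quantities involved (using toy probabilities, not the paper's actual models or data): perplexity is the exponentiated mean per-token surprisal, and the variance of surprisal across tokens is one simple proxy for how (non-)uniformly information is distributed.

```python
import math

def surprisals(probs):
    """Per-token surprisal in bits: -log2 p(token | context)."""
    return [-math.log2(p) for p in probs]

def perplexity(probs):
    """Perplexity = 2 ** (mean surprisal in bits),
    i.e. the geometric mean of the inverse token probabilities."""
    s = surprisals(probs)
    return 2 ** (sum(s) / len(s))

def surprisal_variance(probs):
    """Variance of per-token surprisal; a lower value means the
    information profile is more uniform (one rough UID proxy)."""
    s = surprisals(probs)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# Toy example: hypothetical conditional probabilities a model
# assigns to the four tokens of a sentence.
probs = [0.25, 0.5, 0.125, 0.25]
print(perplexity(probs))  # → 4.0
```

Psycholinguistic evaluations of the kind discussed above typically regress such per-token surprisals against reading-time measures; the code is only meant to pin down the definitions.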
Related papers
- On the Proper Treatment of Tokenization in Psycholinguistics [53.960910019072436]
The paper argues that token-level language models should be marginalized into character-level language models before they are used in psycholinguistic studies.
We find various focal areas whose surprisal is a better psychometric predictor than the surprisal of the region of interest itself.
arXiv Detail & Related papers (2024-10-03T17:18:03Z) - HLB: Benchmarking LLMs' Humanlikeness in Language Use [2.438748974410787]
We present a comprehensive humanlikeness benchmark (HLB) evaluating 20 large language models (LLMs)
We collected responses from over 2,000 human participants and compared them to outputs from the LLMs in these experiments.
Our results reveal fine-grained differences in how well LLMs replicate human responses across various linguistic levels.
arXiv Detail & Related papers (2024-09-24T09:02:28Z) - The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate a counterintuitive, novel driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet the evidence that language imbalance causes cross-lingual generalisation in this setting remains inconclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability of language models to reason about unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - Comparing Biases and the Impact of Multilingual Training across Multiple
Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z) - A fine-grained comparison of pragmatic language understanding in humans
and language models [2.231167375820083]
We compare language models and humans on seven pragmatic phenomena.
We find that the largest models achieve high accuracy and match human error patterns.
We find preliminary evidence that models and humans are sensitive to similar linguistic cues.
arXiv Detail & Related papers (2022-12-13T18:34:59Z) - An Analysis of Social Biases Present in BERT Variants Across Multiple
Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood.
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z) - Every word counts: A multilingual analysis of individual human alignment
with model attention [7.066531752554282]
We analyse eye-tracking data from speakers of 13 different languages reading both in their native language (L1) and in English as language learners (L2).
We find considerable differences between languages but also that individual reading behaviour such as skipping rate, total reading time and vocabulary knowledge (LexTALE) influence the alignment between humans and models.
arXiv Detail & Related papers (2022-10-05T12:44:35Z) - Same Neurons, Different Languages: Probing Morphosyntax in Multilingual
Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - Uncovering Constraint-Based Behavior in Neural Models via Targeted
Fine-Tuning [9.391375268580806]
We show that competing linguistic processes within a language obscure underlying linguistic knowledge.
While human behavior has been found to be similar across languages, we find cross-linguistic variation in model behavior.
Our results suggest that models need to learn both the linguistic constraints in a language and their relative ranking, with mismatches in either producing non-human-like behavior.
arXiv Detail & Related papers (2021-06-02T14:52:11Z)
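Several of the related papers above score sentences with a masked language model via sentence pseudo-likelihood (e.g. the template-based bias probing work). A minimal sketch of pseudo-log-likelihood, with a hypothetical stand-in scorer in place of a real masked LM: mask each position in turn and sum the log-probability the model assigns to the true token at that position.

```python
import math

def pseudo_log_likelihood(tokens, masked_prob):
    """Masked-LM pseudo-log-likelihood: sum over positions i of
    log p(tokens[i] | tokens with position i masked).
    `masked_prob(tokens, i)` stands in for a real masked LM scorer."""
    total = 0.0
    for i in range(len(tokens)):
        total += math.log(masked_prob(tokens, i))
    return total

# Stand-in "model": a uniform distribution over a tiny vocabulary,
# purely to make the sketch runnable (hypothetical, not a real LM).
VOCAB = ["the", "cat", "sat"]
def uniform_prob(tokens, i):
    return 1.0 / len(VOCAB)

print(pseudo_log_likelihood(["the", "cat", "sat"], uniform_prob))
# = 3 * log(1/3)
```

In practice the scorer would query a pretrained masked LM; pseudo-likelihood makes bidirectional models comparable on whole-sentence scoring, which is why the bias-probing templates above rely on it.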
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.