Do language models learn typicality judgments from text?
- URL: http://arxiv.org/abs/2105.02987v1
- Date: Thu, 6 May 2021 21:56:40 GMT
- Title: Do language models learn typicality judgments from text?
- Authors: Kanishka Misra and Allyson Ettinger and Julia Taylor Rayz
- Abstract summary: We evaluate predictive language models (LMs) on a prevalent phenomenon in cognitive science: typicality.
Our first test targets whether typicality modulates LMs in assigning taxonomic category memberships to items.
The second test investigates sensitivities to typicality in LMs' probabilities when extending new information about items to their categories.
- Score: 6.252236971703546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building on research arguing for the possibility of conceptual and
categorical knowledge acquisition through statistics contained in language, we
evaluate predictive language models (LMs) -- informed solely by textual input
-- on a prevalent phenomenon in cognitive science: typicality. Inspired by
experiments that involve language processing and show robust typicality effects
in humans, we propose two tests for LMs. Our first test targets whether
typicality modulates LM probabilities in assigning taxonomic category
memberships to items. The second test investigates sensitivities to typicality
in LMs' probabilities when extending new information about items to their
categories. Both tests show modest -- but not completely absent --
correspondence between LMs and humans, suggesting that text-based exposure
alone is insufficient to acquire typicality knowledge.
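The abstract does not say how correspondence between LMs and humans is scored. One plausible way to quantify it, shown here as a minimal sketch and not as the authors' method, is a Spearman rank correlation between human typicality ratings and LM log-probabilities for category-membership sentences. The item names, ratings, and log-probabilities below are invented placeholders for illustration, and the helper names (`ranks`, `pearson`, `spearman`) are hypothetical.

```python
# Illustrative sketch only: all numbers are invented placeholders,
# not data or results from the paper.

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical items from one category ("bird").
items = ["robin", "sparrow", "owl", "penguin"]
# Invented human typicality ratings (higher = more typical).
human_typicality = [6.9, 6.5, 5.0, 2.6]
# Invented LM log-probabilities of the category word in a sentence
# such as "A <item> is a bird".
lm_log_prob = [-1.2, -1.9, -1.5, -4.0]

rho = spearman(human_typicality, lm_log_prob)
print(f"Spearman correlation: {rho:.2f}")
```

A value near 1 would mean the LM ranks items much as humans do; a value near 0 would mean little correspondence. In practice the log-probabilities would come from an actual language model rather than a hand-written list.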
Related papers
- Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning).
Traditional psycholinguistic evaluations often reflect statistical biases that may misrepresent LLMs' true linguistic capabilities.
We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.
arXiv Detail & Related papers (2024-11-12T04:16:44Z)
- From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition [6.617999710257379]
We propose a three-stage framework to assess the abilities of LMs.
We evaluate the generative capacities of LMs using methods from linguistic research.
arXiv Detail & Related papers (2024-10-17T06:31:49Z)
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess language models' (LMs) linguistic competence.
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
- Taxonomy-based CheckList for Large Language Model Evaluation [0.0]
We introduce human knowledge into natural language interventions and study pre-trained language models' (LMs) behaviors.
Inspired by CheckList behavioral testing, we present a checklist-style task that aims to probe and quantify LMs' unethical behaviors.
arXiv Detail & Related papers (2023-12-15T12:58:07Z)
- Evaluating Neural Language Models as Cognitive Models of Language Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous.
When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models.
We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Counteracts: Testing Stereotypical Representation in Pre-trained Language Models [4.211128681972148]
We use counterexamples to examine the internal stereotypical knowledge in pre-trained language models (PLMs).
We evaluate 7 PLMs on 9 types of cloze-style prompts with different information and base knowledge.
arXiv Detail & Related papers (2023-01-11T07:52:59Z)
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work proposes what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning methods.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Can Transformer Language Models Predict Psychometric Properties? [0.0]
Transformer-based language models (LMs) continue to advance state-of-the-art performance on NLP benchmark tasks.
Can LMs be of use in predicting what the psychometric properties of test items will be when those items are given to human participants?
We gather responses from numerous human participants and LMs on a broad diagnostic test of linguistic competencies.
arXiv Detail & Related papers (2021-06-12T20:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.