Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models
- URL: http://arxiv.org/abs/2305.16426v2
- Date: Sun, 22 Oct 2023 17:06:35 GMT
- Title: Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models
- Authors: Isabelle Lorge and Janet Pierrehumbert
- Abstract summary: Modern pretrained language models, such as BERT, RoBERTa and GPT-3, hold the promise of performing better on logical tasks than classic static word embeddings.
We investigate the extent to which BERT, RoBERTa, GPT-2 and GPT-3 exhibit general, human-like knowledge of these common words.
We find that despite capturing some aspects of logical meaning, the models fall far short of human performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vector space models of word meaning all share the assumption that words
occurring in similar contexts have similar meanings. In such models, words that
are similar in their topical associations but differ in their logical force
tend to emerge as semantically close, creating well-known challenges for NLP
applications that involve logical reasoning. Modern pretrained language models,
such as BERT, RoBERTa and GPT-3, hold the promise of performing better on
logical tasks than classic static word embeddings. However, reports are mixed
about their success. In the current paper, we advance this discussion through a
systematic study of scalar adverbs, an under-explored class of words with
strong logical force. Using three different tasks, involving both naturalistic
social media data and constructed examples, we investigate the extent to which
BERT, RoBERTa, GPT-2 and GPT-3 exhibit general, human-like knowledge of these
common words. We ask: 1) Do the models distinguish amongst the three semantic
categories of MODALITY, FREQUENCY and DEGREE? 2) Do they have implicit
representations of full scales from maximally negative to maximally positive?
3) How do word frequency and contextual factors impact model performance? We
find that despite capturing some aspects of logical meaning, the models fall
far short of human performance.
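The first two questions come down to comparing the probability a masked language model assigns to competing scalar adverbs in the same slot. As a rough illustration only (not the authors' actual protocol, stimuli, or adverb inventory), the sketch below scores a few hypothetical DEGREE adverbs in a constructed cloze sentence with a BERT checkpoint via the HuggingFace transformers API; the model name, adverb list, and example sentence are placeholder assumptions.

```python
# Minimal sketch (not the paper's exact setup): score candidate scalar adverbs
# in a masked slot with a masked LM and rank them by model probability.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"  # assumption: any masked-LM checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

# Hypothetical DEGREE adverbs, ordered weak to strong; the paper uses its own inventories.
adverbs = ["slightly", "somewhat", "very", "extremely"]
sentence = f"The movie was {tokenizer.mask_token} wacky."

inputs = tokenizer(sentence, return_tensors="pt")
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary at the masked position.
probs = logits[0, mask_pos].softmax(dim=-1).squeeze(0)

for adverb in adverbs:
    token_id = tokenizer.convert_tokens_to_ids(adverb)
    print(f"{adverb:>10}: {probs[token_id].item():.6f}")
```

Ranking the candidates by these probabilities yields a crude model-internal scale that can then be compared against a human ordering of the same adverbs.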
Related papers
- Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation
  We assess whether pre-trained autoregressive LLMs possess consistent, expressible knowledge about thematic fit.
  We evaluate both closed and open state-of-the-art LLMs on several psycholinguistic datasets.
  Our results show that chain-of-thought reasoning is more effective on datasets with self-explanatory semantic role labels.
  arXiv Detail & Related papers (2024-10-19T18:25:30Z)
- Learning the meanings of function words from grounded language using a visual question answering model
  We show that recent neural-network based visual question answering models can learn to use function words as part of answering questions about complex visual scenes.
  We find that these models can learn the meanings of the logical connectives "and" and "or" without any prior knowledge of logical reasoning.
  Our findings offer proof-of-concept evidence that it is possible to learn the nuanced interpretations of function words in a visually grounded context.
  arXiv Detail & Related papers (2023-08-16T18:53:39Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity
  We present the semantic notion of agentivity as a case study for probing interactions at the syntax-semantics interface.
  The results suggest that LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
  arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
  We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
  We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
  The models exhibit moderate to low performance on these evaluation sets.
  arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Transparency Helps Reveal When Language Models Learn Meaning
  Our systematic experiments with synthetic data reveal that, for languages in which all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
  Turning to natural language, our experiments with a specific phenomenon, referential opacity, add to the growing body of evidence that current language models do not represent natural language semantics well.
  arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Multi-sense embeddings through a word sense disambiguation process
  Most Suitable Sense Annotation (MSSA) disambiguates and annotates each word by its specific sense, considering the semantic effects of its context.
  We test our approach on six different benchmarks for the word similarity task, showing that it can produce state-of-the-art results.
  arXiv Detail & Related papers (2021-01-21T16:22:34Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism
  Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
  We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
  We also find that, since the models are not semantically grounded with world knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
  arXiv Detail & Related papers (2020-12-27T06:19:20Z)
- A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English
  Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on.
  We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge through sentence-level probing, diagnostic cases, and masked prediction tasks.
  arXiv Detail & Related papers (2020-11-02T13:25:39Z)
- Probing Pretrained Language Models for Lexical Semantics
  We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
  Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
  arXiv Detail & Related papers (2020-10-12T14:24:01Z)
- Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model
  We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model.
  We provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
  arXiv Detail & Related papers (2020-10-09T18:27:55Z)