On Reality and the Limits of Language Data: Aligning LLMs with Human
Norms
- URL: http://arxiv.org/abs/2208.11981v2
- Date: Tue, 9 May 2023 12:36:54 GMT
- Title: On Reality and the Limits of Language Data: Aligning LLMs with Human
Norms
- Authors: Nigel H. Collier, Fangyu Liu and Ehsan Shareghi
- Abstract summary: Large Language Models (LLMs) harness linguistic associations in vast natural language data for practical applications.
We explore this question using a novel and tightly controlled reasoning test (ART) and compare human norms against versions of GPT-3.
Our findings highlight the categories of common-sense relations that models can learn directly from data, as well as areas of weakness.
- Score: 10.02997544238235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in Large Language Models (LLMs) harness linguistic
associations in vast natural language data for practical applications. However,
their ability to understand the physical world using only language data remains
an open question. After reviewing existing protocols, we explore this question
using a novel and tightly controlled reasoning test (ART) and compare human
norms against versions of GPT-3. Our findings highlight the categories of
common-sense relations that models can learn directly from data, as well as
areas of weakness. GPT-3 offers evidence for verbal reasoning on a par with
human subjects for several relations, including Synonymy, Antonymy, and Default
inheritance. Without reinforcement learning from human judgements, GPT-3
appears to perform at the lower end of the reference interval for Has-part and
Contained-in. Weaknesses were also observed in affordance characteristics
through Necessary-quality, Order-of-size, and Order-of-intensity. Combining
LLMs with symbolic world grounding is a promising direction to address the
limits of associative learning.
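
The comparison described above reduces, per relation, to checking whether a
model's probe accuracy falls inside the interval spanned by human norms. The
snippet below is a minimal sketch of that check, assuming exact-match scoring of
forced-choice probes; the relation names follow the abstract, while the probes,
accuracies, and interval bounds are illustrative placeholders rather than values
from the paper.

```python
# Minimal sketch (not the authors' released code): score a model's answers per
# relation, then check whether its accuracy falls inside the human reference
# interval for that relation. All numbers and probe items are placeholders.
from dataclasses import dataclass


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Exact-match accuracy over a set of forced-choice probes."""
    assert len(predictions) == len(gold)
    hits = sum(p.strip().lower() == g.strip().lower() for p, g in zip(predictions, gold))
    return hits / len(gold)


@dataclass
class RelationCheck:
    relation: str
    model_accuracy: float   # model's accuracy on probes for this relation
    human_low: float        # lower bound of the human reference interval
    human_high: float       # upper bound of the human reference interval

    def verdict(self) -> str:
        if self.model_accuracy < self.human_low:
            return "below the human reference interval"
        if self.model_accuracy > self.human_high:
            return "above the human reference interval"
        return "within the human reference interval"


# Toy probes for one relation (Antonymy): model answers vs. expected answers.
model_answers = ["cold", "dark", "slow"]
gold_answers = ["cold", "dark", "fast"]

checks = [
    RelationCheck("Antonymy", accuracy(model_answers, gold_answers), 0.85, 0.98),
    RelationCheck("Has-part", 0.71, 0.70, 0.95),        # placeholder accuracy
    RelationCheck("Order-of-size", 0.58, 0.80, 0.98),   # placeholder accuracy
]

for c in checks:
    print(f"{c.relation:14s} acc={c.model_accuracy:.2f} -> {c.verdict()}")
```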
Related papers
- Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language [0.0]
This study introduces a prescriptive annotation benchmark grounded in humanities research to ensure consistent, unbiased labeling of offensive language.
We contribute two newly annotated datasets that achieve higher inter-annotator agreement between human and large language model (LLM) annotations.
arXiv Detail & Related papers (2024-10-17T08:10:24Z) - From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition [6.617999710257379]
We propose a three-stage framework to assess the abilities of LMs.
We evaluate the generative capacities of LMs using methods from linguistic research.
arXiv Detail & Related papers (2024-10-17T06:31:49Z) - One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [55.35278531907263]
We present the first study of Large Language Models' fairness and robustness in handling dialects in canonical reasoning tasks.
We hire AAVE speakers to rewrite seven popular benchmarks, such as HumanEval and GSM8K.
We find that, compared to Standardized English, almost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z) - Evaluating Neural Language Models as Cognitive Models of Language
Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous.
When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models.
We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics
Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing interactions at the syntax-semantics interface.
This suggests that LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Can LLMs Capture Human Preferences? [5.683832910692926]
We explore the viability of Large Language Models (LLMs) in emulating human survey respondents and eliciting preferences.
We collect responses from LLMs across various languages, compare them to human responses, and explore preferences between smaller, sooner rewards and larger, later rewards.
Our findings reveal that both GPT models demonstrate less patience than humans, with GPT-3.5 exhibiting a lexicographic preference for earlier rewards, unlike human decision-makers.
arXiv Detail & Related papers (2023-05-04T03:51:31Z) - We're Afraid Language Models Aren't Modeling Ambiguity [136.8068419824318]
Managing ambiguity is a key part of human language understanding.
We characterize ambiguity in a sentence by its effect on entailment relations with another sentence.
We show that a multilabel NLI model can flag political claims in the wild that are misleading due to ambiguity.
arXiv Detail & Related papers (2023-04-27T17:57:58Z) - Few-shot Named Entity Recognition with Cloze Questions [3.561183926088611]
We propose a simple and intuitive adaptation of Pattern-Exploiting Training (PET), a recent approach which combines the cloze-questions mechanism and fine-tuning for few-shot learning.
Our approach achieves considerably better performance than standard fine-tuning and comparable or improved results with respect to other few-shot baselines (a minimal cloze-prompt sketch follows after this list).
arXiv Detail & Related papers (2021-11-24T11:08:59Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - ERICA: Improving Entity and Relation Understanding for Pre-trained
Language Models via Contrastive Learning [97.10875695679499]
We propose ERICA, a novel contrastive learning framework applied in the pre-training phase to obtain a deeper understanding of entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
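
The Pattern-Exploiting Training (PET) entry above rephrases a task as a cloze
question answered by a masked language model, with a verbalizer mapping the
predicted word back to a label. The sketch below illustrates that idea for
entity typing, assuming the Hugging Face transformers fill-mask pipeline; the
prompt wording, verbalizer words, and model choice are illustrative assumptions,
not the cited paper's exact setup.

```python
# Illustrative sketch of a PET-style cloze prompt for entity typing (not the
# cited paper's implementation). A masked LM fills the blank and a verbalizer
# maps the predicted word to an entity label.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

# Verbalizer: one natural-language word per entity label (illustrative choice).
VERBALIZER = {"person": "PER", "city": "LOC", "company": "ORG"}


def cloze_entity_type(sentence: str, span: str) -> str:
    """Ask the masked LM what kind of thing `span` is, restricted to verbalizer words."""
    prompt = f'{sentence} In this sentence, "{span}" is the name of a [MASK].'
    candidates = fill(prompt, targets=list(VERBALIZER))  # score only the verbalizer words
    best = max(candidates, key=lambda c: c["score"])
    return VERBALIZER[best["token_str"].strip()]


print(cloze_entity_type("Ada Lovelace wrote the first program.", "Ada Lovelace"))  # e.g. PER
```

In PET itself, several such patterns are combined and the model is then
fine-tuned on their soft labels; the snippet only shows the zero-shot cloze step.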