Reasoning about Ambiguous Definite Descriptions
- URL: http://arxiv.org/abs/2310.14657v1
- Date: Mon, 23 Oct 2023 07:52:38 GMT
- Title: Reasoning about Ambiguous Definite Descriptions
- Authors: Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen
- Abstract summary: Natural language reasoning plays an important role in improving language models' ability to solve complex language understanding tasks.
No resources exist to evaluate how well Large Language Models can use explicit reasoning to resolve ambiguity in language.
We propose to use ambiguous definite descriptions for this purpose and create and publish the first benchmark dataset consisting of such phrases.
- Score: 2.5398014196797605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language reasoning plays an increasingly important role in improving
language models' ability to solve complex language understanding tasks. An
interesting use case for reasoning is the resolution of context-dependent
ambiguity. But no resources exist to evaluate how well Large Language Models
can use explicit reasoning to resolve ambiguity in language. We propose to use
ambiguous definite descriptions for this purpose and create and publish the
first benchmark dataset consisting of such phrases. Our method includes all
information required to resolve the ambiguity in the prompt, which means a
model does not require anything but reasoning to do well. We find this to be a
challenging task for recent LLMs. Code and data available at:
https://github.com/sfschouten/exploiting-ambiguity
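The key design choice in the abstract is that the prompt itself carries every fact needed to resolve the ambiguous definite description, so only reasoning is tested. The abstract does not spell out the dataset schema or prompt wording, so the Python sketch below is purely illustrative: the example item, field names, prompt template, and evaluation harness are assumptions for exposition, not the format of the released benchmark (see the repository above for the actual data).

```python
from typing import Callable

# Hypothetical prompt template: the context states all facts needed to pick
# out the intended referent of the ambiguous definite description.
PROMPT_TEMPLATE = (
    "Context: {context}\n\n"
    'Question: In the sentence "{sentence}", who does "{description}" refer to?\n'
    "Answer with exactly one name from the context."
)

# Hand-written illustrative item (not from the released dataset): "the winner
# of the race" is ambiguous between the announced winner (Alice) and the
# actual, officially ruled winner (Bob); the context disambiguates in favor of Bob.
example = {
    "context": (
        "Alice was announced as the winner of the race. "
        "Later, the judges discovered a timing error and ruled that Bob had in fact finished first. "
        "Only the officially ruled winner was allowed to give the post-race interview."
    ),
    "sentence": "The winner of the race gave an interview.",
    "description": "the winner of the race",
    "gold_answer": "Bob",
}


def evaluate(item: dict, query_model: Callable[[str], str]) -> bool:
    """Build the prompt, query a model, and score the answer by exact match."""
    prompt = PROMPT_TEMPLATE.format(
        context=item["context"],
        sentence=item["sentence"],
        description=item["description"],
    )
    answer = query_model(prompt)
    # A real evaluation would likely normalise or parse the answer more carefully.
    return answer.strip().lower() == item["gold_answer"].lower()


if __name__ == "__main__":
    # Trivial stand-in "model" that always answers with the first-mentioned name,
    # just to show the harness runs end to end; swap in a real LLM call here.
    def naive_model(prompt: str) -> str:
        return "Alice"

    print(evaluate(example, naive_model))  # False: the naive guess misses the intended referent
```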
Related papers
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- Zero and Few-shot Semantic Parsing with Ambiguous Inputs [45.285508941560295]
We introduce AmP, a framework, dataset, and challenge for translating ambiguous natural language to formal representations like logic and code.
Using AmP, we investigate how several few-shot text-to-code systems handle ambiguity, introducing three new metrics.
We find that large pre-trained models perform poorly at capturing the distribution of possible meanings without deliberate instruction.
arXiv Detail & Related papers (2023-06-01T15:46:36Z)
- We're Afraid Language Models Aren't Modeling Ambiguity [136.8068419824318]
Managing ambiguity is a key part of human language understanding.
We characterize ambiguity in a sentence by its effect on entailment relations with another sentence.
We show that a multilabel NLI model can flag political claims in the wild that are misleading due to ambiguity.
arXiv Detail & Related papers (2023-04-27T17:57:58Z)
- Large Language Models Can Be Easily Distracted by Irrelevant Context [29.315230178997002]
We investigate how model problem-solving accuracy can be influenced by irrelevant context.
We use this benchmark to measure the distractibility of cutting-edge prompting techniques for large language models.
arXiv Detail & Related papers (2023-01-31T20:48:57Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code together with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning [73.3035118224719]
We propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities.
APOLLO performs comparably on ReClor and outperforms baselines on LogiQA.
arXiv Detail & Related papers (2022-12-19T07:40:02Z)
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs [26.118193748582197]
We evaluate four categories of widely used state-of-the-art models.
We find that, despite being evaluated only on utterances requiring a binary inference, models in three of these categories perform close to random.
These results suggest that certain fine-tuning strategies are far better at inducing pragmatic understanding in models.
arXiv Detail & Related papers (2022-10-26T19:04:23Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand? [87.20342701232869]
We investigate the abilities of ungrounded systems to acquire meaning.
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
We find that assertions enable semantic emulation if all expressions in the language are referentially transparent.
However, if the language uses non-transparent patterns like variable binding, we show that emulation can become an uncomputable problem.
arXiv Detail & Related papers (2021-04-22T01:00:17Z)