MiQA: A Benchmark for Inference on Metaphorical Questions
- URL: http://arxiv.org/abs/2210.07993v1
- Date: Fri, 14 Oct 2022 17:46:05 GMT
- Title: MiQA: A Benchmark for Inference on Metaphorical Questions
- Authors: Iulia-Maria Comsa, Julian Martin Eisenschlos, Srini Narayanan
- Abstract summary: We propose a benchmark to assess the capability of large language models to reason with conventional metaphors.
We examine the performance of state-of-the-art pre-trained models on binary-choice tasks.
- Score: 5.32836690371986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a benchmark to assess the capability of large language models to reason with conventional metaphors. Our benchmark combines the previously isolated topics of metaphor detection and commonsense reasoning into a single task that requires a model to make inferences by accurately selecting between the literal and metaphorical register. We examine the performance of state-of-the-art pre-trained models on binary-choice tasks and find a large discrepancy between the performance of small and very large models, going from chance to near-human level. We also analyse the largest model in a generative setting and find that although human performance is approached, careful multiple-shot prompting is required.
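As a concrete illustration of the binary-choice setup described above, the sketch below scores two candidate readings of a metaphorical premise with an off-the-shelf causal language model and selects the more likely one. The model name, the example premise, the two options, and the `sequence_log_prob` helper are illustrative assumptions, not the benchmark's actual data or the authors' evaluation code.

```python
# Minimal sketch of a MiQA-style binary choice, assuming a likelihood-based
# selection rule: score each candidate continuation under a causal LM and
# pick the higher-probability one. Premise and options are invented examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper evaluates much larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def sequence_log_prob(prompt: str, continuation: str) -> float:
    """Sum of token log-probabilities of `continuation` given `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    total = 0.0
    # Position i of the shifted logits predicts input token i + 1,
    # so only the continuation tokens contribute to the score.
    for pos in range(prompt_len - 1, full_ids.shape[1] - 1):
        target = full_ids[0, pos + 1]
        total += log_probs[0, pos, target].item()
    return total


premise = "Someone tells you that your argument does not hold water."
options = [
    "You should make the argument more rigorous.",        # metaphorical register
    "You should pour the water into a sturdier glass.",   # literal register
]
scores = [sequence_log_prob(premise, " " + option) for option in options]
print("Model picks:", options[scores.index(max(scores))])
```

A generative variant along the lines of the abstract would instead prompt the model with a handful of worked examples (multiple-shot prompting) and parse its free-form answer; the abstract notes that careful construction of those prompts is what brings the largest model near human performance.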
Related papers
- Context versus Prior Knowledge in Language Models [49.17879668110546]
Language models often need to integrate prior knowledge learned during pretraining and new information presented in context.
We propose two mutual-information-based metrics to measure a model's reliance on the context versus its prior knowledge about an entity.
arXiv Detail & Related papers (2024-04-06T13:46:53Z)
- On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe [36.65834065044746]
We use in-context learning to guide the models to generate the term for an object concept implied in a linguistic description.
Experiments suggest that the conceptual inference ability probed by the reverse-dictionary task predicts a model's general reasoning performance.
arXiv Detail & Related papers (2024-02-22T09:45:26Z)
- Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We report correlation coefficients with human identification of intruder words and achieve near-human-level results on the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
- Task Ambiguity in Humans and Language Models [7.033374427612259]
We propose AmbiBench, a new benchmark of ambiguously-specified classification tasks.
We evaluate humans and models on AmbiBench by seeing how well they identify the intended task.
We show how to dramatically improve the accuracy of language models trained without large-scale human-feedback training.
arXiv Detail & Related papers (2022-12-20T18:35:33Z)
- Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z)
- Visual Comparison of Language Model Adaptation [55.92129223662381]
Adapters are lightweight alternatives for model adaptation.
In this paper, we discuss several design choices and alternatives for interactive, comparative visual explanation methods.
We show that, for instance, an adapter trained on the language debiasing task according to context-0 embeddings introduces a new type of bias.
arXiv Detail & Related papers (2022-08-17T09:25:28Z)
- Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge [60.616313552585645]
We present models for effective Ambiguity Detection and Coreference Resolution in Conversational AI.
Specifically, we use TOD-BERT and LXMERT based models, compare them to a number of baselines and provide ablation experiments.
Our results show that (1) language models are able to exploit correlations in the data to detect ambiguity; and (2) unimodal coreference resolution models can avoid the need for a vision component.
arXiv Detail & Related papers (2022-02-25T12:10:02Z)
- Thematic fit bits: Annotation quality and quantity for event participant representation [0.0]
Modeling thematic fit (a verb-argument compositional semantics task) currently requires a very large burden of data.
We take a high-performing neural approach to modeling verb-argument fit, previously trained on a linguistically machine-annotated large corpus, and replace corpus layers with output from higher-quality taggers.
arXiv Detail & Related papers (2021-05-13T06:13:44Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
arXiv Detail & Related papers (2020-10-26T21:34:39Z)
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parsing framework and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.