SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation
- URL: http://arxiv.org/abs/2503.15358v3
- Date: Wed, 04 Jun 2025 10:58:56 GMT
- Title: SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation
- Authors: Thomas Pickard, Aline Villavicencio, Maggie Mi, Wei He, Dylan Phelps, Marco Idiart
- Abstract summary: We present datasets and tasks for SemEval-2025 Task 1: AdMIRe (Advancing Multimodal Idiomaticity Representation). The task challenges the community to assess and improve models' ability to interpret idiomatic expressions in multimodal contexts and in multiple languages. Participants competed in two subtasks: ranking images based on their alignment with idiomatic or literal meanings, and predicting the next image in a sequence.
- Score: 4.9231093174636404
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Idiomatic expressions present a unique challenge in NLP, as their meanings are often not directly inferable from their constituent words. Despite recent advancements in Large Language Models (LLMs), idiomaticity remains a significant obstacle to robust semantic representation. We present datasets and tasks for SemEval-2025 Task 1: AdMIRe (Advancing Multimodal Idiomaticity Representation), which challenges the community to assess and improve models' ability to interpret idiomatic expressions in multimodal contexts and in multiple languages. Participants competed in two subtasks: ranking images based on their alignment with idiomatic or literal meanings, and predicting the next image in a sequence. The most effective methods achieved human-level performance by leveraging pretrained LLMs and vision-language models in mixture-of-experts settings, with multiple queries used to smooth over the weaknesses in these models' representations of idiomaticity.
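As a rough illustration of the image-ranking subtask, the sketch below scores candidate images against a paraphrase of an idiom's figurative sense with an off-the-shelf CLIP model. The paraphrase, image paths, and checkpoint are illustrative; the strongest submissions combined several pretrained LLMs and vision-language models and issued multiple queries per item.

```python
# Illustrative only: rank candidate images for "piece of cake" by CLIP similarity
# to a paraphrase of its idiomatic sense. Paths and paraphrase are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paraphrase = "something that is very easy to do"
images = [Image.open(p) for p in ("img1.png", "img2.png", "img3.png")]

inputs = processor(text=[paraphrase], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image.squeeze(-1)  # one score per image

print(scores.argsort(descending=True).tolist())  # best-aligned image first
```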
Related papers
- MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models [18.73221445082855]
Multimodal Large Language Models (MLLMs) have demonstrated significant advances across numerous vision-language tasks.
We introduce MUCAR, a novel benchmark designed explicitly for evaluating multimodal ambiguity resolution across multilingual and cross-modal scenarios.
arXiv Detail & Related papers (2025-06-20T14:57:41Z)
- UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation [4.830594923821009]
SemEval-2025 Task 1 focuses on ranking images based on their alignment with a given nominal compound.
This work uses generative large language models (LLMs) and multilingual CLIP models to enhance idiomatic compound representations.
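The two-stage recipe in that summary can be pictured roughly as follows: a generative LLM first rewrites the compound into a literal visual description, which a CLIP text encoder can then score against candidate images, as in the earlier sketch. The model name, prompt wording, and decoding settings here are assumptions, not the authors' configuration.

```python
# Hypothetical two-stage sketch: an LLM produces a literal gloss of the compound,
# which a CLIP text encoder can then score against images. Model and prompt are
# assumptions, not the system described above.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

compound = "elbow grease"
prompt = (f"In one short sentence, describe a picture that shows the idiomatic "
          f"meaning of '{compound}'.")
gloss = generator(prompt, max_new_tokens=40, return_full_text=False)[0]["generated_text"]
print(gloss)  # this gloss would be fed to a CLIP text encoder, as in the earlier sketch
```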
arXiv Detail & Related papers (2025-02-28T11:52:02Z)
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
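One way to picture the integration of the two paradigms is a single objective that mixes a CLIP-style contrastive term with a captioning term. The sketch below is illustrative only; the loss weighting and the random tensors are stand-ins, not the paper's architecture.

```python
# Illustrative sketch: combine a discriminative (contrastive) loss with a
# generative (captioning) loss in one objective. Weights are assumptions.
import torch
import torch.nn.functional as F

def unified_loss(img_emb, txt_emb, lm_logits, caption_ids, alpha=0.5, temp=0.07):
    # Discriminative term: symmetric InfoNCE over in-batch image-text pairs.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sims = img_emb @ txt_emb.t() / temp
    targets = torch.arange(sims.size(0))
    contrastive = (F.cross_entropy(sims, targets) + F.cross_entropy(sims.t(), targets)) / 2
    # Generative term: next-token prediction over the paired captions.
    generative = F.cross_entropy(lm_logits.flatten(0, 1), caption_ids.flatten())
    return alpha * contrastive + (1 - alpha) * generative

# Example with random tensors (batch 4, embedding dim 512, caption length 16, vocab 1000):
loss = unified_loss(torch.randn(4, 512), torch.randn(4, 512),
                    torch.randn(4, 16, 1000), torch.randint(0, 1000, (4, 16)))
```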
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Meta-Task Prompting Elicits Embeddings from Large Language Models [54.757445048329735]
We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation.
We generate high-quality sentence embeddings from Large Language Models without the need for model fine-tuning.
Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.
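In the spirit of that method, a single meta-task prompt can already turn a frozen causal LM into a sentence encoder: wrap the sentence in a prompt that asks for a one-word summary and read off the final token's hidden state. The prompt wording and the GPT-2 backbone below are assumptions for illustration; the paper aggregates over multiple meta-tasks.

```python
# Minimal prompting-based sentence embedding: no fine-tuning, just a meta-task
# prompt and the last token's hidden state. Prompt and model are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def embed(sentence: str) -> torch.Tensor:
    prompt = f'This sentence: "{sentence}" means in one word: "'
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.last_hidden_state[0, -1]  # hidden state of the final prompt token

e1, e2 = embed("He kicked the bucket."), embed("He passed away.")
print(torch.cosine_similarity(e1, e2, dim=0).item())
```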
arXiv Detail & Related papers (2024-02-28T16:35:52Z)
- OCHADAI at SemEval-2022 Task 2: Adversarial Training for Multilingual Idiomaticity Detection [4.111899441919165]
We propose a multilingual adversarial training model for determining whether a sentence contains an idiomatic expression.
Our model relies on pre-trained contextual representations from different state-of-the-art multilingual transformer-based language models.
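A common embedding-level recipe for this kind of adversarial training is FGM-style perturbation: nudge the input embeddings along their gradient, add the adversarial loss, then restore the weights. The sketch below assumes an HF-style classifier whose forward returns .logits; the paper's exact recipe may differ.

```python
# FGM-style adversarial step (a sketch, not the paper's exact method): perturb the
# input-embedding matrix along its gradient, add the adversarial loss, then restore.
import torch

def fgm_adversarial_step(model, loss_fn, batch, labels, epsilon=1e-2):
    emb = model.get_input_embeddings()          # embedding layer of an HF-style model
    loss = loss_fn(model(**batch).logits, labels)
    loss.backward()                             # clean gradients, incl. embeddings
    grad = emb.weight.grad
    perturb = epsilon * grad / (grad.norm() + 1e-8)
    emb.weight.data.add_(perturb)               # move to the worst-case direction
    adv_loss = loss_fn(model(**batch).logits, labels)
    adv_loss.backward()                         # accumulate adversarial gradients
    emb.weight.data.sub_(perturb)               # restore the original embeddings
    return loss.item(), adv_loss.item()
```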
arXiv Detail & Related papers (2022-06-07T05:52:43Z)
- AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models [7.386862225828819]
This work presents a novel dataset of naturally occurring sentences containing MWEs manually classified into a fine-grained set of meanings.
We use this dataset in two tasks designed to test i) a language model's ability to detect idiom usage, and ii) the effectiveness of a language model in generating representations of sentences containing idioms.
arXiv Detail & Related papers (2021-09-09T16:53:17Z)
- Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks.
For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this objective improves the performance of the original BERT by large margins.
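A toy version of the unshuffling objective: permute a document's sentences and train the model to recover the original order. The data-preparation sketch below is illustrative; SLM implements this on top of a BERT-style encoder.

```python
# Toy sketch of the sentence-unshuffling objective: shuffle a document's
# sentences and keep the permutation as the prediction target.
import random

def make_unshuffling_example(document: list[str]):
    order = list(range(len(document)))
    random.shuffle(order)
    shuffled = [document[i] for i in order]
    # order[j] is the original position of the sentence now at slot j;
    # the model is trained to predict these indices from the shuffled text.
    return shuffled, order

doc = ["She opened the door.", "The room was dark.", "She flipped the switch."]
print(make_unshuffling_example(doc))
```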
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
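The regularization idea can be pictured as an auxiliary distance term between representations of paired utterances in two languages, added to the task loss. Note that this sketch assumes such pairs are available, whereas the paper works without external resources, so treat it purely as an illustration of the regularizer's shape.

```python
# Illustrative alignment regularizer: task loss plus a penalty pulling the
# source- and target-language representations together. Weight lam is assumed.
import torch
import torch.nn.functional as F

def aligned_loss(task_logits, labels, src_repr, tgt_repr, lam=0.1):
    task_loss = F.cross_entropy(task_logits, labels)
    align_loss = F.mse_loss(src_repr, tgt_repr)  # pull the two languages together
    return task_loss + lam * align_loss

# Example: batch of 4, 3 classes, 256-dim sentence representations.
loss = aligned_loss(torch.randn(4, 3), torch.randint(0, 3, (4,)),
                    torch.randn(4, 256), torch.randn(4, 256))
```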
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts the Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
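A toy twin-structure setup echoing that description: one shared encoder maps both inputs to fixed-size vectors compared by cosine similarity. The bag-of-tokens encoder below is a stand-in for BURT's BERT-style backbone.

```python
# Toy Siamese ("twin") model: the same encoder weights embed both inputs, and
# the outputs are compared by cosine similarity. Encoder is a simple stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, vocab_size=30522, dim=256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean-pooled token embeddings

    def forward(self, a_ids, b_ids):
        va, vb = self.embed(a_ids), self.embed(b_ids)  # shared weights on both sides
        return F.cosine_similarity(va, vb)             # trained against gold similarity

enc = SiameseEncoder()
a = torch.randint(0, 30522, (2, 8))  # batch of 2 token-id sequences, length 8
b = torch.randint(0, 30522, (2, 8))
print(enc(a, b))
```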
arXiv Detail & Related papers (2020-04-29T04:01:52Z)