MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking
- URL: http://arxiv.org/abs/2510.14307v1
- Date: Thu, 16 Oct 2025 05:06:54 GMT
- Title: MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking
- Authors: Sathyanarayanan Ramamoorthy, Vishwa Shah, Simran Khanuja, Zaid Sheikh, Shan Jie, Ann Chia, Shearman Chua, Graham Neubig
- Abstract summary: This paper introduces MERLIN, a novel testbed system for the task of Multilingual Multimodal Entity Linking. The created dataset includes BBC news article titles, paired with corresponding images, in five languages: Hindi, Japanese, Indonesian, Vietnamese, and Tamil. We also include several benchmarks using multilingual and multimodal entity linking methods exploring different language models.
- Score: 40.44700814984852
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces MERLIN, a novel testbed system for the task of Multilingual Multimodal Entity Linking. The created dataset includes BBC news article titles, paired with corresponding images, in five languages: Hindi, Japanese, Indonesian, Vietnamese, and Tamil, featuring over 7,000 named entity mentions linked to 2,500 unique Wikidata entities. We also include several benchmarks using multilingual and multimodal entity linking methods exploring different language models like LLaMa-2 and Aya-23. Our findings indicate that incorporating visual data improves the accuracy of entity linking, especially for entities where the textual context is ambiguous or insufficient, and particularly for models that do not have strong multilingual abilities. The dataset and methods are available at https://github.com/rsathya4802/merlin
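The finding that visual data helps when textual context is ambiguous can be illustrated with a small late-fusion sketch. This is not MERLIN's actual method; the fusion weight, the similarity scores, and the choice of the two Wikidata candidates for the mention "Jordan" are illustrative assumptions.

```python
# Minimal late-fusion sketch: rank candidate Wikidata entities for a
# mention by a weighted sum of text and image similarity scores.
# Scores and the alpha weight below are illustrative, not from MERLIN.

def rank_candidates(text_scores, image_scores, alpha=0.7):
    """Combine per-candidate text and image scores.

    alpha weights the textual signal; (1 - alpha) weights the visual one.
    Candidates without an image score fall back to text alone.
    """
    fused = {}
    for qid, t in text_scores.items():
        v = image_scores.get(qid)
        fused[qid] = alpha * t + (1 - alpha) * v if v is not None else t
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Ambiguous mention "Jordan": the country (Q810) vs. the basketball
# player (Q41421). Text alone is a near-tie; the image tips the balance.
text = {"Q810": 0.52, "Q41421": 0.50}
image = {"Q810": 0.10, "Q41421": 0.90}
print(rank_candidates(text, image))  # Q41421 ranks first
```

With text scores nearly tied, the visual signal resolves the ambiguity, mirroring the paper's observation at toy scale.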
Related papers
- QQ: A Toolkit for Language Identifiers and Metadata [7.607054209125189]
We introduce QwanQwa, a light-weight Python toolkit for unified language metadata management.
QwanQwa integrates multiple language resources into a single interface.
It provides convenient normalization and mapping between language identifiers.
arXiv Detail & Related papers (2026-02-28T12:29:45Z)
- Parrot: Multilingual Visual Instruction Tuning [66.65963606552839]
Existing methods typically align vision encoders with Multimodal Large Language Models (MLLMs) via supervised fine-tuning (SFT).
We propose PARROT, a novel approach that leverages textual guidance for visual token alignment at the language level.
We introduce the Massive Multilingual Multimodal Benchmark (MMMB), a new benchmark comprising 6 languages, 15 categories, and 12,000 questions.
arXiv Detail & Related papers (2024-06-04T17:56:28Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z)
- PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage of pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z)
- Multilingual End to End Entity Linking [6.478594417785639]
We release and open source BELA, the first fully end-to-end multilingual entity linking model.
BELA efficiently detects and links entities in texts in any of 97 languages.
arXiv Detail & Related papers (2023-06-15T07:02:00Z)
- MUTANT: A Multi-sentential Code-mixed Hinglish Dataset [16.14337612590717]
We propose a novel task of identifying multi-sentential code-mixed text (MCT) from multilingual articles.
As a use case, we leverage multilingual articles and build a first-of-its-kind multi-sentential code-mixed Hinglish dataset.
The MUTANT dataset comprises 67k articles with 85k identified Hinglish MCTs.
arXiv Detail & Related papers (2023-02-23T04:04:18Z)
- Multilingual Multimodal Learning with Machine Translated Text [27.7207234512674]
We investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data.
We propose two metrics for automatically removing such translations from the resulting datasets.
In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning.
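The abstract does not specify the paper's two filtering metrics, but a common heuristic for removing bad machine translations is a token-length-ratio check. The sketch below shows that generic heuristic, not the paper's actual metrics; the thresholds are assumptions.

```python
# Generic translation-filtering heuristic (not the paper's metric):
# discard translation pairs whose token counts diverge wildly, since a
# very short or very long "translation" often signals MT failure.

def length_ratio_ok(src: str, tgt: str, low: float = 0.5, high: float = 2.0) -> bool:
    """Return True if the target/source token-count ratio is plausible."""
    src_len = max(len(src.split()), 1)
    tgt_len = max(len(tgt.split()), 1)
    ratio = tgt_len / src_len
    return low <= ratio <= high

# A reasonable pair passes; a collapsed one-word output is filtered out.
print(length_ratio_ok("a black dog runs", "un chien noir court"))        # True
print(length_ratio_ok("a black dog runs on the beach", "chien"))          # False
```

In practice such length checks are usually combined with model-based scores (e.g. round-trip similarity), but even this cheap filter catches degenerate outputs.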
arXiv Detail & Related papers (2022-10-24T11:41:20Z)
- A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification [16.684856745734944]
We present a multilingual bag-of-entities model that boosts the performance of zero-shot cross-lingual text classification.
It leverages the multilingual nature of Wikidata: entities in multiple languages representing the same concept are defined with a unique identifier.
A model trained on entity features in a resource-rich language can thus be directly applied to other languages.
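The key idea, counting language-independent Wikidata identifiers so one feature space serves all languages, can be sketched as follows. The entity vocabulary and documents below are toy assumptions, not the paper's setup.

```python
from collections import Counter

def bag_of_entities(mention_qids, vocab):
    """Count Wikidata QIDs detected in a document and project the counts
    onto a fixed entity vocabulary. QIDs are language-independent, so
    documents in any language share the same feature space."""
    counts = Counter(q for q in mention_qids if q in vocab)
    return [counts[q] for q in vocab]

# Illustrative vocabulary: Q30 = United States, Q148 = China, Q90 = Paris.
vocab = ["Q30", "Q148", "Q90"]

# An English and a Japanese document mentioning the same entities yield
# identical features once their mentions are resolved to QIDs.
en_doc = ["Q30", "Q90", "Q30"]
print(bag_of_entities(en_doc, vocab))  # [2, 0, 1]
```

A classifier trained on such vectors in a resource-rich language can then be applied unchanged to any language with entity linking coverage.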
arXiv Detail & Related papers (2021-10-15T01:10:50Z)
- Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking [84.50302759362698]
We enhance the transfer learning process by intermediate fine-tuning of pretrained multilingual models.
We use parallel and conversational movie subtitles datasets to design cross-lingual intermediate tasks.
We achieve impressive improvements (> 20% on goal accuracy) on the parallel MultiWoZ dataset and Multilingual WoZ dataset.
arXiv Detail & Related papers (2021-09-28T11:22:38Z)
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [68.2650714613869]
We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT.
Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages.
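The core augmentation step, replacing words with translations drawn from a bilingual dictionary to synthesize code-switched text, can be sketched in a few lines. The toy English-to-Spanish lexicon and the substitution probability are illustrative assumptions, not CoSDA-ML's actual configuration.

```python
import random

def code_switch(tokens, lexicon, p=0.5, seed=0):
    """Randomly replace tokens with dictionary translations to create
    code-switched training text for fine-tuning multilingual models."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    return [lexicon[t] if t in lexicon and rng.random() < p else t
            for t in tokens]

# Toy EN->ES dictionary; with p=1.0 every covered token is switched.
lexicon = {"good": "bueno", "morning": "mañana"}
print(code_switch(["good", "morning", "friends"], lexicon, p=1.0))
# ['bueno', 'mañana', 'friends']
```

Because substitution needs only word-level dictionaries rather than parallel sentences, the same process extends cheaply to many target languages at once, which is the method's stated advantage.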
arXiv Detail & Related papers (2020-06-11T13:15:59Z)
- MLSUM: The Multilingual Summarization Corpus [29.943949944682196]
MLSUM is the first large-scale MultiLingual SUMmarization dataset.
It contains 1.5M+ article/summary pairs in five different languages.
arXiv Detail & Related papers (2020-04-30T15:58:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.