Large Scale Multi-Lingual Multi-Modal Summarization Dataset
- URL: http://arxiv.org/abs/2302.06560v1
- Date: Mon, 13 Feb 2023 18:00:23 GMT
- Title: Large Scale Multi-Lingual Multi-Modal Summarization Dataset
- Authors: Yash Verma, Anubhav Jangra, Raghvendra Kumar, Sriparna Saha
- Abstract summary: We present the current largest multi-lingual multi-modal summarization dataset (M3LS).
It consists of over a million instances of document-image pairs along with a professionally annotated multi-modal summary for each pair.
It is also the largest summarization dataset for 13 languages and consists of cross-lingual summarization data for 2 languages.
- Score: 26.92121230628835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Significant developments in techniques such as encoder-decoder models have
enabled us to represent information comprising multiple modalities. This
information can further enhance many downstream tasks in the field of
information retrieval and natural language processing; however, improvements in
multi-modal techniques and their performance evaluation require large-scale
multi-modal data which offers sufficient diversity. Multi-lingual modeling for
a variety of tasks like multi-modal summarization, text generation, and
translation leverages information derived from high-quality multi-lingual
annotated data. In this work, we present the current largest multi-lingual
multi-modal summarization dataset (M3LS), and it consists of over a million
instances of document-image pairs along with a professionally annotated
multi-modal summary for each pair. It is derived from news articles published
by the British Broadcasting Corporation (BBC) over a decade and spans 20
languages, targeting diversity across five language roots. It is also the
largest summarization dataset for 13 languages and includes cross-lingual
summarization data for 2 languages. We formally define the multi-lingual
multi-modal summarization task utilizing our dataset and report baseline scores
from various state-of-the-art summarization techniques in a multi-lingual
setting. We also compare it with many similar datasets to analyze the
uniqueness and difficulty of M3LS.
Related papers
- MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine.
It covers over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases.
Unlike existing approaches, which are limited by the availability of image-text pairs, we have developed the first automated pipeline.
arXiv Detail & Related papers (2024-08-06T02:09:35Z)
- X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment [4.571088742209442]
We create a 91K English-Korean-Chinese multilingual, multimodal training dataset.
We develop a bilingual multimodal model that exhibits excellent performance in both Korean and English.
arXiv Detail & Related papers (2024-03-18T01:14:47Z)
- Multimodal Large Language Models: A Survey [36.06016060015404]
Multimodal language models integrate multiple heterogeneous data types, such as image, text, language, and audio.
This paper begins by defining the concept of multimodal and examining the historical development of multimodal algorithms.
A practical guide is provided, offering insights into the technical aspects of multimodal models.
Lastly, we explore the applications of multimodal models and discuss the challenges associated with their development.
arXiv Detail & Related papers (2023-11-22T05:15:12Z)
- Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems [64.40789703661987]
Multi3WOZ is a novel multilingual, multi-domain, multi-parallel ToD dataset.
It is large-scale and offers culturally adapted dialogs in 4 languages.
We describe a complex bottom-up data collection process that yielded the final dataset.
arXiv Detail & Related papers (2023-07-26T08:29:42Z)
- MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue [115.32009638844059]
We extend the English-only NLU++ dataset to include manual translations into a range of high-, medium-, and low-resource languages.
Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals.
We use MULTI3NLU++ to benchmark state-of-the-art multilingual models for the Natural Language Understanding tasks of intent detection and slot labelling.
arXiv Detail & Related papers (2022-12-20T17:34:25Z)
- Multilingual Multimodal Learning with Machine Translated Text [27.7207234512674]
We investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data.
We propose two metrics for automatically removing such translations from the resulting datasets.
In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning.
arXiv Detail & Related papers (2022-10-24T11:41:20Z)
- Unsupervised Multimodal Language Representations using Convolutional Autoencoders [5.464072883537924]
We propose extracting unsupervised Multimodal Language representations that are universal and can be applied to different tasks.
We map the word-level aligned multimodal sequences to 2-D matrices and then use Convolutional Autoencoders to learn embeddings by combining multiple datasets.
We also show that our method is extremely lightweight and generalizes easily to other tasks and unseen data, with only a small performance drop and almost the same number of parameters.
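The core preprocessing idea above (stacking word-level aligned modality features into a single 2-D matrix that a convolutional autoencoder can consume like an image) can be sketched as follows; the modality names and feature dimensions are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def to_2d_matrix(text_feats, audio_feats, visual_feats):
    """Stack word-level aligned modality features into one 2-D matrix.

    Each argument is a (num_words, dim_m) array. The modalities are
    concatenated along the feature axis, yielding a (num_words, sum_of_dims)
    matrix that a 2-D convolutional autoencoder can treat like an image.
    """
    assert text_feats.shape[0] == audio_feats.shape[0] == visual_feats.shape[0]
    return np.concatenate([text_feats, audio_feats, visual_feats], axis=1)

# Illustrative dimensions (not taken from the paper)
words = 12
matrix = to_2d_matrix(
    np.random.rand(words, 300),  # e.g. word embeddings
    np.random.rand(words, 74),   # e.g. acoustic features
    np.random.rand(words, 35),   # e.g. facial features
)
print(matrix.shape)  # (12, 409)
```

A convolutional encoder-decoder would then compress and reconstruct such matrices across datasets to learn task-agnostic embeddings.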
arXiv Detail & Related papers (2021-10-06T18:28:07Z)
- xGQA: Cross-Lingual Visual Question Answering [100.35229218735938]
xGQA is a new multilingual evaluation benchmark for the visual question answering task.
We extend the established English GQA dataset to 7 typologically diverse languages.
We propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual.
arXiv Detail & Related papers (2021-09-13T15:58:21Z)
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [68.2650714613869]
We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT.
Compared with existing work, our method does not rely on bilingual sentences for training and requires only one training process for multiple target languages.
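The code-switching idea above can be sketched as a toy token-substitution step: randomly replace words with dictionary translations to produce mixed-language training text. The lexicon and replacement scheme here are purely illustrative assumptions, not CoSDA-ML's actual dictionaries or sampling procedure.

```python
import random

# Toy English-to-Spanish lexicon; the actual framework draws on real
# bilingual dictionaries covering many target languages.
EN_TO_ES = {"good": "bueno", "morning": "manana", "friend": "amigo"}

def code_switch(sentence, lexicon, ratio=0.5, seed=0):
    """Randomly replace a fraction of translatable tokens with their
    translations, yielding code-switched text for fine-tuning."""
    rng = random.Random(seed)
    out = []
    for tok in sentence.split():
        if tok in lexicon and rng.random() < ratio:
            out.append(lexicon[tok])
        else:
            out.append(tok)
    return " ".join(out)

print(code_switch("good morning my friend", EN_TO_ES, ratio=1.0))
# → "bueno manana my amigo"
```

Such augmented sentences would then be mixed into the fine-tuning data for a multilingual encoder such as mBERT.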
arXiv Detail & Related papers (2020-06-11T13:15:59Z)
- M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training [119.16007395162431]
M3P is a Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training.
We show that M3P can achieve comparable results for English and new state-of-the-art results for non-English languages.
arXiv Detail & Related papers (2020-06-04T03:54:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.