MTVR: Multilingual Moment Retrieval in Videos
- URL: http://arxiv.org/abs/2108.00061v1
- Date: Fri, 30 Jul 2021 20:01:03 GMT
- Title: MTVR: Multilingual Moment Retrieval in Videos
- Authors: Jie Lei, Tamara L. Berg, Mohit Bansal
- Abstract summary: We introduce mTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.
The dataset is collected by extending the popular TVR dataset (in English) with paired Chinese queries and subtitles.
We propose mXML, a multilingual moment retrieval model that learns and operates on data from both languages.
- Score: 89.24431389933703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce mTVR, a large-scale multilingual video moment retrieval dataset,
containing 218K English and Chinese queries from 21.8K TV show video clips. The
dataset is collected by extending the popular TVR dataset (in English) with
paired Chinese queries and subtitles. Compared to existing moment retrieval
datasets, mTVR is multilingual, larger, and comes with diverse annotations. We
further propose mXML, a multilingual moment retrieval model that learns and
operates on data from both languages, via encoder parameter sharing and
language neighborhood constraints. We demonstrate the effectiveness of mXML on
the newly collected mTVR dataset, where mXML outperforms strong monolingual
baselines while using fewer parameters. We also provide detailed
dataset analyses and model ablations. Data and code are publicly available at
https://github.com/jayleicn/mTVRetrieval
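The abstract names a "language neighborhood constraint" without spelling it out. The following is only a minimal sketch of one way such a constraint could look, assuming paired English and Chinese sentence embeddings from a shared encoder and a hinge loss over cosine similarities. The function names, the margin value, and the hinge formulation are illustrative assumptions, not the authors' actual loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def neighborhood_loss(en_vecs, zh_vecs, margin=0.2):
    """Hypothetical hinge loss: en_vecs[i] and zh_vecs[i] are assumed to be
    embeddings of a paired English/Chinese sentence. Each pair should be
    more similar than any mismatched pair by at least `margin`."""
    n = len(en_vecs)
    total, count = 0.0, 0
    for i in range(n):
        pos = cosine(en_vecs[i], zh_vecs[i])
        for j in range(n):
            if j == i:
                continue
            # English anchor i against a mismatched Chinese sentence j
            total += max(0.0, margin + cosine(en_vecs[i], zh_vecs[j]) - pos)
            # Chinese anchor i against a mismatched English sentence j
            total += max(0.0, margin + cosine(en_vecs[j], zh_vecs[i]) - pos)
            count += 2
    return total / count if count else 0.0


# Toy embeddings (made up for illustration): aligned pairs vs. swapped pairs.
en = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = neighborhood_loss(en, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = neighborhood_loss(en, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy vectors the loss is zero when each pair shares the same direction and positive when the pairs are swapped, which is the behavior a constraint of this kind is meant to encourage.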
Related papers
- Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval [56.65147231836708]
We develop SWIM-IR, a synthetic retrieval training dataset containing 33 languages for fine-tuning multilingual dense retrievers.
SAP (summarize-then-ask prompting) assists the large language model (LLM) in generating informative queries in the target language.
Our models, called SWIM-X, are competitive with human-supervised dense retrieval models.
arXiv Detail & Related papers (2023-11-10T00:17:10Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z)
- XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z)
- MuMUR : Multilingual Multimodal Universal Retrieval [19.242056928318913]
We propose MuMUR, a framework that uses knowledge transfer from a multilingual model to boost the performance of multimodal (image and video) retrieval.
We first use state-of-the-art machine translation models to construct pseudo ground-truth multilingual visual-text pairs.
We then use this data to learn a joint vision-text representation where English and non-English text queries are represented in a common embedding space.
arXiv Detail & Related papers (2022-08-24T13:55:15Z)
- Multilingual Coreference Resolution in Multiparty Dialogue [29.92954906275944]
We create a large-scale dataset, Multilingual Multiparty Coref, for this task based on TV transcripts.
Due to the availability of gold-quality subtitles in multiple languages, we propose reusing the annotations to create silver coreference resolution data in other languages.
We find success both using it for data augmentation and training from scratch, which effectively simulates the zero-shot cross-lingual setting.
arXiv Detail & Related papers (2022-08-02T08:27:00Z)
- MFAQ: a Multilingual FAQ Dataset [9.625301186732598]
We present the first publicly available multilingual FAQ dataset.
We collected around 6M FAQ pairs from the web, in 21 different languages.
We adopt a setup similar to Dense Passage Retrieval (DPR) and test various bi-encoders on this dataset.
arXiv Detail & Related papers (2021-09-27T08:43:25Z)
- Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models [144.85290716246533]
We study zero-shot cross-lingual transfer of vision-language models.
We propose a Transformer-based model that learns contextualized multilingual multimodal embeddings.
arXiv Detail & Related papers (2021-03-16T04:37:40Z)
- TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval [111.93601253692165]
TV show Retrieval (TVR) is a new multimodal retrieval dataset.
TVR requires systems to understand both videos and their associated subtitle (dialogue) texts.
The dataset contains 109K queries collected on 21.8K videos from 6 TV shows of diverse genres.
arXiv Detail & Related papers (2020-01-24T17:09:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.