MSCTD: A Multimodal Sentiment Chat Translation Dataset
- URL: http://arxiv.org/abs/2202.13645v1
- Date: Mon, 28 Feb 2022 09:40:46 GMT
- Title: MSCTD: A Multimodal Sentiment Chat Translation Dataset
- Authors: Yunlong Liang, Fandong Meng, Jinan Xu, Yufeng Chen and Jie Zhou
- Abstract summary: We introduce a new task named Multimodal Chat Translation (MCT).
MCT aims to generate more accurate translations with the help of the associated dialogue history and visual context.
Our work can facilitate research on both multimodal chat translation and multimodal dialogue sentiment analysis.
- Score: 66.81525961469494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal machine translation and textual chat translation have received
considerable attention in recent years. Although conversation in its natural
form is usually multimodal, work on multimodal machine translation in
conversations is still scarce. In this work, we introduce a new task
named Multimodal Chat Translation (MCT), aiming to generate more accurate
translations with the help of the associated dialogue history and visual
context. To this end, we first construct a Multimodal Sentiment Chat
Translation Dataset (MSCTD) containing 142,871 English-Chinese utterance pairs
in 14,762 bilingual dialogues and 30,370 English-German utterance pairs in
3,079 bilingual dialogues. Each utterance pair, corresponding to the visual
context that reflects the current conversational scene, is annotated with a
sentiment label. Then, we benchmark the task by establishing multiple baseline
systems that incorporate multimodal and sentiment features for MCT. Preliminary
experiments on four language directions (English-Chinese and English-German)
verify the potential of contextual and multimodal information fusion and the
positive impact of sentiment on the MCT task. Additionally, as a by-product,
the MSCTD provides two new benchmarks for multimodal dialogue sentiment
analysis. Our work can facilitate research on both multimodal chat translation
and multimodal dialogue sentiment analysis.
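To make the annotation scheme concrete, the sketch below shows one way an MSCTD utterance pair could be represented. The field names (e.g. `dialogue_id`, `image_path`, `sentiment`) and the example values are illustrative assumptions for this summary, not the dataset's published file format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MSCTDUtterance:
    """One utterance pair in a bilingual dialogue (hypothetical schema).

    Field names are illustrative assumptions; the released MSCTD files
    may use a different layout.
    """
    dialogue_id: int    # which bilingual dialogue this turn belongs to
    turn_index: int     # position of the utterance within the dialogue
    source_text: str    # e.g. the English utterance
    target_text: str    # e.g. the Chinese or German translation
    image_path: str     # visual context of the current conversational scene
    sentiment: str      # utterance-level label, e.g. "positive"/"neutral"/"negative"

@dataclass
class MSCTDDialogue:
    """A bilingual dialogue as a sequence of annotated utterance pairs."""
    dialogue_id: int
    utterances: List[MSCTDUtterance]

# A toy example of a single annotated turn (values invented for illustration).
example = MSCTDUtterance(
    dialogue_id=0,
    turn_index=3,
    source_text="I can't believe we finally made it!",
    target_text="真不敢相信我们终于做到了！",
    image_path="images/dialogue_0000/turn_03.jpg",
    sentiment="positive",
)
```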
Related papers
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages [96.8603701943286]
The Tri-Modal Translation (TMT) model translates between arbitrary modalities spanning speech, image, and text.
We tokenize speech and image data into discrete tokens, which provide a unified interface across modalities.
TMT outperforms single model counterparts consistently.
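The unified-interface idea can be sketched in a few lines: if speech and images are quantized into discrete tokens drawn from disjoint id ranges of one shared vocabulary, translating between modalities reduces to ordinary sequence-to-sequence modeling over integers. The quantizers below are toy stand-ins (assumptions), not the tokenizers used in the TMT paper.

```python
# Toy illustration of "different modalities as different languages":
# each modality is mapped into its own id range of a shared discrete vocabulary.
from typing import List

TEXT_VOCAB = 32_000      # assumed sub-word vocabulary size
SPEECH_UNITS = 1_000     # assumed number of discrete speech units
IMAGE_CODES = 8_192      # assumed size of an image codebook

SPEECH_OFFSET = TEXT_VOCAB                 # disjoint ranges keep modalities apart
IMAGE_OFFSET = TEXT_VOCAB + SPEECH_UNITS

def tokenize_text(text: str) -> List[int]:
    """Toy text tokenizer: map each character into the text id range."""
    return [ord(c) % TEXT_VOCAB for c in text]

def tokenize_speech(frames: List[float]) -> List[int]:
    """Toy speech quantizer: bucket each frame value into one of SPEECH_UNITS units."""
    return [SPEECH_OFFSET + int(abs(f) * 10) % SPEECH_UNITS for f in frames]

def tokenize_image(pixels: List[int]) -> List[int]:
    """Toy image quantizer: bucket each pixel into one of IMAGE_CODES codebook entries."""
    return [IMAGE_OFFSET + (p % IMAGE_CODES) for p in pixels]

# With a unified token interface, image->text or speech->text translation is just
# sequence-to-sequence over integer ids, like ordinary machine translation.
src_tokens = tokenize_image([12, 240, 97, 63])
tgt_tokens = tokenize_text("a person waving at the camera")
print(src_tokens, tgt_tokens)
```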
arXiv Detail & Related papers (2024-02-25T07:46:57Z)
- Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue [50.279206765971125]
We explore three methods to tackle the problem of interpreting multimodal inputs from conversational and situational contexts.
Our best method, scene-dialogue alignment, improves the performance by 20% F1-score compared to the SIMMC 2.1 baselines.
arXiv Detail & Related papers (2023-02-28T15:45:20Z)
- LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation [94.33019040320507]
Multimodal Machine Translation (MMT) focuses on enhancing text-only translation with visual features.
Existing methods still train a separate model for each language pair, which is costly and unaffordable as the number of languages increases.
We propose the Multilingual MMT task by establishing two new Multilingual MMT benchmark datasets covering seven languages.
arXiv Detail & Related papers (2022-10-19T12:21:39Z)
- Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog [67.20796950016735]
Multi2WOZ dataset spans four typologically diverse languages: Chinese, German, Arabic, and Russian.
We introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.
Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task.
arXiv Detail & Related papers (2022-05-20T18:35:38Z)
- M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database [139.08528216461502]
We propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED.
M3ED contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese.
arXiv Detail & Related papers (2022-05-09T06:52:51Z)
- Modeling Bilingual Conversational Characteristics for Neural Chat Translation [24.94474722693084]
We aim to improve the translation quality of conversational text by modeling its bilingual conversational characteristics.
We evaluate our approach on the benchmark dataset BConTrasT (English-German) and a self-collected bilingual dialogue corpus named BMELD (English-Chinese).
Our approach notably boosts the performance over strong baselines by a large margin and significantly surpasses some state-of-the-art context-aware NMT models in terms of BLEU and TER.
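For reference, corpus-level BLEU and TER scores like those mentioned above are commonly computed with the sacrebleu package; the snippet below is a minimal sketch with made-up hypothesis and reference sentences, not the evaluation script used in any of these papers.

```python
# Minimal sketch of scoring translations with sacrebleu (pip install sacrebleu).
# The sentences are invented; the cited papers' exact scoring setups may differ.
from sacrebleu.metrics import BLEU, TER

hypotheses = [
    "I can not believe we finally made it !",
    "She said she would join us later .",
]
references = [
    "I can't believe we finally made it!",
    "She said she would join us later.",
]

bleu = BLEU()   # corpus-level BLEU with default tokenization (higher is better)
ter = TER()     # translation edit rate (lower is better)

print(bleu.corpus_score(hypotheses, [references]))
print(ter.corpus_score(hypotheses, [references]))
```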
arXiv Detail & Related papers (2021-07-23T12:23:34Z)