A Study of Syntactic Multi-Modality in Non-Autoregressive Machine
Translation
- URL: http://arxiv.org/abs/2207.04206v1
- Date: Sat, 9 Jul 2022 06:48:10 GMT
- Title: A Study of Syntactic Multi-Modality in Non-Autoregressive Machine
Translation
- Authors: Kexun Zhang, Rui Wang, Xu Tan, Junliang Guo, Yi Ren, Tao Qin, Tie-Yan
Liu
- Abstract summary: It is difficult for non-autoregressive translation models to capture the multi-modal distribution of target translations.
We decompose it into short- and long-range syntactic multi-modalities and evaluate several recent NAT algorithms with advanced loss functions.
We design a new loss function to better handle the complicated syntactic multi-modality in real-world datasets.
- Score: 144.55713938260828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is difficult for non-autoregressive translation (NAT) models to capture
the multi-modal distribution of target translations due to their conditional
independence assumption, which is known as the "multi-modality problem",
including the lexical multi-modality and the syntactic multi-modality. While
the first one has been well studied, the syntactic multi-modality poses a severe
challenge to the standard cross entropy (XE) loss in NAT and remains understudied.
In this paper, we conduct a systematic study on the syntactic multi-modality
problem. Specifically, we decompose it into short- and long-range syntactic
multi-modalities and evaluate several recent NAT algorithms with advanced loss
functions on both carefully designed synthesized datasets and real datasets. We
find that the Connectionist Temporal Classification (CTC) loss and the
Order-Agnostic Cross Entropy (OAXE) loss can better handle short- and
long-range syntactic multi-modalities respectively. Furthermore, we take the
best of both and design a new loss function to better handle the complicated
syntactic multi-modality in real-world datasets. To facilitate practical usage,
we provide a guide to using different loss functions for different kinds of
syntactic multi-modality.
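To make the two ingredients named in the abstract concrete, the following is a minimal sketch, assuming a PyTorch NAT decoder: a CTC loss over upsampled decoder outputs plus an OAXE-style order-agnostic cross entropy computed with Hungarian matching, interpolated by a hypothetical weight alpha. It is not the paper's exact combined loss, and padding handling and target-length prediction are omitted for brevity.

```python
# Illustrative sketch only: a CTC loss plus an OAXE-style order-agnostic cross
# entropy, interpolated by a hypothetical weight `alpha`. The paper's actual
# combined loss may be formulated differently.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

BLANK = 0  # assumed blank token id for CTC


def ctc_nat_loss(log_probs, targets, target_lengths):
    # log_probs: (T, B, V) log-softmax over upsampled decoder positions
    # targets:   (B, L) padded target token ids (real tokens must not use BLANK)
    T, B, _ = log_probs.shape
    input_lengths = torch.full((B,), T, dtype=torch.long)
    return F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                      blank=BLANK, zero_infinity=True)


def oaxe_loss(log_probs, targets):
    # log_probs: (B, L, V); targets: (B, L).
    # Match each target token to the output position that assigns it the
    # highest log-probability (bipartite matching), then average the
    # negative log-likelihoods under that matching.
    B, L, _ = log_probs.shape
    losses = []
    for b in range(B):
        cost = -log_probs[b][:, targets[b]]  # (L, L): cost[i, j] = -log p(y_j at position i)
        _, cols = linear_sum_assignment(cost.detach().cpu().numpy())
        perm = torch.as_tensor(cols, device=cost.device)
        losses.append(cost[torch.arange(L, device=cost.device), perm].mean())
    return torch.stack(losses).mean()


def combined_loss(ctc_log_probs, pos_log_probs, targets, target_lengths, alpha=0.5):
    # Simple interpolation of the two losses; `alpha` is a hypothetical knob.
    return (alpha * ctc_nat_loss(ctc_log_probs, targets, target_lengths)
            + (1.0 - alpha) * oaxe_loss(pos_log_probs, targets))
```

In practice the OAXE term would also mask padded target positions; that bookkeeping is dropped here to keep the sketch short.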
Related papers
- Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference [20.761803725098005]
Multimodal variational autoencoders (VAEs) aim to capture shared latent representations by integrating information from different data modalities.
A significant challenge is accurately inferring representations from any subset of modalities without training an impractical number of inference networks for all possible modality combinations.
We introduce multimodal iterative amortized inference, an iterative refinement mechanism within the multimodal VAE framework.
arXiv Detail & Related papers (2024-10-15T08:49:38Z)
- GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis [0.0]
Multimodal Sentiment Analysis (MSA) leverages multiple data modalities to analyze human sentiment.
Existing MSA models generally employ cutting-edge multimodal fusion and representation learning-based methods to promote MSA capability.
Our proposed GSIFN incorporates two main components to solve these problems: (i) a graph-structured and interlaced-masked multimodal Transformer.
It adopts the Interlaced Mask mechanism to construct robust multimodal graph embedding, achieve all-modal-in-one Transformer-based fusion, and greatly reduce the computational overhead.
arXiv Detail & Related papers (2024-08-27T06:44:28Z)
- On the Information Redundancy in Non-Autoregressive Translation [82.43992805551498]
Token repetition is a typical form of the multi-modality problem in non-autoregressive translation (NAT).
In this work, we revisit the multi-modal problem in recently proposed NAT models.
We identify two types of information redundancy errors that correspond well to lexical and reordering multi-modality problems.
arXiv Detail & Related papers (2024-05-04T14:20:28Z)
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages [96.8603701943286]
Tri-Modal Translation (TMT) model translates between arbitrary modalities spanning speech, image, and text.
We tokenize speech and image data into discrete tokens, which provide a unified interface across modalities.
TMT consistently outperforms its single-model counterparts.
arXiv Detail & Related papers (2024-02-25T07:46:57Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Deep Metric Loss for Multimodal Learning [3.8979646385036175]
We introduce a novel MultiModal loss paradigm for multimodal learning.
The MultiModal loss can prevent inefficient learning caused by overfitting and efficiently optimize multimodal models.
Our loss is empirically shown to improve the performance of recent models.
arXiv Detail & Related papers (2023-08-21T06:04:30Z)
- Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis [47.29528724322795]
Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently.
Despite significant progress, there are still two major challenges on the way towards robust MSA.
We propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR).
arXiv Detail & Related papers (2022-08-16T08:02:30Z)
- Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z)
- Improving Multimodal fusion via Mutual Dependency Maximisation [5.73995120847626]
Multimodal sentiment analysis is a trending area of research, and multimodal fusion is one of its most active topics.
In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities.
We demonstrate that our new penalties lead to a consistent improvement (up to 4.3 points in accuracy) across a large variety of state-of-the-art models.
arXiv Detail & Related papers (2021-08-31T06:26:26Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.