Parameter Differentiation based Multilingual Neural Machine Translation
- URL: http://arxiv.org/abs/2112.13619v1
- Date: Mon, 27 Dec 2021 11:41:52 GMT
- Title: Parameter Differentiation based Multilingual Neural Machine Translation
- Authors: Qian Wang and Jiajun Zhang
- Abstract summary: Multilingual neural machine translation (MNMT) aims to translate multiple languages with a single model.
It is still an open question which parameters should be shared and which ones need to be task-specific.
We propose a novel parameter differentiation based method that allows the model to determine which parameters should be language-specific.
- Score: 37.16691633466614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual neural machine translation (MNMT) aims to translate multiple
languages with a single model and has proven successful thanks to
effective knowledge transfer among different languages with shared parameters.
However, it is still an open question which parameters should be shared and
which ones need to be task-specific. Currently, the common practice is to
heuristically design or search for language-specific modules, which makes it
difficult to find the optimal configuration. In this paper, we propose a novel parameter
differentiation based method that allows the model to determine which
parameters should be language-specific during training. Inspired by cellular
differentiation, each shared parameter in our method can dynamically
differentiate into more specialized types. We further define the
differentiation criterion as inter-task gradient similarity. Therefore,
parameters with conflicting inter-task gradients are more likely to be
language-specific. Extensive experiments on multilingual datasets have
demonstrated that our method significantly outperforms various strong baselines
with different parameter sharing configurations. Further analyses reveal that
the parameter sharing configuration obtained by our method correlates well with
linguistic proximity.
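To make the differentiation criterion concrete, below is a minimal sketch, assuming a PyTorch setting, of how inter-task gradient cosine similarity could be computed for a single shared parameter and used to flag it for differentiation. The function names, the zero threshold, and the language pairs are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gradient_cosine_similarity(grad_a: torch.Tensor, grad_b: torch.Tensor) -> float:
    """Cosine similarity between two tasks' gradients w.r.t. one shared parameter."""
    return F.cosine_similarity(grad_a.flatten(), grad_b.flatten(), dim=0).item()

def should_differentiate(per_task_grads: dict, threshold: float = 0.0) -> bool:
    """Flag a shared parameter for differentiation when any pair of tasks
    (language pairs) has conflicting gradients, i.e. cosine similarity below
    the chosen threshold (an illustrative criterion, not the paper's exact rule)."""
    tasks = list(per_task_grads)
    for i in range(len(tasks)):
        for j in range(i + 1, len(tasks)):
            sim = gradient_cosine_similarity(per_task_grads[tasks[i]],
                                             per_task_grads[tasks[j]])
            if sim < threshold:
                return True
    return False

# Toy usage: gradients of one shared weight matrix, accumulated per language pair.
per_task_grads = {
    "en-de": torch.randn(512, 512),
    "en-fr": torch.randn(512, 512),
}
if should_differentiate(per_task_grads):
    # In the paper's scheme, the shared parameter would then be duplicated into
    # more specialized (language-specific) copies, analogous to cell differentiation.
    print("differentiate this parameter into language-specific copies")
```

In practice the per-task gradients would be accumulated over training batches rather than drawn at random as in this toy example.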
Related papers
- Linguistic Fingerprint in Transformer Models: How Language Variation Influences Parameter Selection in Irony Detection [1.5807079236265718]
We aim to investigate how different English variations impact transformer-based models for irony detection.
Our results reveal several similarities between the optimal networks, which provide insights into the linguistic variations that share strong resemblances and those that exhibit greater dissimilarities.
This study highlights the inherent structural similarities between models trained on different variants of the same language and also the critical role of parameter values in capturing these nuances.
arXiv Detail & Related papers (2024-06-04T14:09:36Z)
- Multilingual Few-Shot Learning via Language Model Retrieval [18.465566186549072]
Transformer-based language models have achieved remarkable success in few-shot in-context learning.
We conduct a study of retrieving semantically similar few-shot samples and using them as the context.
We evaluate the proposed method on five natural language understanding datasets related to intent detection, question classification, sentiment analysis, and topic classification.
arXiv Detail & Related papers (2023-06-19T14:27:21Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training [88.80694147730883]
We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks.
Under the studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters.
Our approach outperforms vanilla CLIP by 1.6 points in linear probing on a collection of 24 downstream vision tasks.
arXiv Detail & Related papers (2022-07-26T05:19:16Z)
- Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de facto learning paradigm in NLP.
Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance.
We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
arXiv Detail & Related papers (2021-10-08T20:22:26Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- Hierarchical Transformer for Multilingual Machine Translation [3.441021278275805]
The choice of parameter sharing strategy in multilingual machine translation models determines how optimally the parameter space is used.
Inspired by linguistic trees that show the degree of relatedness between different languages, a new general approach to parameter sharing in multilingual machine translation was recently suggested.
We demonstrate that, with a carefully chosen training strategy, the hierarchical architecture can outperform bilingual models and multilingual models with full parameter sharing.
arXiv Detail & Related papers (2021-03-05T10:51:47Z)
- Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models [63.92643612630657]
This paper attempts to peek into the black-box of multilingual optimization through the lens of loss function geometry.
We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with language proximity.
We derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks.
arXiv Detail & Related papers (2020-10-12T17:26:34Z)
- UDapter: Language Adaptation for Truly Universal Dependency Parsing [6.346772579930929]
Cross-language interference and restrained model capacity remain major obstacles to universal multilingual dependency parsing.
We propose a novel multilingual task adaptation approach based on contextual parameter generation and adapter modules (see the sketch after this list).
The resulting UDapter outperforms strong monolingual and multilingual baselines on the majority of both high-resource and low-resource (zero-shot) languages.
arXiv Detail & Related papers (2020-04-29T16:52:50Z)
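As referenced in the UDapter entry above, here is a minimal, hypothetical sketch of the contextual-parameter-generation idea: a bottleneck adapter whose weights are produced by a small hypernetwork from a language embedding. The class, dimensions, and variable names are illustrative assumptions and do not reproduce UDapter's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextualAdapter(nn.Module):
    """Residual bottleneck adapter whose weights are generated from a language
    embedding (an illustrative sketch of contextual parameter generation)."""
    def __init__(self, d_model: int, d_bottleneck: int, d_lang: int):
        super().__init__()
        self.d_model, self.d_bottleneck = d_model, d_bottleneck
        # Hypernetwork: maps a language embedding to both adapter weight matrices.
        self.generator = nn.Linear(d_lang, 2 * d_model * d_bottleneck)

    def forward(self, x: torch.Tensor, lang_emb: torch.Tensor) -> torch.Tensor:
        params = self.generator(lang_emb)
        w_down, w_up = params.split(self.d_model * self.d_bottleneck)
        w_down = w_down.view(self.d_bottleneck, self.d_model)
        w_up = w_up.view(self.d_model, self.d_bottleneck)
        h = F.relu(F.linear(x, w_down))   # down-projection + nonlinearity
        return x + F.linear(h, w_up)      # up-projection with residual connection

# Toy usage: one learned language embedding yields that language's adapter weights,
# so all languages share the hypernetwork while adapters stay language-specific.
adapter = ContextualAdapter(d_model=512, d_bottleneck=64, d_lang=32)
x = torch.randn(8, 512)      # a batch of token representations
lang_emb = torch.randn(32)   # stand-in for a learned embedding of one language
out = adapter(x, lang_emb)   # shape: (8, 512)
```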