Enhancing Context Through Contrast
- URL: http://arxiv.org/abs/2401.03314v1
- Date: Sat, 6 Jan 2024 22:13:51 GMT
- Title: Enhancing Context Through Contrast
- Authors: Kshitij Ambilduke, Aneesh Shetye, Diksha Bagade, Rishika Bhagwatkar,
Khurshed Fitter, Prasad Vagdargi, Shital Chiddarwar
- Abstract summary: We propose a novel Context Enhancement step to improve performance on neural machine translation.
Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations.
Our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings.
- Score: 0.4068270792140993
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Neural machine translation benefits from semantically rich representations.
Considerable progress in learning such representations has been achieved by
language modelling and mutual information maximization objectives using
contrastive learning. The language-dependent nature of language modelling
introduces a trade-off between the universality of the learned representations
and the model's performance on the language modelling tasks. Although
contrastive learning improves performance, its success cannot be attributed to
mutual information alone. We propose a novel Context Enhancement step to
improve performance on neural machine translation by maximizing mutual
information using the Barlow Twins loss. Unlike other approaches, we do not
explicitly augment the data but view languages as implicit augmentations,
eradicating the risk of disrupting semantic information. Further, our method
does not learn embeddings from scratch and can be generalised to any set of
pre-trained embeddings. Finally, we evaluate the language-agnosticism of our
embeddings through language classification and use them for neural machine
translation to compare with state-of-the-art approaches.
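The Barlow Twins objective named in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the standard Barlow Twins formulation: standardize each embedding dimension over the batch, compute the cross-correlation matrix between the two views, push its diagonal toward 1 and its off-diagonal entries toward 0. Here the two "views" stand in for embeddings of a sentence and of its translation, in line with the idea of languages as implicit augmentations. The function names, the `lambda_offdiag` weight, and the toy inputs are assumptions for illustration and are not taken from the paper.

```python
import math


def standardize(batch, eps=1e-12):
    """Scale each embedding dimension to zero mean and unit std over the batch."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in batch) / n)
        for j in range(d)
    ]
    return [[(row[j] - means[j]) / (stds[j] + eps) for j in range(d)] for row in batch]


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Barlow Twins loss for two batches of paired embeddings (lists of rows).

    z_a[k] and z_b[k] are assumed to embed the same sentence in two languages.
    lambda_offdiag is a hypothetical weighting of the redundancy-reduction term.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    # Cross-correlation matrix between the dimensions of the two views.
    c = [
        [sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]
    # Invariance term: matching dimensions should be perfectly correlated.
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    # Redundancy-reduction term: different dimensions should be decorrelated.
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lambda_offdiag * off_diag


# Toy paired embeddings: identical, already decorrelated views give a loss near zero.
view_a = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_aligned = barlow_twins_loss(view_a, view_a)

# Swapping the two dimensions in the second view breaks the alignment.
view_b_swapped = [[row[1], row[0]] for row in view_a]
loss_swapped = barlow_twins_loss(view_a, view_b_swapped)
```

In this toy case the aligned pair yields a loss close to zero, while the swapped pair is penalized by both terms; a real setup would apply the same computation to batches of pre-trained embeddings passed through a trainable projection.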
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
(2024-10-06T13:09:48Z)
- Evaluating and explaining training strategies for zero-shot cross-lingual news sentiment analysis [8.770572911942635]
We introduce novel evaluation datasets in several less-resourced languages.
We experiment with a range of approaches including the use of machine translation.
We show that language similarity is not in itself sufficient for predicting the success of cross-lingual transfer.
(2024-09-30T07:59:41Z)
- Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
(2023-10-20T03:33:36Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
(2023-06-04T15:44:51Z)
- Distilling Linguistic Context for Language Model Compression [27.538080564616703]
A computationally expensive and memory intensive neural network lies behind the recent success of language representation learning.
We present a new knowledge distillation objective for language representation learning that transfers the contextual knowledge via two types of relationships.
We validate the effectiveness of our method on challenging benchmarks of language understanding tasks.
(2021-09-17T05:51:45Z)
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further uses unlabeled data to improve performance.
(2021-05-08T08:04:30Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext to improve the cross-lingual transferability of pre-trained models.
(2020-07-15T16:58:01Z)
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
(2020-04-29T04:07:12Z)