HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation
- URL: http://arxiv.org/abs/2105.14600v1
- Date: Sun, 30 May 2021 18:53:33 GMT
- Title: HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation
- Authors: Ayan Sengupta, Sourabh Kumar Bhattacharjee, Tanmoy Chakraborty, Md Shad Akhtar
- Abstract summary: We propose HIT, a robust representation learning method for code-mixed texts.
HIT is a hierarchical transformer-based framework that captures the semantic relationship among words.
Our evaluation of HIT on one European (Spanish) and five Indic (Hindi, Bengali, Tamil, Telugu, and Malayalam) languages suggests significant performance improvement against various state-of-the-art systems.
- Score: 18.136640008855117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the linguistics and morphology of resource-scarce code-mixed texts
remains a key challenge in text processing. Although word embeddings come in
handy for supporting downstream tasks in low-resource languages, there is plenty
of scope for improving the quality of language representation, particularly for
code-mixed languages. In this paper, we propose HIT, a robust representation
learning method for code-mixed texts. HIT is a hierarchical transformer-based
framework that captures the semantic relationship among words and
hierarchically learns the sentence-level semantics using a fused attention
mechanism. HIT incorporates two attention modules, a multi-headed
self-attention and an outer product attention module, and computes their
weighted sum to obtain the attention weights. Our evaluation of HIT on one
European (Spanish) and five Indic (Hindi, Bengali, Tamil, Telugu, and
Malayalam) languages across four NLP tasks on eleven datasets suggests
significant performance improvement against various state-of-the-art systems.
We further show the adaptability of the learned representations across tasks in a
transfer-learning setup (with and without fine-tuning).
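As a rough illustration of the fused attention described in the abstract, the sketch below mixes a standard multi-headed scaled dot-product score map with an "outer-product" score map, here implemented as a learned bilinear form over the entries of the outer product q_i k_j^T, and fuses them with a single learned scalar before the softmax. The class name, the bilinear reading of the outer-product attention, and the scalar mixing scheme are assumptions made for illustration; the paper specifies the exact formulation.

```python
# Illustrative sketch only: fuse multi-headed dot-product attention with an
# outer-product-style attention by a weighted sum of their score maps.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Learned weighting of the entries of q_i k_j^T (a bilinear form);
        # one plausible reading of "outer product attention".
        self.w_op = nn.Parameter(torch.randn(self.d_head, self.d_head) / self.d_head)
        # Scalar mixing weight between the two score maps (assumption).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape                                # (batch, seq_len, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = [z.reshape(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v)]                   # (batch, heads, seq, d_head)

        # Branch 1: standard multi-headed scaled dot-product scores.
        dot_scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # Branch 2: outer-product scores, q_i^T W k_j.
        outer_scores = torch.einsum("bhid,de,bhje->bhij", q, self.w_op, k)

        # Weighted sum of the two score maps, then a single softmax.
        weights = F.softmax(self.alpha * dot_scores + (1 - self.alpha) * outer_scores,
                            dim=-1)
        ctx = (weights @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(ctx)
```

Padding masks, dropout, and per-head mixing weights are omitted to keep the sketch short; stacking such layers and pooling over token states is one way to arrive at the hierarchical sentence-level representation the abstract describes.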
Related papers
- Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features [18.76505158652759]
We propose to exploit both semantic and linguistic features between multiple languages to enhance multilingual translation.
On the encoder side, we introduce a disentangling learning task that aligns encoder representations by disentangling semantic and linguistic features.
On the decoder side, we leverage a linguistic encoder to integrate low-level linguistic features to assist in the target language generation.
arXiv Detail & Related papers (2024-08-02T17:10:12Z)
- Language Agnostic Code Embeddings [61.84835551549612]
We focus on the cross-lingual capabilities of code embeddings across different programming languages.
Code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details.
We show that when we isolate and eliminate this language-specific component, we witness significant improvements in downstream code retrieval tasks.
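As a purely illustrative reading of this decomposition, one simple heuristic is to treat each programming language's embedding centroid as the language-specific component and subtract it, keeping the residual as the language-agnostic part. The centroid heuristic, function name, and shapes below are assumptions for the sketch; the paper's actual isolation procedure may differ.

```python
# Illustrative heuristic only: approximate the language-specific component of a
# code embedding by its language's centroid and remove it.
import numpy as np


def remove_language_component(embs: np.ndarray, langs: list[str]) -> np.ndarray:
    """embs: (n, d) code-snippet embeddings; langs: length-n language tags."""
    out = embs.copy()
    for lang in set(langs):
        idx = [i for i, tag in enumerate(langs) if tag == lang]
        centroid = embs[idx].mean(axis=0)   # treated as the language-specific part
        out[idx] -= centroid                # residual: the language-agnostic part
    return out
```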
arXiv Detail & Related papers (2023-10-25T17:34:52Z)
- Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? [34.609984453754656]
We aim to elucidate the impact of comprehensive linguistic knowledge, including semantic expression and syntactic structure, on multimodal alignment.
Specifically, we design and release SNARE, the first large-scale multimodal alignment probing benchmark.
arXiv Detail & Related papers (2023-08-24T16:17:40Z)
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z)
- Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
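As an illustration of how code-switched corruptions could serve as hard negatives, the sketch below uses a generic InfoNCE-style objective in which each anchor utterance is scored against one positive and a batch of hard negatives. The temperature, the choice of positive, and the function itself are assumptions, not the paper's exact multi-level recipe.

```python
# Illustrative InfoNCE-style loss with explicit hard negatives (e.g. embeddings
# of code-switched corruptions of the anchor utterance).
import torch
import torch.nn.functional as F


def info_nce(anchor, positive, hard_negatives, temperature: float = 0.07):
    """anchor, positive: (batch, d); hard_negatives: (batch, k, d)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    hard_negatives = F.normalize(hard_negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)           # (batch, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, hard_negatives)  # (batch, k)

    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)  # the positive sits at index 0
```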
arXiv Detail & Related papers (2022-05-07T13:44:28Z)
- A Comprehensive Understanding of Code-mixed Language Semantics using Hierarchical Transformer [28.3684494647968]
We propose a hierarchical transformer-based architecture (HIT) to learn the semantics of code-mixed languages.
We evaluate the proposed method across 6 Indian languages and 9 NLP tasks on 17 datasets.
arXiv Detail & Related papers (2022-04-27T07:50:18Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
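The sketch below shows one generic way a cross-attention sub-layer can be plugged into a Transformer encoder block so that tokens in one language also attend to the hidden states of a parallel sentence in another language. Layer sizes, normalization placement, and the feed-forward design are assumptions rather than VECO's actual configuration.

```python
# Illustrative encoder block: self-attention plus cross-attention to the hidden
# states of a parallel sentence in another language.
import torch
import torch.nn as nn


class CrossAttnEncoderLayer(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x: torch.Tensor, other_lang: torch.Tensor) -> torch.Tensor:
        # Self-attention over the sentence's own tokens.
        x = self.norm1(x + self.self_attn(x, x, x, need_weights=False)[0])
        # Cross-attention: queries from this language, keys/values from the other.
        x = self.norm2(x + self.cross_attn(x, other_lang, other_lang,
                                           need_weights=False)[0])
        return self.norm3(x + self.ffn(x))
```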
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for the translated text in the target language.
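A self-teaching loss of this kind can be written as a KL divergence between the model's distribution on the translated target-language text and fixed soft pseudo-labels from an earlier pass of the model. The temperature and the function signature below are assumptions for illustration, not FILTER's exact training recipe.

```python
# Illustrative KL-divergence self-teaching loss on soft pseudo-labels.
import torch
import torch.nn.functional as F


def self_teaching_kl(student_logits: torch.Tensor,
                     pseudo_label_logits: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    """Both tensors have shape (batch, num_classes); pseudo-labels are fixed."""
    teacher = F.softmax(pseudo_label_logits.detach() / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(log_student, teacher, reduction="batchmean")
```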
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- Hierarchical Multi Task Learning with Subword Contextual Embeddings for Languages with Rich Morphology [5.5217350574838875]
Morphological information is important for many sequence labeling tasks in Natural Language Processing (NLP).
We propose using subword contextual embeddings to capture morphological information for languages with rich morphology.
Our model outperforms previous state-of-the-art models on both tasks for the Turkish language.
arXiv Detail & Related papers (2020-04-25T22:55:56Z)
- From text saliency to linguistic objects: learning linguistic interpretable markers with a multi-channels convolutional architecture [2.064612766965483]
We propose a novel approach to inspect the hidden layers of a fitted CNN in order to extract interpretable linguistic objects from texts by exploiting the classification process.
We empirically demonstrate the efficiency of our approach on corpora from two different languages: English and French.
arXiv Detail & Related papers (2020-04-07T10:46:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.