A Continuous Space Neural Language Model for Bengali Language
- URL: http://arxiv.org/abs/2001.05315v1
- Date: Sat, 11 Jan 2020 14:50:57 GMT
- Title: A Continuous Space Neural Language Model for Bengali Language
- Authors: Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Anisur Rahman, Aisha Khatun, Md. Saiful Islam
- Abstract summary: This paper proposes a continuous-space neural language model, or more specifically an ASGD weight-dropped LSTM language model, along with techniques to train it efficiently for the Bengali language.
The proposed architecture outperforms its count-based counterparts, achieving an inference perplexity as low as 51.2 on the held-out data set for Bengali.
- Score: 0.4799822253865053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models are generally employed to estimate the probability
distribution of various linguistic units, making them one of the fundamental
parts of natural language processing. Applications of language models include a
wide spectrum of tasks such as text summarization, translation and
classification. For a low-resource language like Bengali, research in this
area has so far been narrow, limited to a few
traditional count-based models. This paper attempts to address
the issue and proposes a continuous-space neural language model, or more
specifically an ASGD weight-dropped LSTM language model, along with techniques
to train it efficiently for the Bengali language. A performance comparison with
some existing count-based models, presented in this paper, also shows
that the proposed architecture outperforms its counterparts, achieving an
inference perplexity as low as 51.2 on the held-out data set for Bengali.
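
The perplexity figure quoted in the abstract can be read through the standard definition: the exponential of the average negative log-probability the model assigns to each held-out token. The following is a minimal sketch under that assumption (natural logarithms, per-token averaging); the function name `perplexity` and the sample values are illustrative and are not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-probability) over a sequence of tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    observed token (hypothetical input; not data from the paper).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Illustrative case: a model that gives every token probability 1/4
# behaves like a uniform choice among 4 options, so its perplexity is about 4.
uniform_log_probs = [math.log(0.25)] * 10
print(perplexity(uniform_log_probs))
```

Under this reading, a perplexity of 51.2 would mean the model is, on average, about as uncertain as a uniform choice among roughly 51 tokens at each step; lower values indicate a better fit to the held-out text.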
Related papers
- LLMic: Romanian Foundation Language Model [76.09455151754062]
We present LLMic, a foundation language model designed specifically for the Romanian Language.
We show that fine-tuning LLMic for language translation after the initial pretraining phase outperforms existing solutions in English-to-Romanian translation tasks.
arXiv Detail & Related papers (2025-01-13T22:14:45Z) - QueEn: A Large Language Model for Quechua-English Translation [20.377876059048692]
We propose QueEn, a novel approach for Quechua-English translation that combines Retrieval-Augmented Generation (RAG) with parameter-efficient fine-tuning techniques.
Our approach substantially exceeds baseline models, with a BLEU score of 17.6 compared to 1.5 for standard GPT models.
arXiv Detail & Related papers (2024-12-06T17:04:21Z) - One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit NLP Tasks [26.848664285007022]
ByT5-Sanskrit is designed for NLP applications involving the morphologically rich language Sanskrit.
It is easier to deploy and more robust to data not covered by external linguistic resources.
We show that our approach yields new best scores for lemmatization and dependency parsing of other morphologically rich languages.
arXiv Detail & Related papers (2024-09-20T22:02:26Z) - Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z) - Multilingual Text Classification for Dravidian Languages [4.264592074410622]
We propose a multilingual text classification framework for the Dravidian languages.
On the one hand, the framework used the LaBSE pre-trained model as the base model.
On the other hand, in view of the problem that the model cannot well recognize and utilize the correlation among languages, we further proposed a language-specific representation module.
arXiv Detail & Related papers (2021-12-03T04:26:49Z) - Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z) - Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages [0.8155575318208631]
Language models based on the Transformer architecture have achieved state-of-the-art performance on a wide range of NLP tasks.
However, this performance is usually tested and reported on high-resource languages, like English, French, Spanish, and German.
Indian languages, on the other hand, are underrepresented in such benchmarks.
arXiv Detail & Related papers (2020-11-04T14:43:43Z) - Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank [46.626315158735615]
Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties.
This presents a challenge for language varieties unfamiliar to these models, whose labeled and unlabeled data is too limited to train a monolingual model effectively.
We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings.
arXiv Detail & Related papers (2020-09-29T16:12:52Z) - XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)