Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese
- URL: http://arxiv.org/abs/2411.13407v2
- Date: Thu, 21 Nov 2024 02:27:38 GMT
- Title: Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese
- Authors: Dat Van-Thanh Nguyen, Tin Van Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
- Abstract summary: We conduct experiments using various combinations of contextualized language models (CLM) and neural networks.
We find that the joint approach of CLM and neural networks is simple yet capable of achieving high-quality performance.
- Score: 1.7457686843484872
- License:
- Abstract: Natural Language Inference (NLI) is a task within Natural Language Processing (NLP) that holds value for various AI applications. However, there have been limited studies on Natural Language Inference in Vietnamese that explore the concept of joint models. Therefore, we conducted experiments using various combinations of contextualized language models (CLM) and neural networks. We use CLMs to create contextualized word representations and neural networks for classification. Furthermore, we have evaluated the strengths and weaknesses of each joint model and identified the failure points of the models in the Vietnamese context. The highest F1 score in these experiments reached 82.78% on the benchmark dataset (ViNLI). Among the models tested, the largest CLM is XLM-R (355M). That combination has consistently demonstrated superior performance compared to fine-tuning strong pre-trained language models like PhoBERT (+6.58%), mBERT (+19.08%), and XLM-R (+0.94%) in terms of F1-score. This article aims to introduce a novel approach or model that attains improved performance for Vietnamese NLI. Overall, we find that the joint approach of CLM and neural networks is simple yet capable of achieving high-quality performance, which makes it suitable for applications that require efficient resource utilization.
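To make the joint approach concrete, here is a minimal PyTorch-style sketch of feeding a CLM's contextualized word representations into a neural classifier for three-way NLI. It is an illustration under assumptions, not the authors' released code: the checkpoint name, BiLSTM head, pooling, and label set are placeholders.

```python
# Minimal sketch: CLM token representations -> BiLSTM -> softmax over NLI labels.
# Checkpoint name, hidden sizes, pooling, and label set are illustrative assumptions.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ClmBiLstmNli(nn.Module):
    def __init__(self, clm_name="xlm-roberta-large", hidden=256, num_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(clm_name)          # contextualized word representations
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)   # neural network on top of the CLM
        self.classifier = nn.Linear(2 * hidden, num_labels)         # entailment / neutral / contradiction

    def forward(self, input_ids, attention_mask):
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        states, _ = self.lstm(tokens)
        pooled = states.max(dim=1).values                           # simple max-pooling over time
        return self.classifier(pooled)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
batch = tokenizer("Premise sentence.", "Hypothesis sentence.",
                  return_tensors="pt", padding=True, truncation=True)
logits = ClmBiLstmNli()(batch["input_ids"], batch["attention_mask"])
```

In this setup the CLM supplies token-level features while a lightweight recurrent head does the classification, which is consistent with the resource-efficiency point made in the abstract.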
Related papers
- ViANLI: Adversarial Natural Language Inference for Vietnamese [1.907126872483548]
We introduce the adversarial NLI dataset to the NLP research community with the name ViANLI.
This data set contains more than 10K premise-hypothesis pairs.
The accuracy of the most powerful model on the test set only reached 48.4%.
arXiv Detail & Related papers (2024-06-25T16:58:19Z)
- Evaluating Large Language Models Using Contrast Sets: An Experimental Approach [0.0]
We introduce an innovative technique for generating a contrast set for the Stanford Natural Language Inference dataset.
Our strategy involves the automated substitution of verbs, adverbs, and adjectives with their synonyms to preserve the original meaning of sentences.
This method aims to assess whether a model's performance is based on genuine language comprehension or simply on pattern recognition.
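The generation recipe above (swap verbs, adverbs, and adjectives for synonyms while keeping the sentence meaning) can be sketched with off-the-shelf tools; the snippet below uses NLTK's POS tagger and WordNet and is only an approximation of the paper's automated substitution, with a naive pick-the-first-synonym policy.

```python
# Sketch: build a contrast example by swapping verbs/adverbs/adjectives for WordNet synonyms.
# Resource names may differ across NLTK versions; the substitution policy is deliberately naive.
import nltk
from nltk.corpus import wordnet as wn

for pkg in ("punkt", "averaged_perceptron_tagger", "wordnet"):
    nltk.download(pkg, quiet=True)

POS_MAP = {"VB": wn.VERB, "RB": wn.ADV, "JJ": wn.ADJ}   # Penn tag prefix -> WordNet POS

def synonym_substitute(sentence):
    out = []
    for word, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
        wn_pos = POS_MAP.get(tag[:2])
        lemmas = {l.name().replace("_", " ")
                  for s in wn.synsets(word, pos=wn_pos) for l in s.lemmas()} if wn_pos else set()
        lemmas.discard(word)
        out.append(sorted(lemmas)[0] if lemmas else word)  # keep the word if no synonym is found
    return " ".join(out)

print(synonym_substitute("The quick dog happily chased the ball"))
```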
arXiv Detail & Related papers (2024-04-02T02:03:28Z)
- Few-shot clinical entity recognition in English, French and Spanish: masked language models outperform generative model prompting [4.832840259029653]
Large language models (LLMs) have become the preferred solution for many natural language processing tasks.
This study aims to evaluate generative LLMs, employed through prompt engineering, for few-shot clinical NER.
We compare 13 auto-regressive models using prompting and 16 masked models using fine-tuning on 14 NER datasets covering English, French and Spanish.
While prompt-based auto-regressive models achieve competitive F1 for general NER, they are outperformed within the clinical domain by lighter biLSTM-CRF taggers based on masked models.
arXiv Detail & Related papers (2024-02-20T08:20:49Z)
- Automatic Model Selection with Large Language Models for Reasoning [33.93807127935167]
Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods.
We introduce a model selection method to combine the best of both worlds by employing a large language model.
Our proposed method demonstrates significant performance improvements across eight reasoning datasets.
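The selection idea above can be sketched as a three-call loop: produce a Chain-of-Thought solution, produce a program-aided solution, then ask the model which one to trust. The `llm()` helper below is a hypothetical completion function and the prompts are illustrative, not taken from the paper.

```python
# Sketch: LLM-based selection between a Chain-of-Thought (CoT) answer and a
# program-aided (PAL) answer. llm() is a hypothetical completion function.
def llm(prompt: str) -> str:
    raise NotImplementedError("wire this to whichever LLM API you use")

def solve_with_model_selection(question: str) -> str:
    cot = llm(f"Solve step by step, then state the final answer.\nQ: {question}\nA:")
    pal = llm(f"Write Python code that computes the answer.\nQ: {question}\nCode:")
    choice = llm(
        "Two candidate solutions to the same question are shown.\n"
        f"Question: {question}\n(A) Chain-of-thought:\n{cot}\n(B) Program:\n{pal}\n"
        "Reply with the single letter of the solution whose reasoning is correct."
    )
    return cot if choice.strip().upper().startswith("A") else pal
```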
arXiv Detail & Related papers (2023-05-23T17:57:59Z)
- An Open Dataset and Model for Language Identification [84.15194457400253]
We present a LID model which achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages.
We make both the model and the dataset available to the research community.
arXiv Detail & Related papers (2023-05-23T08:43:42Z)
- Improving Code Generation by Training with Natural Language Feedback [69.52985513422381]
We formalize an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF).
ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient.
We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark.
arXiv Detail & Related papers (2023-03-28T16:15:31Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
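Written out as a formula, one plausible reading of that mixture (reconstructed from the snippet above, not quoted from the paper) is a self-attention-weighted combination of dependency-conditioned next-token distributions:

```latex
% Assumed form: \alpha_{t,j} are self-attention weights over previous positions
% and p_{\mathrm{dep}} is the dependency-modeling distribution learned first.
p(x_t \mid x_{<t}) \;=\; \sum_{j<t} \alpha_{t,j}\, p_{\mathrm{dep}}(x_t \mid x_j),
\qquad \alpha_{t,j} \ge 0, \quad \sum_{j<t} \alpha_{t,j} = 1 .
```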
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation [101.00109827301235]
We introduce a novel paradigm for dataset creation based on human and machine collaboration.
We use dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instruct GPT-3 to compose new examples with similar patterns.
The resulting dataset, WANLI, consists of 108,357 natural language inference (NLI) examples that present unique empirical strengths.
arXiv Detail & Related papers (2022-01-16T03:13:49Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- A Simple and Efficient Ensemble Classifier Combining Multiple Neural Network Models on Social Media Datasets in Vietnamese [2.7528170226206443]
This study aims to classify Vietnamese texts on social media from three different Vietnamese benchmark datasets.
Advanced deep learning models are used and optimized in this study, including CNN, LSTM, and their variants.
Our ensemble model achieves the best performance on all three datasets.
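As a rough illustration of the ensemble idea (not the paper's exact architecture), the sketch below soft-votes by averaging the class probabilities of a small CNN and an LSTM classifier; the vocabulary size, layer dimensions, and voting rule are assumptions.

```python
# Sketch: soft-voting ensemble of a CNN and an LSTM text classifier (illustrative sizes).
import torch
import torch.nn as nn

class CnnClassifier(nn.Module):
    def __init__(self, vocab=30000, dim=128, num_labels=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, num_labels)

    def forward(self, ids):
        feats = self.conv(self.emb(ids).transpose(1, 2)).relu().max(dim=2).values
        return self.fc(feats)

class LstmClassifier(nn.Module):
    def __init__(self, vocab=30000, dim=128, num_labels=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128, num_labels)

    def forward(self, ids):
        states, _ = self.lstm(self.emb(ids))
        return self.fc(states.max(dim=1).values)

def ensemble_predict(models, ids):
    # average class probabilities across members, then take the argmax
    probs = torch.stack([m(ids).softmax(dim=-1) for m in models]).mean(dim=0)
    return probs.argmax(dim=-1)

ids = torch.randint(0, 30000, (2, 20))  # a dummy batch of token ids
print(ensemble_predict([CnnClassifier(), LstmClassifier()], ids))
```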
arXiv Detail & Related papers (2020-09-28T04:28:48Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
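Random online backtranslation, mentioned above, can be sketched as a per-step routine; `model.translate` and the data bookkeeping below are hypothetical placeholders, since the snippet only says that unseen pairs are covered by back-translating on the fly during training.

```python
# Sketch: give an unseen (zero-shot) direction some supervision by synthesizing its
# source side from a real target-language sentence with the current model ("online").
import random

def random_online_backtranslation_step(model, monolingual, languages, train_step):
    tgt_lang = random.choice(languages)
    src_lang = random.choice([l for l in languages if l != tgt_lang])
    y = random.choice(monolingual[tgt_lang])                       # real target-language sentence
    x_synthetic = model.translate(y, src=tgt_lang, tgt=src_lang)   # hypothetical API
    train_step(model, x_synthetic, y, src=src_lang, tgt=tgt_lang)  # update on the synthetic pair
```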
arXiv Detail & Related papers (2020-04-24T17:21:32Z)