Related papers: Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

URL: http://arxiv.org/abs/2403.02715v2
Date: Sun, 26 May 2024 17:13:32 GMT
Title: Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models
Authors: Sang T. Truong, Duc Q. Nguyen, Toan Nguyen, Dong D. Le, Nhi N. Truong, Tho Quan, Sanmi Koyejo,
Abstract summary: Open-sourced large language models (LLMs) exhibit limited effectiveness in processing Vietnamese. To mitigate these issues, we have finetuned LLMs specifically for Vietnamese. Our evaluation results reveal that the fine-tuned LLMs exhibit enhanced comprehension and generative capabilities in Vietnamese.
Score: 11.563813473794013
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-sourced LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM evaluation. To mitigate these issues, we have finetuned LLMs specifically for Vietnamese and developed a comprehensive evaluation framework encompassing 10 common tasks and 31 metrics. Our evaluation results reveal that the fine-tuned LLMs exhibit enhanced comprehension and generative capabilities in Vietnamese. Moreover, our analysis indicates that models with more parameters can introduce more biases and uncalibrated outputs and the key factor influencing LLM performance is the quality of the training or fine-tuning datasets. These insights underscore the significance of meticulous fine-tuning with high-quality datasets in enhancing LLM performance.

Related papers

Large Language Models for Mental Health: A Multilingual Evaluation [17.886031066436292]
We evaluate proprietary and open-source Large Language Models (LLMs) on eight mental health datasets in various languages.<n>We compare LLM performance in zero-shot, few-shot, and fine-tuned settings against conventional NLP baselines.<n>We assess translation quality across language families and typologies to understand its influence on LLM performance.
arXiv Detail & Related papers (2026-02-02T18:34:53Z)
Accept or Deny? Evaluating LLM Fairness and Performance in Loan Approval across Table-to-Text Serialization Approaches [57.5863675268117]
Large Language Models (LLMs) are increasingly employed in high-stakes decision-making tasks, such as loan approvals.<n>We assess the performance and fairness of LLMs on serialized loan approval datasets from Ghana, Germany, and the United States.
arXiv Detail & Related papers (2025-08-29T10:51:41Z)
Thunder-LLM: Efficiently Adapting LLMs to Korean with Minimal Resources [5.341994281991984]
This paper presents methods to adapt an existing English-based LLM to Korean in a low-budget scenario.<n>We describe the entire end-to-end process: collecting Korean datasets, preprocessing the data, training the model, creating downstream benchmarks, and conducting evaluations.<n>Our new bilingual models, Thunder-LLM and Thunder-LLM-Ins, achieve superior Korean performance compared to state-of-the-art models while utilizing minimal data and computational resources.
arXiv Detail & Related papers (2025-06-18T17:33:51Z)
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation [33.08089616645845]
The advent of Large Language Models (LLMs) has significantly reshaped the landscape of machine translation (MT) We analyze techniques such as few-shot prompting, cross-lingual transfer, and parameter-efficient fine-tuning that enable effective adaptation to under-resourced settings. We discuss persistent challenges such as hallucinations, evaluation inconsistencies, and inherited biases while also evaluating emerging LLM-driven metrics for translation quality.
arXiv Detail & Related papers (2025-04-02T17:26:40Z)
Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation [1.0173628293062005]
Large Language Models (LLMs) are being applied to a range of complex language tasks. This paper explores the use of LLMs for automatic data generation for the Vietnamese fact-checking task. We develop an automatic data construction process using simple prompt techniques and explore several methods to improve the quality of the generated data.
arXiv Detail & Related papers (2024-11-08T15:35:43Z)
LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation [50.375567142250446]
Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation.<n>We propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot prompt learning LLM "trees" with their outputs aggregated via confidence-based weighted voting.<n>This framework is established on a new concept of bipartite information graphs to identify high-quality relevant neighboring entries with both feature and value granularity.
arXiv Detail & Related papers (2024-10-28T20:42:46Z)
Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning [15.919493497867567]
This study aims to evaluate the performance of Multimodal Large Language Models (MLLMs) on the VALSE benchmark. We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets.
arXiv Detail & Related papers (2024-07-17T11:26:47Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training types, architectures and other factors profoundly impact the performance of LLMs. Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods. This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering technique.
arXiv Detail & Related papers (2024-03-22T14:47:35Z)
Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the intrinsic generalization ability intrinsic to Large Language Models (LLMs) Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks. We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z)
TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement [26.26493253161022]
Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT) We introduce a systematic LLM-based self-refinement translation framework, named textbfTEaR.
arXiv Detail & Related papers (2024-02-26T07:58:12Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety. Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs. We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task [10.597024796304016]
Large-scale language models (LLMs) has shown remarkable capability in various of Natural Language Processing (NLP) tasks. This report explores the how large language models perform on Chinese grammatical error correction tasks.
arXiv Detail & Related papers (2023-07-08T13:10:59Z)
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care [14.326936563564171]
We present a benchmark, CARE-MI, for evaluating misinformation in large language models (LLMs) Our proposed benchmark fills the gap between the extensive usage of LLMs and the lack of datasets for assessing the misinformation generated by these models. Using our benchmark, we conduct extensive experiments and found that current Chinese LLMs are far from perfect in the topic of maternity and infant care.
arXiv Detail & Related papers (2023-07-04T03:34:19Z)
CMMLU: Measuring massive multitask language understanding in Chinese [133.70911295934746]
This paper introduces a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities. CMMLU fills the gap in evaluating the knowledge and reasoning capabilities of large language models within the Chinese context.
arXiv Detail & Related papers (2023-06-15T15:49:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.