Comparing Approaches to Automatic Summarization in Less-Resourced Languages
- URL: http://arxiv.org/abs/2512.24410v1
- Date: Tue, 30 Dec 2025 18:45:00 GMT
- Title: Comparing Approaches to Automatic Summarization in Less-Resourced Languages
- Authors: Chester Palen-Michel, Constantine Lignos,
- Abstract summary: Automatic text summarization has achieved high performance in high-resourced languages like English, but comparatively less attention has been given to summarization in less-resourced languages.<n>This work compares a variety of different approaches to summarization from zero-shot prompting of LLMs large and small to fine-tuning smaller models like mT5 with and without three data augmentation approaches and multilingual transfer.
- Score: 11.1979155324669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic text summarization has achieved high performance in high-resourced languages like English, but comparatively less attention has been given to summarization in less-resourced languages. This work compares a variety of different approaches to summarization from zero-shot prompting of LLMs large and small to fine-tuning smaller models like mT5 with and without three data augmentation approaches and multilingual transfer. We also explore an LLM translation pipeline approach, translating from the source language to English, summarizing and translating back. Evaluating with five different metrics, we find that there is variation across LLMs in their performance across similar parameter sizes, that our multilingual fine-tuned mT5 baseline outperforms most other approaches including zero-shot LLM performance for most metrics, and that LLM as judge may be less reliable on less-resourced languages.
Related papers
- It's All About In-Context Learning! Teaching Extremely Low-Resource Languages to LLMs [16.130133009174124]
Extremely low-resource languages, especially those written in rare scripts, remain largely unsupported by large language models (LLMs)<n>This paper delivers the first comprehensive analysis of whether LLMs can acquire such languages purely via in-context learning (ICL), with or without auxiliary alignment signals, and how these methods compare to parameter-efficient fine-tuning (PEFT)<n>We evaluate 20 under-represented languages across three state-of-the-art multilingual LLMs.
arXiv Detail & Related papers (2025-08-26T14:51:10Z) - Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages.<n>For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively.<n>We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z) - Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language.<n>Currently, instruction-tuned large language models (LLMs) excel at various English tasks.<n>Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - What do Large Language Models Need for Machine Translation Evaluation? [12.42394213466485]
Large language models (LLMs) can achieve results comparable to fine-tuned multilingual pre-trained language models.
This paper explores what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate machine translation quality.
arXiv Detail & Related papers (2024-10-04T09:50:45Z) - Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for
Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba)
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT)
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.