Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
- URL: http://arxiv.org/abs/2404.04925v1
- Date: Sun, 7 Apr 2024 11:52:44 GMT
- Title: Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
- Authors: Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Yinghui Li, Lizi Liao, Min Li, Wanxiang Che, Philip S. Yu
- Abstract summary: We present a review and provide a unified perspective to summarize the recent progress as well as emerging trends in multilingual large language models (MLLMs) literature.
We hope our work can provide the community with quick access and spur breakthrough research in MLLMs.
- Score: 81.47046536073682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual Large Language Models (MLLMs) leverage powerful Large Language Models to handle and respond to queries in multiple languages, achieving remarkable success in multilingual natural language processing tasks. Despite these breakthroughs, there remains a lack of a comprehensive survey summarizing existing approaches and recent developments in this field. To this end, in this paper, we present a thorough review and provide a unified perspective to summarize the recent progress as well as emerging trends in the multilingual large language model (MLLM) literature. The contributions of this paper can be summarized as follows: (1) First survey: to our knowledge, we take the first step and present a thorough review of the MLLM research field organized by multilingual alignment; (2) New taxonomy: we offer a new and unified perspective to summarize the current progress of MLLMs; (3) New frontiers: we highlight several emerging frontiers and discuss the corresponding challenges; (4) Abundant resources: we collect abundant open-source resources, including relevant papers, data corpora, and leaderboards. We hope our work can provide the community with quick access and spur breakthrough research in MLLMs.
Related papers
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of a source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization [33.37163476772722]
We aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel task, i.e., MCMS, which encapsulates these real-world requirements all in one.
We meticulously constructed the GLOBESUMM dataset by first collecting a wealth of multilingual news reports and restructuring them into an event-centric format.
arXiv Detail & Related papers (2024-10-05T08:56:44Z) - A Survey of Large Language Models for European Languages [4.328283741894074]
Large Language Models (LLMs) have gained significant attention due to their high performance on a wide range of natural language tasks.
We present an overview of LLM families, including LLaMA, PaLM, GPT, and MoE.
We provide a comprehensive summary of common monolingual and multilingual datasets used for pretraining large language models.
arXiv Detail & Related papers (2024-08-27T13:10:05Z) - A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [48.314619377988436]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.
Despite the breakthroughs of LLMs, the investigation of multilingual scenarios remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z) - A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias [5.104497013562654]
We present an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities.
We explore widely utilized multilingual corpora for MLLMs' training and multilingual datasets oriented for downstream tasks.
We discuss bias in MLLMs, including its categories and evaluation metrics, and summarize existing debiasing techniques.
arXiv Detail & Related papers (2024-04-01T05:13:56Z) - Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques.
We empirically analyze the most advanced methods and identify emerging trends in IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z) - A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks.
This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z) - A Survey on Multimodal Large Language Models [71.63375558033364]
Multimodal Large Language Models (MLLMs), represented by GPT-4V, have become a new and rising research hotspot.
This paper aims to trace and summarize the recent progress of MLLMs.
arXiv Detail & Related papers (2023-06-23T15:21:52Z) - Multilingual Multimodality: A Taxonomical Survey of Datasets, Techniques, Challenges and Opportunities [10.721189858694396]
We study the unification of multilingual and multimodal (MultiX) streams.
We review the languages studied and the gold- or silver-standard data with parallel annotations, and examine how these modalities and languages interact in modeling.
We present an account of the modeling approaches, along with their strengths and weaknesses, to better understand the scenarios in which they can be used reliably.
arXiv Detail & Related papers (2022-10-30T21:46:01Z)