A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions
- URL: http://arxiv.org/abs/2506.05061v1
- Date: Thu, 05 Jun 2025 14:03:18 GMT
- Title: A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions
- Authors: Anh Le, Thanh Lam, Dung Nguyen
- Abstract summary: Vietnamese document analysis and recognition (DAR) is a crucial field with applications in digitization, information retrieval, and automation. Despite advancements in OCR and NLP, Vietnamese text recognition faces unique challenges due to its complex diacritics, tonal variations, and lack of large-scale annotated datasets. Recently, large language models (LLMs) and vision-language models have demonstrated remarkable improvements in text recognition and document understanding.
- Score: 3.7994176460443208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vietnamese document analysis and recognition (DAR) is a crucial field with applications in digitization, information retrieval, and automation. Despite advancements in OCR and NLP, Vietnamese text recognition faces unique challenges due to its complex diacritics, tonal variations, and lack of large-scale annotated datasets. Traditional OCR methods often struggle with real-world document variations, while deep learning approaches have shown promise but remain limited by data scarcity and generalization issues. Recently, large language models (LLMs) and vision-language models have demonstrated remarkable improvements in text recognition and document understanding, offering a new direction for Vietnamese DAR. However, challenges such as domain adaptation, multimodal learning, and computational efficiency persist. This survey provides a comprehensive review of existing techniques in Vietnamese document recognition, highlights key limitations, and explores how LLMs can revolutionize the field. We discuss future research directions, including dataset development, model optimization, and the integration of multimodal approaches for improved document intelligence. By addressing these gaps, we aim to foster advancements in Vietnamese DAR and encourage community-driven solutions.
Related papers
- Editing Across Languages: A Survey of Multilingual Knowledge Editing [16.700978644147572]
This survey systematizes recent research on Multilingual Knowledge Editing (MKE). MKE is a growing subdomain of model editing focused on ensuring factual edits generalize reliably across languages. We present a comprehensive taxonomy of MKE methods, covering parameter-based, memory-based, fine-tuning, and hypernetwork approaches.
arXiv Detail & Related papers (2025-05-20T14:13:04Z)
- Advancing Vietnamese Information Retrieval with Learning Objective and Benchmark [0.24999074238880487]
This work aims to provide the Vietnamese research community with a new benchmark for information retrieval. We also present a new objective function based on the InfoNCE loss function, which is used to train our Vietnamese embedding model.
arXiv Detail & Related papers (2025-03-10T15:47:01Z)
- OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation [59.53678957969471]
Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding and generation tasks. Generating interleaved image-text content remains a challenge. OpenING is a benchmark comprising 5,400 high-quality human-annotated instances across 56 real-world tasks. IntJudge is a judge model for evaluating open-ended multimodal generation methods.
arXiv Detail & Related papers (2024-11-27T16:39:04Z)
- Vietnamese Legal Information Retrieval in Question-Answering System [0.0]
Retrieval Augmented Generation (RAG) has gained significant recognition for enhancing the capabilities of large language models (LLMs).
However, RAG often falls short when applied to the Vietnamese language due to several challenges.
This report introduces our three main modifications taken to address these challenges.
arXiv Detail & Related papers (2024-09-05T02:34:05Z)
- A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing. Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient. This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs).
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z)
- Improving Vietnamese Legal Question-Answering System based on Automatic Data Enrichment [2.56085064991751]
In this paper, we try to overcome these limitations by implementing a Vietnamese article-level retrieval-based legal QA system.
Our hypothesis is that in contexts where labeled data are limited, efficient data enrichment can help increase overall performance.
arXiv Detail & Related papers (2023-06-08T00:24:29Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension [2.7528170226206443]
We present a comprehensive analysis of language weaknesses and strengths of current Vietnamese monolingual models.
We also successfully reveal the existence of artifacts in Vietnamese Machine Reading benchmarks.
Our proposed modification helps improve the quality of unanswerable questions.
arXiv Detail & Related papers (2023-03-16T20:32:58Z)
- Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods [48.47413103662829]
Natural Language Generation (NLG) has made great progress in recent years due to the development of deep learning techniques such as pre-trained language models.
However, the faithfulness problem that the generated text usually contains unfaithful or non-factual information has become the biggest challenge.
arXiv Detail & Related papers (2022-03-10T08:28:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.