Leveraging LLM For Synchronizing Information Across Multilingual Tables
- URL: http://arxiv.org/abs/2504.02559v2
- Date: Fri, 04 Apr 2025 19:18:32 GMT
- Title: Leveraging LLM For Synchronizing Information Across Multilingual Tables
- Authors: Siddharth Khincha, Tushar Kataria, Ankita Anand, Dan Roth, Vivek Gupta
- Abstract summary: This paper explores large language models (LLMs) for multilingual information synchronization. We introduce the Information Updation dataset, simulating the real-world process of updating outdated Wikipedia tables. Our findings reveal that single-prompt approaches often produce suboptimal results, prompting us to introduce a task decomposition strategy.
- Score: 45.821452282988794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vast amount of online information today poses challenges for non-English speakers, as much of it is concentrated in high-resource languages such as English and French. Wikipedia reflects this imbalance, with content in low-resource languages frequently outdated or incomplete. Recent research has sought to improve cross-language synchronization of Wikipedia tables using rule-based methods. These approaches can be effective, but they struggle with complexity and generalization. This paper explores large language models (LLMs) for multilingual information synchronization, using zero-shot prompting as a scalable solution. We introduce the Information Updation dataset, simulating the real-world process of updating outdated Wikipedia tables, and evaluate LLM performance. Our findings reveal that single-prompt approaches often produce suboptimal results, prompting us to introduce a task decomposition strategy that enhances coherence and accuracy. Our proposed method outperforms existing baselines, particularly in Information Updation (1.79%) and Information Addition (20.58%), highlighting the model's strength in dynamically updating and enriching data across architectures.
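To make the task decomposition concrete, here is a minimal sketch of a decomposed zero-shot pipeline. The step ordering follows the abstract's Information Updation / Information Addition framing, but the prompt wording, helper names, and the model choice are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of a task-decomposition pipeline for multilingual table
# synchronization. Prompts, function names, and the model choice are
# illustrative assumptions, not the paper's exact implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def complete(prompt: str) -> str:
    """One zero-shot LLM call; the model name is an assumption."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def align_rows(src_table: str, tgt_table: str) -> str:
    """Step 1: match rows of the up-to-date source-language table
    to rows of the outdated target-language table."""
    return complete(
        "Align each row of TABLE A (source language) with the matching row of "
        "TABLE B (target language), or mark it unmatched.\n"
        f"TABLE A:\n{src_table}\n\nTABLE B:\n{tgt_table}"
    )

def update_rows(alignment: str) -> str:
    """Step 2 (Information Updation): overwrite stale target-language
    cells with translated values from the aligned source rows."""
    return complete(
        "For each aligned row pair, update outdated values in the target row "
        f"using the source row, translating where needed.\n\n{alignment}"
    )

def add_rows(alignment: str) -> str:
    """Step 3 (Information Addition): translate unmatched source rows
    and append them to the target table as new rows."""
    return complete(
        "Translate every unmatched source row into the target language and "
        f"append it to the updated target table.\n\n{alignment}"
    )
```

Each step sees only the output of the previous one, which is what keeps the individual prompts small and coherent compared with a single monolithic prompt over the whole table pair.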
Related papers
- Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning [8.408016670697068]
Zero-Shot Classification (ZSC) has become essential for enabling models to classify text into categories unseen during training. We introduce RoSPrompt, a lightweight and data-efficient approach for training soft prompts that enhance cross-lingual ZSC. We evaluate our approach on multiple multilingual PLMs covering 106 languages, demonstrating strong cross-lingual transfer performance and robust generalization capabilities.
arXiv Detail & Related papers (2025-03-25T09:00:25Z)
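Soft prompt tuning of the kind RoSPrompt builds on can be sketched in a few lines: a small matrix of trainable embeddings is prepended to the input while the multilingual backbone stays frozen. The backbone choice, prompt length, and initialization below are assumptions for illustration.

```python
# Minimal sketch of soft prompt tuning: trainable prompt embeddings are
# prepended to the token embeddings of a frozen multilingual PLM.
# Backbone, prompt length, and init scale are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"  # assumed multilingual backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
backbone = AutoModel.from_pretrained(model_name)
for p in backbone.parameters():
    p.requires_grad = False  # only the soft prompt is trained

n_prompt, hidden = 20, backbone.config.hidden_size
soft_prompt = nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)

def encode(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    tok_emb = backbone.embeddings.word_embeddings(batch["input_ids"])
    # Prepend the shared soft prompt to every sequence in the batch.
    prompt = soft_prompt.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
    mask = torch.cat(
        [torch.ones(tok_emb.size(0), n_prompt, dtype=torch.long),
         batch["attention_mask"]],
        dim=1,
    )
    out = backbone(inputs_embeds=torch.cat([prompt, tok_emb], dim=1),
                   attention_mask=mask)
    return out.last_hidden_state

# Only the soft prompt is passed to the optimizer, e.g.:
# optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)
```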
- Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs [60.12222055772508]
We present Cross-Lingual Knowledge Democracy Edit (X-KDE), a simple and practical state-of-the-art (SOTA) recipe designed to propagate knowledge from a dominant language to other languages effectively. Experiments on the Bi-ZsRE and MzsRE benchmarks show that X-KDE significantly enhances cross-lingual performance.
arXiv Detail & Related papers (2025-02-20T15:32:31Z)
- Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn the syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of the source text in a different target language. Currently, instruction-tuned large language models (LLMs) excel at various English tasks. Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
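A generate-then-check loop in the spirit of the meta-generation entry above might look like the sketch below. The prompts, the round count, and the injected `complete` callable are assumptions for illustration.

```python
# Minimal sketch of a generate-then-check loop for cross-lingual
# summarization: draft a summary, critique it against the source, revise.
# Prompts and the injected `complete` callable are illustrative assumptions.
from typing import Callable

def summarize_with_check(source_text: str, target_lang: str,
                         complete: Callable[[str], str],
                         rounds: int = 2) -> str:
    # Initial draft in the target language.
    summary = complete(f"Summarize the following text in {target_lang}:\n{source_text}")
    for _ in range(rounds):
        # Ask the model to check the draft against the source.
        critique = complete(
            f"Check this {target_lang} summary against the source for missing "
            f"or incorrect facts.\nSource:\n{source_text}\nSummary:\n{summary}"
        )
        # Revise the draft using the critique.
        summary = complete(
            f"Revise the summary to address this critique, staying in "
            f"{target_lang}.\nCritique:\n{critique}\nSummary:\n{summary}"
        )
    return summary
```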
- ConVerSum: A Contrastive Learning-based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents [4.029675201787349]
Cross-lingual summarization (CLS) is a sophisticated branch of Natural Language Processing. No feasible solution for CLS exists when no high-quality CLS data is available. We propose ConVerSum, a novel data-efficient approach for CLS that leverages the power of contrastive learning.
arXiv Detail & Related papers (2024-08-17T19:03:53Z)
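The contrastive objective behind approaches like ConVerSum can be illustrated with a generic InfoNCE-style loss; the embedding dimensions, temperature, and pairing scheme below are assumptions rather than the paper's exact formulation.

```python
# Generic InfoNCE-style contrastive loss: pull a source document toward its
# correct cross-lingual summary and away from negative candidate summaries.
# Dimensions, temperature, and pairing scheme are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(doc_emb: torch.Tensor,   # (d,) source document
                     pos_emb: torch.Tensor,   # (d,) correct summary
                     neg_embs: torch.Tensor,  # (k, d) negative summaries
                     tau: float = 0.07) -> torch.Tensor:
    doc = F.normalize(doc_emb, dim=-1)
    cands = F.normalize(torch.cat([pos_emb.unsqueeze(0), neg_embs]), dim=-1)
    logits = cands @ doc / tau                 # (1 + k,) similarities / tau
    target = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)

# Example with random embeddings:
loss = contrastive_loss(torch.randn(768), torch.randn(768), torch.randn(4, 768))
```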
- Bridging the Language Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs [15.911445732909849]
Large language models (LLMs) have revolutionized various domains but still struggle with non-Latin scripts and low-resource languages. We introduce a novel dynamic learning approach that optimizes the prompt strategy, embedding model, and LLM per query at runtime. We show that our approach yields 10-15% improvements in multilingual performance over pre-trained models and 4x gains over fine-tuned, language-specific models.
arXiv Detail & Related papers (2023-05-28T14:48:38Z)
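Per-query dynamic selection as described in the entry above can be sketched as a lightweight router; the features, strategy names, and model identifiers below are hypothetical.

```python
# Minimal sketch of per-query dynamic selection: route each query to a
# prompt strategy and model at runtime based on cheap features.
# Strategy names, model identifiers, and the routing rule are hypothetical.
from dataclasses import dataclass

@dataclass
class Config:
    prompt_strategy: str  # e.g. answer directly vs. translate first
    llm: str              # identifier of the model to call

def is_latin_script(text: str) -> bool:
    """Crude script check: all alphabetic characters below U+0250."""
    return all(ord(ch) < 0x250 for ch in text if ch.isalpha())

def route(query: str) -> Config:
    """Pick a configuration per query at runtime."""
    if is_latin_script(query):
        return Config(prompt_strategy="direct", llm="model-a")
    # Non-Latin scripts often benefit from translating into a pivot
    # language before answering.
    return Config(prompt_strategy="translate-then-answer", llm="model-b")

print(route("Quelle est la capitale de la France ?"))  # direct
print(route("Какая столица Франции?"))                 # translate first
```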
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently arises when working with a non-English language is the scarcity of annotated training data. We design a simple but effective ensemble-based framework that combines various transfer learning (TL) techniques. We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual summarization aims at producing a summary in the target language for an article in the source language. We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks, such as translation, and monolingual tasks, such as masked language modeling. Our model achieves improvements of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 points over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
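The mixed-lingual pre-training idea, combining a translation objective with a monolingual denoising objective on one shared model, can be sketched as below; the backbone, mixing weight, and batch fields are assumptions.

```python
# Minimal sketch of mixed-lingual pre-training: one shared seq2seq model is
# trained on a weighted mix of a cross-lingual translation objective and a
# monolingual denoising objective. Backbone and weighting are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")  # assumed seq2seq backbone
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def mixed_loss(batch: dict, lam: float = 0.5):
    """batch fields (hypothetical): src/tgt translation pairs plus
    masked/orig monolingual denoising pairs."""
    def seq2seq_loss(inputs, targets):
        enc = tok(inputs, return_tensors="pt", padding=True)
        labels = tok(targets, return_tensors="pt", padding=True).input_ids
        # (In practice, padded label positions would be masked to -100.)
        return model(**enc, labels=labels).loss

    # Cross-lingual task: translate source-language text.
    l_trans = seq2seq_loss(batch["src"], batch["tgt"])
    # Monolingual task: reconstruct original text from a masked version.
    l_mono = seq2seq_loss(batch["masked"], batch["orig"])
    return lam * l_trans + (1 - lam) * l_mono
```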