Leveraging LLM For Synchronizing Information Across Multilingual Tables
- URL: http://arxiv.org/abs/2504.02559v2
- Date: Fri, 04 Apr 2025 19:18:32 GMT
- Title: Leveraging LLM For Synchronizing Information Across Multilingual Tables
- Authors: Siddharth Khincha, Tushar Kataria, Ankita Anand, Dan Roth, Vivek Gupta
- Abstract summary: This paper explores large language models (LLMs) for multilingual information synchronization. We introduce the Information Updation dataset, simulating the real-world process of updating outdated Wikipedia tables. Our findings reveal that single-prompt approaches often produce suboptimal results, prompting us to introduce a task decomposition strategy.
- Score: 45.821452282988794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vast amount of online information today poses challenges for non-English speakers, as much of it is concentrated in high-resource languages such as English and French. Wikipedia reflects this imbalance, with content in low-resource languages frequently outdated or incomplete. Recent research has sought to improve cross-language synchronization of Wikipedia tables using rule-based methods. These approaches can be effective, but they struggle with complexity and generalization. This paper explores large language models (LLMs) for multilingual information synchronization, using zero-shot prompting as a scalable solution. We introduce the Information Updation dataset, simulating the real-world process of updating outdated Wikipedia tables, and evaluate LLM performance. Our findings reveal that single-prompt approaches often produce suboptimal results, prompting us to introduce a task decomposition strategy that enhances coherence and accuracy. Our proposed method outperforms existing baselines, particularly in Information Updation (1.79%) and Information Addition (20.58%), highlighting the model's strength in dynamically updating and enriching data across architectures.
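To make the task decomposition concrete, here is a minimal sketch of a decomposed zero-shot pipeline. The step ordering follows the abstract's Information Updation / Information Addition framing, but the prompt wording, helper names, and the model choice are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of a task-decomposition pipeline for multilingual table
# synchronization. Prompts, function names, and the model choice are
# illustrative assumptions, not the paper's exact implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def complete(prompt: str) -> str:
    """One zero-shot LLM call; the model name is an assumption."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def align_rows(src_table: str, tgt_table: str) -> str:
    """Step 1: match rows of the up-to-date source-language table
    to rows of the outdated target-language table."""
    return complete(
        "Align each row of TABLE A (source language) with the matching row of "
        "TABLE B (target language), or mark it unmatched.\n"
        f"TABLE A:\n{src_table}\n\nTABLE B:\n{tgt_table}"
    )

def update_rows(alignment: str) -> str:
    """Step 2 (Information Updation): overwrite stale target-language
    cells with translated values from the aligned source rows."""
    return complete(
        "For each aligned row pair, update outdated values in the target row "
        f"using the source row, translating where needed.\n\n{alignment}"
    )

def add_rows(alignment: str) -> str:
    """Step 3 (Information Addition): translate unmatched source rows
    and append them to the target table as new rows."""
    return complete(
        "Translate every unmatched source row into the target language and "
        f"append it to the updated target table.\n\n{alignment}"
    )
```

Each step sees only the output of the previous one, which is what keeps the individual prompts small and coherent compared with a single monolithic prompt over the whole table pair.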
Related papers
- Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning [8.408016670697068]
Zero-Shot Classification (ZSC) has become essential for enabling models to classify text into categories unseen during training. We introduce RoSPrompt, a lightweight and data-efficient approach for training soft prompts that enhance cross-lingual ZSC. We evaluate our approach on multiple multilingual PLMs covering 106 languages, demonstrating strong cross-lingual transfer performance and robust generalization capabilities.
arXiv Detail & Related papers (2025-03-25T09:00:25Z)
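Soft prompt tuning of the kind RoSPrompt builds on can be sketched in a few lines: a small matrix of trainable embeddings is prepended to the input while the multilingual backbone stays frozen. The backbone choice, prompt length, and initialization below are assumptions for illustration.

```python
# Minimal sketch of soft prompt tuning: trainable prompt embeddings are
# prepended to the token embeddings of a frozen multilingual PLM.
# Backbone, prompt length, and init scale are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"  # assumed multilingual backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
backbone = AutoModel.from_pretrained(model_name)
for p in backbone.parameters():
    p.requires_grad = False  # only the soft prompt is trained

n_prompt, hidden = 20, backbone.config.hidden_size
soft_prompt = nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)

def encode(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    tok_emb = backbone.embeddings.word_embeddings(batch["input_ids"])
    # Prepend the shared soft prompt to every sequence in the batch.
    prompt = soft_prompt.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
    mask = torch.cat(
        [torch.ones(tok_emb.size(0), n_prompt, dtype=torch.long),
         batch["attention_mask"]],
        dim=1,
    )
    out = backbone(inputs_embeds=torch.cat([prompt, tok_emb], dim=1),
                   attention_mask=mask)
    return out.last_hidden_state

# Only the soft prompt is passed to the optimizer, e.g.:
# optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)
```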
- Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs [60.12222055772508]
We present Cross-Lingual Knowledge Democracy Edit (X-KDE), a simple and practical state-of-the-art (SOTA) recipe designed to propagate knowledge from a dominant language to other languages effectively. Experiments on the Bi-ZsRE and MzsRE benchmarks show that X-KDE significantly enhances cross-lingual performance.
arXiv Detail & Related papers (2025-02-20T15:32:31Z)
- Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn the syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of the source text in a different target language. Currently, instruction-tuned large language models (LLMs) excel at various English tasks. Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
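A generate-then-check loop in the spirit of the meta-generation entry above might look like the sketch below. The prompts, the round count, and the injected `complete` callable are assumptions for illustration.

```python
# Minimal sketch of a generate-then-check loop for cross-lingual
# summarization: draft a summary, critique it against the source, revise.
# Prompts and the injected `complete` callable are illustrative assumptions.
from typing import Callable

def summarize_with_check(source_text: str, target_lang: str,
                         complete: Callable[[str], str],
                         rounds: int = 2) -> str:
    # Initial draft in the target language.
    summary = complete(f"Summarize the following text in {target_lang}:\n{source_text}")
    for _ in range(rounds):
        # Ask the model to check the draft against the source.
        critique = complete(
            f"Check this {target_lang} summary against the source for missing "
            f"or incorrect facts.\nSource:\n{source_text}\nSummary:\n{summary}"
        )
        # Revise the draft using the critique.
        summary = complete(
            f"Revise the summary to address this critique, staying in "
            f"{target_lang}.\nCritique:\n{critique}\nSummary:\n{summary}"
        )
    return summary
```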
- ConVerSum: A Contrastive Learning-based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents [4.029675201787349]
Cross-lingual summarization (CLS) is a sophisticated branch of Natural Language Processing. No feasible solution for CLS exists when no high-quality CLS data is available. We propose ConVerSum, a novel data-efficient approach for CLS that leverages the power of contrastive learning.
arXiv Detail & Related papers (2024-08-17T19:03:53Z)
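The contrastive objective behind approaches like ConVerSum can be illustrated with a generic InfoNCE-style loss; the embedding dimensions, temperature, and pairing scheme below are assumptions rather than the paper's exact formulation.

```python
# Generic InfoNCE-style contrastive loss: pull a source document toward its
# correct cross-lingual summary and away from negative candidate summaries.
# Dimensions, temperature, and pairing scheme are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(doc_emb: torch.Tensor,   # (d,) source document
                     pos_emb: torch.Tensor,   # (d,) correct summary
                     neg_embs: torch.Tensor,  # (k, d) negative summaries
                     tau: float = 0.07) -> torch.Tensor:
    doc = F.normalize(doc_emb, dim=-1)
    cands = F.normalize(torch.cat([pos_emb.unsqueeze(0), neg_embs]), dim=-1)
    logits = cands @ doc / tau                 # (1 + k,) similarities / tau
    target = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)

# Example with random embeddings:
loss = contrastive_loss(torch.randn(768), torch.randn(768), torch.randn(4, 768))
```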
- Bridging the Language Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs [15.911445732909849]
Large language models (LLMs) have revolutionized various domains but still struggle with non-Latin scripts and low-resource languages. We introduce a novel dynamic learning approach that optimizes the prompt strategy, embedding model, and LLM per query at runtime. We show that our approach yields 10-15% improvements in multilingual performance over pre-trained models and 4x gains over fine-tuned, language-specific models.
arXiv Detail & Related papers (2023-05-28T14:48:38Z)
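Per-query dynamic selection as described in the entry above can be sketched as a lightweight router; the features, strategy names, and model identifiers below are hypothetical.

```python
# Minimal sketch of per-query dynamic selection: route each query to a
# prompt strategy and model at runtime based on cheap features.
# Strategy names, model identifiers, and the routing rule are hypothetical.
from dataclasses import dataclass

@dataclass
class Config:
    prompt_strategy: str  # e.g. answer directly vs. translate first
    llm: str              # identifier of the model to call

def is_latin_script(text: str) -> bool:
    """Crude script check: all alphabetic characters below U+0250."""
    return all(ord(ch) < 0x250 for ch in text if ch.isalpha())

def route(query: str) -> Config:
    """Pick a configuration per query at runtime."""
    if is_latin_script(query):
        return Config(prompt_strategy="direct", llm="model-a")
    # Non-Latin scripts often benefit from translating into a pivot
    # language before answering.
    return Config(prompt_strategy="translate-then-answer", llm="model-b")

print(route("Quelle est la capitale de la France ?"))  # direct
print(route("Какая столица Франции?"))                 # translate first
```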
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently arises when working with a non-English language is the scarcity of annotated training data. We design a simple but effective ensemble-based framework that combines various transfer learning (TL) techniques. We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual summarization aims at producing a summary in the target language for an article in the source language. We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks, such as translation, and monolingual tasks, such as masked language modeling. Our model achieves improvements of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 points over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
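The mixed-lingual pre-training idea, combining a translation objective with a monolingual denoising objective on one shared model, can be sketched as below; the backbone, mixing weight, and batch fields are assumptions.

```python
# Minimal sketch of mixed-lingual pre-training: one shared seq2seq model is
# trained on a weighted mix of a cross-lingual translation objective and a
# monolingual denoising objective. Backbone and weighting are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")  # assumed seq2seq backbone
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def mixed_loss(batch: dict, lam: float = 0.5):
    """batch fields (hypothetical): src/tgt translation pairs plus
    masked/orig monolingual denoising pairs."""
    def seq2seq_loss(inputs, targets):
        enc = tok(inputs, return_tensors="pt", padding=True)
        labels = tok(targets, return_tensors="pt", padding=True).input_ids
        # (In practice, padded label positions would be masked to -100.)
        return model(**enc, labels=labels).loss

    # Cross-lingual task: translate source-language text.
    l_trans = seq2seq_loss(batch["src"], batch["tgt"])
    # Monolingual task: reconstruct original text from a masked version.
    l_mono = seq2seq_loss(batch["masked"], batch["orig"])
    return lam * l_trans + (1 - lam) * l_mono
```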