Cross-Document Cross-Lingual Natural Language Inference via RST-enhanced Graph Fusion and Interpretability Prediction
- URL: http://arxiv.org/abs/2504.12324v1
- Date: Fri, 11 Apr 2025 13:18:26 GMT
- Title: Cross-Document Cross-Lingual Natural Language Inference via RST-enhanced Graph Fusion and Interpretability Prediction
- Authors: Mengying Yuan, Wangzi Xuan, Fei Li,
- Abstract summary: Natural Language Inference (NLI) is a fundamental task in both natural language processing and information retrieval.<n>We propose a novel paradigm for CDCL-NLI that extends traditional NLI capabilities to multi-document, multilingual scenarios.<n>Our work sheds light on the study of NLI and will bring research interest on cross-document cross-lingual context understanding.
- Score: 7.888128236684232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Inference (NLI) is a fundamental task in both natural language processing and information retrieval. While NLI has developed many sub-directions such as sentence-level NLI, document-level NLI and cross-lingual NLI, Cross-Document Cross-Lingual NLI (CDCL-NLI) remains largely unexplored. In this paper, we propose a novel paradigm for CDCL-NLI that extends traditional NLI capabilities to multi-document, multilingual scenarios. To support this task, we construct a high-quality CDCL-NLI dataset including 1,110 instances and spanning 26 languages. To build a baseline for this task, we also propose an innovative method that integrates RST-enhanced graph fusion and interpretability prediction. Our method employs RST (Rhetorical Structure Theory) on RGAT (Relation-aware Graph Attention Network) for cross-document context modeling, coupled with a structure-aware semantic alignment mechanism based on lexical chains for cross-lingual understanding. For NLI interpretability, we develop an EDU-level attribution framework that generates extractive explanations. Extensive experiments demonstrate our approach's superior performance, achieving significant improvements over both traditional NLI models such as DocNLI and R2F, as well as LLMs like Llama3 and GPT-4o. Our work sheds light on the study of NLI and will bring research interest on cross-document cross-lingual context understanding, semantic retrieval and interpretability inference. Our dataset and code are available at \href{https://anonymous.4open.science/r/CDCL-NLI-637E/}{CDCL-NLI-Link for peer review}.
Related papers
- Pushing the boundary on Natural Language Inference [49.15148871877941]
Natural Language Inference (NLI) is a central task in natural language understanding with applications in fact-checking, question answering and information retrieval.
Despite its importance, current NLI systems heavily rely on learning with limiting artifacts and biases, inference and real-world applicability.
This work provides a framework for building robust NLI systems without sacrificing quality or real-world applicability.
arXiv Detail & Related papers (2025-04-25T14:20:57Z) - A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z) - XNLIeu: a dataset for cross-lingual NLI in Basque [14.788692648660797]
In this paper, we expand XNLI to include Basque, a low-resource language that can greatly benefit from transfer-learning approaches.
The new dataset, dubbed XNLIeu, has been developed by first machine-translating the English XNLI corpus into Basque, followed by a manual post-edition step.
arXiv Detail & Related papers (2024-04-10T13:19:56Z) - Embracing Language Inclusivity and Diversity in CLIP through Continual
Language Learning [58.92843729869586]
Vision-language pre-trained models (VL-PTMs) have advanced multimodal research in recent years, but their mastery in a few languages like English restricts their applicability in broader communities.
We propose to extend VL-PTMs' language capacity by continual language learning (CLL), where a model needs to update its linguistic knowledge incrementally without suffering from catastrophic forgetting (CF)
We construct a CLL benchmark covering 36 languages based on MSCOCO and XM3600 datasets and then evaluate multilingual image-text retrieval performance.
arXiv Detail & Related papers (2024-01-30T17:14:05Z) - On Bilingual Lexicon Induction with Large Language Models [81.6546357879259]
We examine the potential of the latest generation of Large Language Models for the development of bilingual lexicons.
We study 1) zero-shot prompting for unsupervised BLI and 2) few-shot in-context prompting with a set of seed translation pairs.
Our work is the first to demonstrate strong BLI capabilities of text-to-text mLLMs.
arXiv Detail & Related papers (2023-10-21T12:43:27Z) - Improving Bilingual Lexicon Induction with Cross-Encoder Reranking [31.142790337451366]
We propose a novel semi-supervised post-hoc reranking method termed BLICEr (BLI with Cross-Encoder Reranking)
The key idea is to 'extract' cross-lingual lexical knowledge from mPLMs, and then combine it with the original CLWEs.
BLICEr establishes new results on two standard BLI benchmarks spanning a wide spectrum of diverse languages.
arXiv Detail & Related papers (2022-10-30T21:26:07Z) - Stretching Sentence-pair NLI Models to Reason over Long Documents and
Clusters [35.103851212995046]
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs.
We explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on.
We develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset.
arXiv Detail & Related papers (2022-04-15T12:56:39Z) - DocNLI: A Large-scale Dataset for Document-level Natural Language
Inference [55.868482696821815]
Natural language inference (NLI) is formulated as a unified framework for solving various NLP problems.
This work presents DocNLI -- a newly-constructed large-scale dataset for document-level NLI.
arXiv Detail & Related papers (2021-06-17T13:02:26Z) - N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
textttN- is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
textttN- adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks.
arXiv Detail & Related papers (2020-09-24T11:45:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.