Beyond Traditional Algorithms: Leveraging LLMs for Accurate Cross-Border Entity Identification
- URL: http://arxiv.org/abs/2507.11086v1
- Date: Tue, 15 Jul 2025 08:28:24 GMT
- Title: Beyond Traditional Algorithms: Leveraging LLMs for Accurate Cross-Border Entity Identification
- Authors: Andres Azqueta-Gavaldón, Joaquin Ramos Cosgrove,
- Abstract summary: Cross-border financial activities have underscored the necessity of accurately identifying and classifying foreign entities.<n>This paper explores Large Language Models (LLMs) as a flexible alternative to traditional algorithms.<n>We evaluate traditional methods, Hugging Face-based LLMs, and interface-based LLMs (e.g., Microsoft Copilot, Alibaba's Qwen 2.5) using a dataset of 65 Portuguese company cases.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing prevalence of cross-border financial activities in global markets has underscored the necessity of accurately identifying and classifying foreign entities. This practice is essential within the Spanish financial system for ensuring robust risk management, regulatory adherence, and the prevention of financial misconduct. This process involves a labor-intensive entity-matching task, where entities need to be validated against available reference sources. Challenges arise from linguistic variations, special characters, outdated names, and changes in legal forms, complicating traditional matching algorithms like Jaccard, cosine, and Levenshtein distances. These methods struggle with contextual nuances and semantic relationships, leading to mismatches. To address these limitations, we explore Large Language Models (LLMs) as a flexible alternative. LLMs leverage extensive training to interpret context, handle abbreviations, and adapt to legal transitions. We evaluate traditional methods, Hugging Face-based LLMs, and interface-based LLMs (e.g., Microsoft Copilot, Alibaba's Qwen 2.5) using a dataset of 65 Portuguese company cases. Results show traditional methods achieve accuracies over 92% but suffer high false positive rates (20-40%). Interface-based LLMs outperform, achieving accuracies above 93%, F1 scores exceeding 96%, and lower false positives (40-80%).
Related papers
- Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission [87.68447072141402]
Hybrid Language Models (HLMs) combine the low-latency efficiency of Small Language Models (SLMs) on edge devices with the high accuracy of Large Language Models (LLMs) on centralized servers.<n>We propose FedHLM, a communication-efficient HLM framework that integrates uncertainty-aware inference with Federated Learning (FL)
arXiv Detail & Related papers (2025-06-30T02:56:11Z) - MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs [35.6424858476337]
We introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition.<n>We show that MetaFaith robustly improves faithful calibration across diverse models and task domains, enabling up to 61% improvement in faithfulness.
arXiv Detail & Related papers (2025-05-30T17:54:08Z) - Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs)<n>We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models.<n>Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
arXiv Detail & Related papers (2025-05-24T12:31:27Z) - Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission [65.17811759381978]
Hybrid language model (HLM) generates draft tokens that are validated and corrected by a remote large language model (LLM)<n>We propose communication-efficient and uncertainty-aware HLM (CU-HLM)<n>We show that CU-HLM achieves up to 206$times$ higher token throughput by skipping 74.8% transmissions with 97.4% vocabulary compression, while maintaining 97.4% accuracy.
arXiv Detail & Related papers (2025-05-17T02:10:34Z) - Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent Reasoning [5.656040546546711]
Reasoning and linguistic skills form the cornerstone of human intelligence, facilitating problem-solving and decision-making.
Recent advances in Large Language Models (LLMs) have led to impressive linguistic capabilities and emergent reasoning behaviors, fueling widespread adoption across application domains.
We focus on evaluating whether LLMs have the requisite representations to reason using two foundational relationships: "equivalence" and "inheritance"
arXiv Detail & Related papers (2024-10-18T07:34:21Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model [50.339632513018934]
supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of foundation large language model (LLM) to specific preferences.
We critically examine this hypothesis within the scope of cross-lingual generation tasks.
We introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens.
arXiv Detail & Related papers (2024-04-25T17:19:36Z) - General LLMs as Instructors for Domain-Specific LLMs: A Sequential Fusion Method to Integrate Extraction and Editing [12.017822691367705]
We introduce a Sequential Fusion method to integrate knowledge from complex contexts into Large Language Models (LLMs)
Using our method, domain-specific LLMs achieved a 71.7% accuracy (an average gain of 39.1%) in question-answering tasks.
These findings underscore the effectiveness and flexibility of our approach in FDoR-UL across various domains.
arXiv Detail & Related papers (2024-03-23T06:03:36Z) - CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models [53.9835961434552]
We introduce the Chinese Instruction-Following Benchmark (CIF-Bench) to evaluate the generalizability of large language models (LLMs) to the Chinese language.
CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances.
To mitigate data contamination, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance.
arXiv Detail & Related papers (2024-02-20T16:02:12Z) - On Leveraging Large Language Models for Enhancing Entity Resolution: A Cost-efficient Approach [7.996010840316654]
We propose an uncertainty reduction framework using Large Language Models (LLMs) to improve entity resolution results.
LLMs capitalize on their advanced linguistic capabilities and a pay-as-you-go'' model that provides significant advantages to those without extensive data science expertise.
We show that our method is efficient and effective, offering promising applications in real-world tasks.
arXiv Detail & Related papers (2024-01-07T09:06:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.