On the problem of entity matching and its application in automated
settlement of receivables
- URL: http://arxiv.org/abs/2205.10678v1
- Date: Sat, 21 May 2022 21:16:21 GMT
- Title: On the problem of entity matching and its application in automated
settlement of receivables
- Authors: Lukasz Czekaj, Tomasz Biegus, Robert Kitlowski, Stanislaw Raczynski,
Mateusz Olszewski, Jakub Dziedzic, Paweł Tomasik, Ryszard Kozera,
Alexander Prokopenya, Robert Olszewski
- Abstract summary: We consider a setup in which a base algorithm produces a preliminary ranking of matches.
We apply several novel methods to increase the matching quality of the base algorithm.
- Score: 47.187609203210705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper covers automated settlement of receivables in non-governmental
organizations. We tackle the problem with entity matching techniques. We
consider a setup in which a base algorithm is used for preliminary ranking of
matches; we then apply several novel methods to increase the matching quality
of the base algorithm: score post-processing, a cascade model, and a chain
model. The methods presented here contribute to automated settlement of
receivables, entity matching, and multilabel classification in an open-world
scenario. We evaluate our approach on real-world operational data from a
company that provides settlement of receivables as a service: the proposed
methods boost recall from 78% (base model) to >90% at 99% precision.
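The headline figure above is recall at a fixed precision. A minimal sketch of how such a metric can be computed from ranked match scores follows; the function name `recall_at_precision`, the binary labels, and the toy data are illustrative assumptions and are not taken from the paper, which does not describe its evaluation code here.

```python
from typing import List, Sequence, Tuple


def recall_at_precision(
    scores: Sequence[float],
    labels: Sequence[int],
    min_precision: float = 0.99,
) -> Tuple[float, float]:
    """Return (best recall, score threshold) among thresholds whose
    precision is at least min_precision.

    Hypothetical helper: labels are 1 for a correct match, 0 otherwise.
    A candidate is accepted when its score is >= the threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0, float("inf")

    # Sort candidates from highest to lowest score.
    pairs: List[Tuple[float, int]] = sorted(
        zip(scores, labels), key=lambda p: -p[0]
    )

    best_recall, best_thr = 0.0, float("inf")
    tp = fp = 0
    i, n = 0, len(pairs)
    while i < n:
        thr = pairs[i][0]
        # Consume every candidate tied at this score before evaluating.
        while i < n and pairs[i][0] == thr:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision and recall > best_recall:
            best_recall, best_thr = recall, thr
    return best_recall, best_thr


if __name__ == "__main__":
    # Toy data (made up for illustration only).
    toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
    toy_labels = [1, 1, 1, 0, 1, 0]
    print(recall_at_precision(toy_scores, toy_labels, min_precision=1.0))
```

Under this reading, a method that improves the ranking lets more true matches be accepted before the precision constraint is violated, which is what a rise in recall at 99% precision would reflect.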
Related papers
- Federated Automatic Latent Variable Selection in Multi-output Gaussian Processes [0.7366405857677227]
A common approach in MGPs to transfer knowledge across units involves gathering all data from each unit to a central server.
We propose a hierarchical model that places spike-and-slab priors on the coefficients of each latent process.
These priors help automatically select only needed latent processes by shrinking the coefficients of unnecessary ones to zero.
arXiv Detail & Related papers (2024-07-24T02:03:28Z)
- Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning [51.560590617691005]
We investigate whether it is possible to "squeeze more juice" out of each cohort than what is possible in a single communication round.
Our approach leads to up to 74% reduction in the total communication cost needed to train a FL model in the cross-device setting.
arXiv Detail & Related papers (2024-06-03T08:48:49Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using an Adaptive Sampling Algorithm [14.206050847214652]
We introduce a new algorithm to solve budget allocation and revenue allocation problems simultaneously in linear time.
The new algorithm employs an adaptive sampling process that selects data from those providers who are contributing the most to the model.
We provide theoretical guarantees for the algorithm that show the budget is used efficiently and the properties of revenue allocation are similar to Shapley's.
arXiv Detail & Related papers (2023-06-05T02:28:19Z)
- Quality-Based Conditional Processing in Multi-Biometrics: Application to Sensor Interoperability [63.05238390013457]
We describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign.
Our approach is based on linear logistic regression, in which fused scores tend to be log-likelihood-ratios.
Results show that the proposed approach outperforms all the rule-based fusion schemes.
arXiv Detail & Related papers (2022-11-24T12:11:22Z)
- Equality of Effort via Algorithmic Recourse [3.3517146652431378]
This paper proposes a method for measuring fairness through equality of effort by applying algorithmic recourse through minimal interventions.
We extend the existing definition of equality of effort and present an algorithm for its assessment via algorithmic recourse.
arXiv Detail & Related papers (2022-11-21T22:41:24Z)
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
- A Federated Data-Driven Evolutionary Algorithm for Expensive Multi/Many-objective Optimization [11.92436948211501]
This paper proposes a federated data-driven evolutionary multi-objective/many-objective optimization algorithm.
We leverage federated learning for surrogate construction so that multiple clients collaboratively train a radial-basis-function-network as the global surrogate.
A new federated acquisition function is proposed for the central server to approximate the objective values using the global surrogate and estimate the uncertainty level of the approximated objective values.
arXiv Detail & Related papers (2021-06-22T22:33:24Z)
- Clustering-based Automatic Construction of Legal Entity Knowledge Base from Contracts [0.0]
We propose a clustering-based approach to automatically generate a reliable knowledge base of legal entities from given contracts.
The proposed method is robust to different types of errors brought by pre-processing such as OCR and NER.
Compared to the collected ground-truth data, our method is able to recall 84% of the knowledge.
arXiv Detail & Related papers (2020-11-18T17:51:27Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.