Estimating the Performance of Entity Resolution Algorithms: Lessons
Learned Through PatentsView.org
- URL: http://arxiv.org/abs/2210.01230v2
- Date: Mon, 17 Apr 2023 21:38:23 GMT
- Title: Estimating the Performance of Entity Resolution Algorithms: Lessons
Learned Through PatentsView.org
- Authors: Olivier Binette, Sokhna A York, Emma Hickerson, Youngsoo Baek, Sarvo
Madhavan, Christina Jones
- Abstract summary: This paper introduces a novel evaluation methodology for entity resolution algorithms.
It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool.
- Score: 3.8494315501944736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel evaluation methodology for entity resolution
algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark
Office patent data exploration tool that disambiguates patent inventors using
an entity resolution algorithm. We provide a data collection methodology and
tailored performance estimators that account for sampling biases. Our approach
is simple, practical and principled -- key characteristics that allow us to
paint the first representative picture of PatentsView's disambiguation
performance. This approach is used to inform PatentsView's users of the
reliability of the data and to allow the comparison of competing disambiguation
algorithms.
Related papers
- Efficient Fairness-Performance Pareto Front Computation [51.558848491038916]
We show that optimal fair representations possess several useful structural properties.
We then show that these approximating problems can be solved efficiently via concave programming methods.
arXiv Detail & Related papers (2024-09-26T08:46:48Z)
- Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning [49.417414031031264]
This paper studies learning fair encoders in a self-supervised learning setting.
All data are unlabeled and only a small portion of them are annotated with sensitive attributes.
arXiv Detail & Related papers (2024-06-09T08:11:12Z)
- Large Language Model Informed Patent Image Retrieval [0.0]
We propose a language-informed, distribution-aware multimodal approach to patent image feature learning.
Our proposed method achieves state-of-the-art or comparable performance in image-based patent retrieval with mAP +53.3%, Recall@10 +41.8%, and MRR@10 +51.9%.
arXiv Detail & Related papers (2024-04-30T08:45:16Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance [0.0]
Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement.
Recent advances in natural language processing technology offer a promising avenue for automating this process.
We propose a hybrid methodology that measures the similarity between patents by considering their semantic similarity.
arXiv Detail & Related papers (2023-03-23T07:55:31Z)
- Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primals, optima for the dual representation and corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- A Survey on Sentence Embedding Models Performance for Patent Analysis [0.0]
We propose a standard library and dataset for assessing the accuracy of embedding models based on the PatentSBERTa approach.
Results show PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings have the best accuracy for computing sentence embeddings at the subclass level.
arXiv Detail & Related papers (2022-04-28T12:04:42Z)
- Patent Sentiment Analysis to Highlight Patent Paragraphs [0.0]
Given a patent document, identifying distinct semantic annotations is an interesting research aspect.
In manual patent analysis, it is common practice to mark paragraphs in order to recognise semantic information and improve readability.
This work assists patent practitioners in highlighting semantic information automatically and aids in creating a sustainable and efficient patent analysis using machine learning.
arXiv Detail & Related papers (2021-11-06T13:28:29Z)
- Deep learning-based citation recommendation system for patents [5.376388266200792]
We present a novel dataset called PatentNet that includes textual information and metadata for approximately 110,000 patents from the Google BigQuery service.
Compared with existing recommendation methods, the proposed benchmark method achieved a mean reciprocal rank of 0.2377 on the test set.
arXiv Detail & Related papers (2020-10-21T12:18:21Z)
- Fairness by Learning Orthogonal Disentangled Representations [50.82638766862974]
We propose a novel disentanglement approach to the invariant representation problem.
We enforce the meaningful representation to be agnostic to sensitive information by entropy.
The proposed approach is evaluated on five publicly available datasets.
arXiv Detail & Related papers (2020-03-12T11:09:15Z)