Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs
- URL: http://arxiv.org/abs/2403.16265v1
- Date: Sun, 24 Mar 2024 18:59:38 GMT
- Title: Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs
- Authors: Zhuoyi Peng, Yi Yang
- Abstract summary: We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases.
We introduce a graph-augmented approach to amplify the global contextual information of the patent phrases.
- Score: 18.86788223751979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases. As patent documents employ legal and highly technical language, existing semantic textual similarity methods that use localized contextual information do not perform satisfactorily in inferring patent phrase similarity. To address this, we introduce a graph-augmented approach to amplify the global contextual information of the patent phrases. For each patent phrase, we construct a phrase graph that links to its focal patents and a list of patents that are either cited by or cite these focal patents. The augmented phrase embedding is then derived from combining its localized contextual embedding with its global embedding within the phrase graph. We further propose a self-supervised learning objective that capitalizes on the retrieved topology to refine both the contextualized embedding and the graph parameters in an end-to-end manner. Experimental results from a unique patent phrase similarity dataset demonstrate that our approach significantly enhances the representation of patent phrases, resulting in marked improvements in similarity inference in a self-supervised fashion. Substantial improvements are also observed in the supervised setting, underscoring the potential benefits of leveraging retrieved phrase graph augmentation.
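The retrieval-and-combine idea can be illustrated with a minimal sketch, assuming placeholder patent IDs, random stand-in embeddings, and plain mean pooling in place of the paper's learned graph encoder and self-supervised objective:

```python
import numpy as np

def build_phrase_graph(phrase, focal_patents, citation_index):
    """Link a phrase to its focal patents plus the patents that cite or are
    cited by those focal patents (one-hop citation neighborhood)."""
    nodes = set(focal_patents)
    for pid in focal_patents:
        nodes.update(citation_index.get(pid, []))  # citing / cited patents
    return {"phrase": phrase, "patent_nodes": sorted(nodes)}

def augmented_phrase_embedding(local_emb, graph, patent_embs, alpha=0.5):
    """Combine the localized contextual embedding with a global embedding
    aggregated over the retrieved phrase graph (mean pooling here)."""
    neighbors = [patent_embs[p] for p in graph["patent_nodes"] if p in patent_embs]
    if not neighbors:
        return local_emb
    global_emb = np.mean(neighbors, axis=0)
    return alpha * local_emb + (1.0 - alpha) * global_emb

def phrase_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example with random stand-in embeddings.
rng = np.random.default_rng(0)
patent_embs = {pid: rng.normal(size=128) for pid in ["US1", "US2", "US3", "US4"]}
citation_index = {"US1": ["US3"], "US2": ["US4"]}
g_a = build_phrase_graph("heat exchanger", ["US1"], citation_index)
g_b = build_phrase_graph("thermal transfer unit", ["US2"], citation_index)
emb_a = augmented_phrase_embedding(rng.normal(size=128), g_a, patent_embs)
emb_b = augmented_phrase_embedding(rng.normal(size=128), g_b, patent_embs)
print(phrase_similarity(emb_a, emb_b))
```

In the paper itself, the mixing of local and global information and the graph parameters are refined end-to-end with the self-supervised objective described in the abstract, rather than fixed as in this sketch.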
Related papers
- A comparative analysis of embedding models for patent similarity [0.0]
This paper makes two contributions to the field of text-based patent similarity.
It compares the performance of different kinds of patent-specific pretrained embedding models.
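A comparison of this kind boils down to checking how well each model's cosine similarities track human-rated similarity labels; a minimal sketch follows, with a toy bag-of-words encoder standing in for the patent-specific models evaluated in the paper:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def evaluate_encoder(encode, pairs, gold_scores):
    """Score an embedding model by how well its cosine similarities
    correlate (Spearman) with human-rated similarity labels."""
    preds = [cosine(encode(x), encode(y)) for x, y in pairs]
    corr, _ = spearmanr(preds, gold_scores)
    return corr

# Toy bag-of-words encoder; in practice, swap in the embedding models
# being compared (general-purpose vs. patent-specific).
rng = np.random.default_rng(1)
word_vecs = {}
def toy_encode(text):
    return np.mean([word_vecs.setdefault(w, rng.normal(size=64))
                    for w in text.lower().split()], axis=0)

pairs = [("heat exchanger", "thermal transfer unit"),
         ("heat exchanger", "heat exchanger assembly"),
         ("heat exchanger", "optical sensor")]
gold = [0.75, 0.9, 0.1]
print(evaluate_encoder(toy_encode, pairs, gold))
```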
arXiv Detail & Related papers (2024-03-25T11:20:23Z) - PaECTER: Patent-level Representation Learning using Citation-informed Transformers [0.16785092703248325]
PaECTER is a publicly available, open-source document-level encoder specific for patents.
We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents.
PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain.
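At its core, citation-informed fine-tuning can be framed as a triplet objective: a patent should embed closer to a patent its examiner cited than to a random patent. A minimal sketch with a stand-in encoder (not PaECTER's actual architecture, data, or training recipe):

```python
import torch
import torch.nn as nn

# Stand-in encoder; in practice this would be a patent-pretrained BERT
# producing a pooled embedding for each patent's title and abstract.
class ToyPatentEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids, offsets):
        return self.emb(token_ids, offsets)

encoder = ToyPatentEncoder()
triplet = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# One toy batch of 9 documents arranged as 3 triples:
# (anchor patent, examiner-cited patent, random negative patent).
tokens = torch.randint(0, 1000, (9 * 20,))
offsets = torch.arange(0, 9 * 20, 20)
embs = encoder(tokens, offsets).view(3, 3, -1)  # [anchor/pos/neg, batch of 3, dim]
loss = triplet(embs[0], embs[1], embs[2])
loss.backward()
opt.step()
print(float(loss))
```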
arXiv Detail & Related papers (2024-02-29T18:09:03Z) - Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z) - Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on opaque deep neural networks (DNNs).
We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP).
Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
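LRP redistributes the predicted class score backwards through the network so that each input feature (here, each token) receives a relevance score; the most relevant words are then visualized as the explanation. A minimal epsilon-rule sketch for a toy two-layer classifier, not the authors' full framework:

```python
import numpy as np

def lrp_linear(a, w, b, relevance_out, eps=1e-6):
    """Epsilon-rule LRP for one linear layer: redistribute the relevance of
    the outputs z = a @ w + b back onto the inputs a."""
    z = a @ w + b                        # pre-activations, shape [out]
    s = relevance_out / (z + eps * np.sign(z))
    return a * (w @ s)                   # relevance of inputs, shape [in]

rng = np.random.default_rng(0)
# Toy two-layer classifier over bag-of-words token features.
x = rng.random(6)                        # token-level input features
w1, b1 = rng.normal(size=(6, 4)), np.zeros(4)
w2, b2 = rng.normal(size=(4, 3)), np.zeros(3)

h = np.maximum(x @ w1 + b1, 0)           # ReLU hidden layer
logits = h @ w2 + b2
target = int(np.argmax(logits))

# Start from the target class score and propagate relevance to the inputs.
r_out = np.zeros(3); r_out[target] = logits[target]
r_hidden = lrp_linear(h, w2, b2, r_out)
r_input = lrp_linear(x, w1, b1, r_hidden)
print("token relevances:", np.round(r_input, 3))
```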
arXiv Detail & Related papers (2023-10-31T14:11:37Z) - Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification [26.85734804493925]
We propose an integrated framework that comprehensively considers the information on patents for patent classification.
We first present an IPC codes correlations learning module to derive their semantic representations.
Finally, we combine the contextual information of patent texts, which incorporates the semantics of IPC codes, with assignees' sequential preferences to make predictions.
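As a rough illustration of the fusion step only (not the taxonomy-learning or pattern-modelling modules), one might concatenate a text embedding with a recurrent summary of the assignee's past IPC codes before classification:

```python
import torch
import torch.nn as nn

class PatentClassifier(nn.Module):
    """Fuse a patent's text embedding with the assignee's sequence of
    previously used IPC codes, then predict the patent's IPC label."""
    def __init__(self, text_dim=128, ipc_vocab=500, ipc_dim=64, n_labels=10):
        super().__init__()
        self.ipc_emb = nn.Embedding(ipc_vocab, ipc_dim)   # IPC code representations
        self.hist_gru = nn.GRU(ipc_dim, ipc_dim, batch_first=True)
        self.head = nn.Linear(text_dim + ipc_dim, n_labels)

    def forward(self, text_feat, assignee_history):
        _, h = self.hist_gru(self.ipc_emb(assignee_history))  # sequential preference
        return self.head(torch.cat([text_feat, h[-1]], dim=-1))

model = PatentClassifier()
logits = model(torch.randn(2, 128),              # text embedding (e.g., from a BERT encoder)
               torch.randint(0, 500, (2, 6)))    # assignee's past IPC code sequence
print(logits.shape)  # torch.Size([2, 10])
```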
arXiv Detail & Related papers (2023-08-10T07:02:24Z) - RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
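The ranking-distillation component can be illustrated as a KL divergence between the teacher's and the student's in-batch similarity distributions (a simplified sketch that omits RankCSE's ranking-consistency term and contrastive objective):

```python
import torch
import torch.nn.functional as F

def ranking_distillation_loss(student_emb, teacher_emb, tau_s=0.05, tau_t=0.05):
    """Match the student's in-batch similarity ranking to the teacher's by
    minimizing KL(teacher || student) over softmax-normalized similarities."""
    s_sim = F.cosine_similarity(student_emb.unsqueeze(1), student_emb.unsqueeze(0), dim=-1)
    t_sim = F.cosine_similarity(teacher_emb.unsqueeze(1), teacher_emb.unsqueeze(0), dim=-1)
    # Mask self-similarities so each sentence is ranked only against the others.
    mask = ~torch.eye(len(student_emb), dtype=torch.bool)
    s_sim = s_sim[mask].view(len(student_emb), -1)
    t_sim = t_sim[mask].view(len(student_emb), -1)
    return F.kl_div(F.log_softmax(s_sim / tau_s, dim=-1),
                    F.softmax(t_sim / tau_t, dim=-1),
                    reduction="batchmean")

student = torch.randn(8, 256, requires_grad=True)   # student sentence embeddings
teacher = torch.randn(8, 256)                       # frozen teacher embeddings
loss = ranking_distillation_loss(student, teacher)
loss.backward()
print(float(loss))
```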
arXiv Detail & Related papers (2023-05-26T08:27:07Z) - Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
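The word-alignment idea can be sketched as aligning each token of one sentence to its most similar token in the other and treating one minus that similarity as the token-level difference score (random vectors stand in for a masked language model's contextual states):

```python
import numpy as np

def token_difference_scores(emb_a, emb_b):
    """For each token in sentence A, align it to its most similar token in
    sentence B; 1 - max cosine similarity is its difference score."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T                        # [len_a, len_b] cosine similarities
    return 1.0 - sim.max(axis=1)         # high score = likely semantic difference

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(5, 32))         # contextual embeddings of sentence A tokens
emb_b = rng.normal(size=(6, 32))         # contextual embeddings of sentence B tokens
print(np.round(token_difference_scores(emb_a, emb_b), 3))
```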
arXiv Detail & Related papers (2023-05-22T17:58:04Z) - A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance [0.0]
Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement.
Recent advances in natural language processing technology offer a promising avenue for automating this process.
We propose a hybrid methodology that measures the similarity between patents by considering both their semantic similarity and their technological distance.
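A minimal sketch of such a hybrid score, mixing embedding-based semantic similarity with an IPC-based technological overlap (the weighting and the Jaccard definition are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def semantic_similarity(emb_a, emb_b):
    return float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

def technological_similarity(ipc_a, ipc_b):
    """Jaccard overlap of the two patents' IPC subclasses."""
    a, b = set(ipc_a), set(ipc_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def hybrid_similarity(emb_a, emb_b, ipc_a, ipc_b, w=0.7):
    return w * semantic_similarity(emb_a, emb_b) + (1 - w) * technological_similarity(ipc_a, ipc_b)

rng = np.random.default_rng(0)
print(hybrid_similarity(rng.normal(size=64), rng.normal(size=64),
                        ["F28D", "F28F"], ["F28D", "H01L"]))
```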
arXiv Detail & Related papers (2023-03-23T07:55:31Z) - Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify the semantic meaning of hybrid granularities in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
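A simplified way to picture the hybrid-granularity objective is as two in-batch contrastive terms, one over instance (sentence) embeddings and one over keyword embeddings, summed into a single loss; the paper's hierarchical mechanism is more involved:

```python
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, tau=0.07):
    """In-batch InfoNCE: each anchor's positive is the same-index row of
    `positives`; all other rows serve as negatives."""
    logits = F.cosine_similarity(anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1) / tau
    targets = torch.arange(len(anchors))
    return F.cross_entropy(logits, targets)

# Toy batch: instance-level (sentence) and keyword-level embedding pairs.
inst_a, inst_b = torch.randn(8, 128, requires_grad=True), torch.randn(8, 128)
kw_a, kw_b = torch.randn(8, 128, requires_grad=True), torch.randn(8, 128)

loss = info_nce(inst_a, inst_b) + info_nce(kw_a, kw_b)   # unified hybrid-granularity loss
loss.backward()
print(float(loss))
```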
arXiv Detail & Related papers (2022-05-26T13:26:03Z) - PatentMiner: Patent Vacancy Mining via Context-enhanced and Knowledge-guided Graph Attention [2.9290732102216452]
We propose a new patent vacancy prediction approach named PatentMiner to mine rich semantic knowledge and predict new potential patents.
A patent knowledge graph over time (e.g., by year) is constructed by carrying out named entity recognition and relation extraction from patent documents.
Common Neighbor Method (CNM), Graph Attention Networks (GAT) and Context-enhanced Graph Attention Networks (CGAT) are proposed to perform link prediction in the constructed knowledge graph.
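A minimal sketch of the GAT-based link-prediction step, assuming torch_geometric is available and using a toy entity graph with dot-product edge scoring (CNM and the context-enhanced variant are not reproduced):

```python
import torch
from torch_geometric.nn import GATConv

# Toy patent knowledge graph: 6 entity nodes, a few directed edges.
x = torch.randn(6, 32)                                  # initial entity features
edge_index = torch.tensor([[0, 1, 2, 3, 4],
                           [1, 2, 3, 4, 5]])            # source -> target pairs

gat = GATConv(in_channels=32, out_channels=32, heads=1)
h = gat(x, edge_index)                                  # attention-aggregated node embeddings

def link_score(h, u, v):
    """Dot-product score for a candidate edge (u, v); higher = more likely."""
    return float((h[u] * h[v]).sum())

print(link_score(h, 0, 5), link_score(h, 0, 1))
```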
arXiv Detail & Related papers (2021-07-10T17:34:57Z) - SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique that increases the performance of current state-of-the-art methods.
We combine the well-known patch loss with information gathered from the parallel trained handwritten text recognition system.
This leads to a stronger local discriminator and results in more realistic, higher-quality generated handwritten words.
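The patch-discriminator idea can be sketched as a PatchGAN-style network that judges local patches of a generated word image rather than the whole image; SmartPatch additionally feeds in information from the handwritten text recognizer, which this generic sketch omits:

```python
import torch
import torch.nn as nn

# PatchGAN-style discriminator: the output is a grid of real/fake logits,
# one per local receptive field (patch) of the input word image.
patch_disc = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, kernel_size=3, padding=1),
)

fake_words = torch.randn(2, 1, 32, 128)        # generated handwritten word images
patch_logits = patch_disc(fake_words)          # [2, 1, 8, 32] grid of patch decisions
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    patch_logits, torch.ones_like(patch_logits))   # generator wants patches judged "real"
print(patch_logits.shape, float(adv_loss))
```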
arXiv Detail & Related papers (2021-05-21T18:34:21Z)