ClaimCompare: A Data Pipeline for Evaluation of Novelty Destroying Patent Pairs
- URL: http://arxiv.org/abs/2407.12193v1
- Date: Tue, 16 Jul 2024 21:38:45 GMT
- Title: ClaimCompare: A Data Pipeline for Evaluation of Novelty Destroying Patent Pairs
- Authors: Arav Parikh, Shiri Dori-Hacohen
- Abstract summary: We introduce a novel data pipeline, ClaimCompare, designed to generate labeled patent claim datasets suitable for training IR and ML models.
To the best of our knowledge, ClaimCompare is the first pipeline that can generate multiple novelty destroying patent datasets.
- Score: 2.60235825984014
- Abstract: A fundamental step in the patent application process is determining whether prior patents exist that are novelty destroying. This step is routinely performed by both applicants and examiners to assess the novelty of proposed inventions among the millions of applications filed annually. However, conducting this search is time- and labor-intensive, as searchers must navigate complex legal and technical jargon while covering a large number of legal claims. Automated approaches that use information retrieval and machine learning to detect novelty destroying patents offer a promising way to streamline this process, yet research in this space remains limited. In this paper, we introduce a novel data pipeline, ClaimCompare, designed to generate labeled patent claim datasets suitable for training IR and ML models to address this challenge of novelty destruction assessment. To the best of our knowledge, ClaimCompare is the first pipeline that can generate multiple novelty destroying patent datasets. To illustrate the practical relevance of this pipeline, we use it to construct a sample dataset comprising over 27K patents in the electrochemical domain: 1,045 base patents from USPTO, each associated with 25 related patents labeled according to their novelty destruction towards the base patent. Subsequently, we conduct preliminary experiments showcasing the efficacy of this dataset in fine-tuning transformer models to identify novelty destroying patents, demonstrating 29.2% and 32.7% absolute improvement in MRR and P@1, respectively.
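The abstract reports results in MRR and P@1 without defining them, so the following is a minimal sketch of the standard definitions, not the authors' evaluation code. It assumes each base patent comes with a ranked list of candidates carrying binary labels (1 = novelty destroying, 0 = not). The function names and the toy rankings are hypothetical. The last lines check the "over 27K" figure under the assumption that no patent is counted twice.

```python
from typing import Sequence, Tuple


def reciprocal_rank(ranked_labels: Sequence[int]) -> float:
    """Return 1/rank of the first relevant candidate, or 0.0 if none is relevant."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def precision_at_1(ranked_labels: Sequence[int]) -> float:
    """Return 1.0 if the top-ranked candidate is relevant, else 0.0."""
    return 1.0 if len(ranked_labels) > 0 and ranked_labels[0] == 1 else 0.0


def evaluate(rankings: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Average both metrics over all queries (one query per base patent)."""
    n = len(rankings)
    mrr = sum(reciprocal_rank(r) for r in rankings) / n
    p_at_1 = sum(precision_at_1(r) for r in rankings) / n
    return mrr, p_at_1


# Toy rankings (hypothetical): one list of labels per base patent,
# ordered by the model's score from best to worst.
toy_rankings = [
    [1, 0, 0, 0],  # relevant candidate at rank 1 -> RR = 1.0
    [0, 1, 0, 0],  # relevant candidate at rank 2 -> RR = 0.5
    [0, 0, 0, 1],  # relevant candidate at rank 4 -> RR = 0.25
    [0, 0, 0, 0],  # no relevant candidate        -> RR = 0.0
]
mrr, p_at_1 = evaluate(toy_rankings)
print(f"MRR = {mrr}, P@1 = {p_at_1}")

# Dataset size check: 1,045 base patents plus 25 related patents each,
# assuming no patent appears more than once.
total_patents = 1045 * (1 + 25)
print(f"total patents = {total_patents}")
```

An "absolute improvement" of 29.2% in MRR means the fine-tuned model's MRR is 0.292 higher than the baseline's on this scale, not 29.2% higher in relative terms.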
Related papers
- Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs [67.0310240737424]
We introduce a novel approach to safeguard the ownership of text datasets and effectively detect unauthorized use by the RA-LLMs.
Our approach preserves the original data completely unchanged while protecting it by inserting specifically designed canary documents into the IP dataset.
During the detection process, unauthorized usage is identified by querying the canary documents and analyzing the responses of RA-LLMs.
arXiv Detail & Related papers (2025-02-15T04:56:45Z)
- Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the Correspondence between Patent Claim and Prior Art [5.655276956391884]
This paper introduces a novel challenge by evaluating the ability of large language models (LLMs) to assess patent novelty.
We present the first dataset specifically designed for novelty evaluation, derived from real patent examination cases.
Our study reveals that while classification models struggle to effectively assess novelty, generative models make predictions with a reasonable level of accuracy.
arXiv Detail & Related papers (2025-02-10T10:09:29Z)
- Intelligent System for Automated Molecular Patent Infringement Assessment [38.48937966447085]
PatentFinder is a novel multi-agent and tool-enhanced intelligence system that can accurately and comprehensively evaluate small molecules for patent infringement.
PatentFinder features five specialized agents that collaboratively analyze patent claims and molecular structures.
PatentFinder autonomously generates detailed and interpretable patent infringement reports, showcasing enhanced accuracy and improved interpretability.
arXiv Detail & Related papers (2024-12-10T12:14:38Z)
- CopyrightShield: Spatial Similarity Guided Backdoor Defense against Copyright Infringement in Diffusion Models [61.06621533874629]
The diffusion model is a prime target for copyright infringement attacks.
This paper provides an in-depth analysis of the spatial similarity of replication in diffusion models.
We propose a novel defense method specifically targeting copyright infringement attacks.
arXiv Detail & Related papers (2024-12-02T14:19:44Z)
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
- Structural Representation Learning and Disentanglement for Evidential Chinese Patent Approval Prediction [19.287231890434718]
This paper presents the pioneering effort on this task using a retrieval-based classification approach.
We propose a novel framework called DiSPat, which focuses on structural representation learning and disentanglement.
Our framework surpasses state-of-the-art baselines on patent approval prediction, while also exhibiting enhanced evidentiality.
arXiv Detail & Related papers (2024-08-23T05:44:16Z)
- Randomization Techniques to Mitigate the Risk of Copyright Infringement [48.75580082851766]
We investigate potential randomization approaches that can complement current practices for copyright protection.
This is motivated by the inherent ambiguity of the rules that determine substantial similarity in copyright precedents.
Similar randomized approaches, such as differential privacy, have been successful in mitigating privacy risks.
arXiv Detail & Related papers (2024-08-21T20:55:00Z)
- PaECTER: Patent-level Representation Learning using Citation-informed Transformers [0.16785092703248325]
PaECTER is a publicly available, open-source document-level encoder specific to patents.
We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents.
PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain.
arXiv Detail & Related papers (2024-02-29T18:09:03Z)
- Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep opaque neural networks (DNNs).
We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP).
Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
- A Survey on Sentence Embedding Models Performance for Patent Analysis [0.0]
We propose a standard library and dataset for assessing the accuracy of embedding models based on the PatentSBERTa approach.
Results show PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings have the best accuracy for computing sentence embeddings at the subclass level.
arXiv Detail & Related papers (2022-04-28T12:04:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.