Patent-CR: A Dataset for Patent Claim Revision
- URL: http://arxiv.org/abs/2412.02549v1
- Date: Tue, 03 Dec 2024 16:43:42 GMT
- Title: Patent-CR: A Dataset for Patent Claim Revision
- Authors: Lekang Jiang, Pascal A. Scherz, Stephan Goetz
- Abstract summary: This paper presents Patent-CR, the first dataset created for the patent claim revision task in English.
It includes both initial patent applications rejected by patent examiners and the final granted versions.
- Score: 0.0
- Abstract: This paper presents Patent-CR, the first dataset created for the patent claim revision task in English. It includes both initial patent applications rejected by patent examiners and the final granted versions. Unlike typical text revision tasks, which predominantly focus on enhancing sentence quality through grammar correction and coherence improvement, patent claim revision aims to ensure that the claims meet stringent legal criteria. These criteria extend beyond novelty and inventiveness to include clarity of scope, technical accuracy, language precision, and legal robustness. We assess various large language models (LLMs) through professional human evaluation, including general LLMs of different sizes and architectures, text revision models, and domain-specific models. Our results indicate that LLMs often produce ineffective edits that deviate from the target revisions. In addition, domain-specific models and fine-tuning show promising results. Notably, GPT-4 outperforms the other tested LLMs, but further revision is still necessary to reach the examination standard. Furthermore, we demonstrate the inconsistency between automated and human evaluation results, with GPT-4-based automated evaluation showing the highest correlation with human judgment. This dataset, together with our preliminary empirical findings, offers valuable insights for further exploration of patent claim revision.
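The reported inconsistency between automated and human evaluation can be quantified with a rank correlation between the two sets of scores. A minimal sketch, assuming hypothetical per-revision scores (the paper does not publish these raw numbers):

```python
# Minimal sketch: comparing automated metrics against human judgments,
# as in the paper's automated-vs-human evaluation analysis. All scores
# below are hypothetical placeholders, not values from the paper.
from scipy.stats import spearmanr

human_scores = [4.0, 2.5, 3.0, 4.5, 1.5, 3.5]        # expert ratings per revision
gpt4_scores  = [4.5, 3.0, 2.5, 4.0, 2.0, 3.5]        # GPT-4-based automated ratings
bleu_scores  = [0.42, 0.55, 0.31, 0.48, 0.60, 0.36]  # a surface-overlap metric

for name, scores in [("GPT-4 eval", gpt4_scores), ("BLEU", bleu_scores)]:
    rho, p = spearmanr(human_scores, scores)
    print(f"{name}: Spearman rho={rho:.2f} (p={p:.3f})")
```

A higher rank correlation with the human column indicates the automated metric orders revisions more like the expert evaluators do.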
Related papers
- Scoring Verifiers: Evaluating Synthetic Verification in Code and Reasoning [59.25951947621526]
We introduce benchmarks designed to evaluate the impact of synthetic verification methods on assessing solution correctness.
We analyze synthetic verification methods in standard, reasoning-based, and reward-based LLMs.
Our results show that recent reasoning models significantly improve test case generation and that scaling test cases enhances verification accuracy.
arXiv Detail & Related papers (2025-02-19T15:32:11Z)
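The verification mechanism studied in the entry above boils down to executing a candidate solution against synthetic test cases. A minimal sketch with a toy solution and hypothetical tests:

```python
# Sketch of test-case-based verification of a model-generated solution.
# The solution and (input, expected) pairs are hypothetical examples.
def candidate_sort(xs):
    # A candidate solution whose correctness we want to verify.
    return sorted(xs)

test_cases = [          # synthetic test cases; scaling these up is what
    ([3, 1, 2], [1, 2, 3]),   # the benchmark finds improves verification
    ([], []),
    ([5, 5, 1], [1, 5, 5]),
]

passed = sum(candidate_sort(i) == o for i, o in test_cases)
print(f"verified: {passed}/{len(test_cases)} test cases passed")
```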
- Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the Correspondence between Patent Claim and Prior Art [5.655276956391884]
This paper introduces a novel challenge by evaluating the ability of large language models (LLMs) to assess patent novelty.
We present the first dataset specifically designed for novelty evaluation, derived from real patent examination cases.
Our study reveals that while classification models struggle to effectively assess novelty, generative models make predictions with a reasonable level of accuracy.
arXiv Detail & Related papers (2025-02-10T10:09:29Z)
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
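The core PatentEdits signal, textual entailment between a cited prior-art sentence and a draft claim sentence, can be approximated with any off-the-shelf NLI model. A sketch; the checkpoint below is illustrative, not the one used in the paper:

```python
# Does a prior-art sentence entail a draft claim sentence? High
# entailment probability suggests the claim is not novel over the citation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

prior_art = "The device comprises a lithium-ion battery coupled to a charge controller."
draft_claim = "A device comprising a battery coupled to a charge controller."

inputs = tok(prior_art, draft_claim, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(-1).squeeze()

# roberta-large-mnli label order: contradiction, neutral, entailment
for label, p in zip(["contradiction", "neutral", "entailment"], probs):
    print(f"{label}: {p:.3f}")
```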
- Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach.
This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets.
We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
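The sliding-window strategy mentioned above reranks a long candidate list window by window from the bottom up, so strong candidates bubble toward the top. A sketch in which a dummy scorer stands in for the LLM call:

```python
# Sliding-window listwise reranking: only `window` candidates are compared
# at once, and overlapping windows propagate good candidates upward.
def rank_window(docs):
    # In practice this would prompt an LLM to order the window by relevance;
    # here we sort by a dummy score attached to each doc.
    return sorted(docs, key=lambda d: d["score"], reverse=True)

def sliding_window_rerank(docs, window=4, stride=2):
    docs = list(docs)
    start = max(len(docs) - window, 0)
    while True:
        docs[start:start + window] = rank_window(docs[start:start + window])
        if start == 0:
            break
        start = max(start - stride, 0)
    return docs

candidates = [{"id": i, "score": s} for i, s in enumerate([0.2, 0.9, 0.1, 0.7, 0.5, 0.8])]
print([d["id"] for d in sliding_window_rerank(candidates)])  # [1, 5, 3, 0, 4, 2]
```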
- ClaimBrush: A Novel Framework for Automated Patent Claim Refinement Based on Large Language Models [3.3427063846107825]
ClaimBrush is a novel framework for automated patent claim refinement that includes a dataset and a rewriting model.
We constructed a dataset for training and evaluating patent claim rewriting models by collecting a large number of actual patent claim rewriting cases.
Our proposed rewriting model outperformed baselines and zero-shot learning with state-of-the-art large language models.
arXiv Detail & Related papers (2024-10-08T00:20:54Z)
- Can Large Language Models Generate High-quality Patent Claims? [0.7535435234862548]
Large language models (LLMs) have shown exceptional performance across various text generation tasks.
This paper constructs a dataset to investigate the performance of current LLMs in patent claim generation.
arXiv Detail & Related papers (2024-06-27T18:07:40Z)
- PatentEval: Understanding Errors in Patent Generation [9.981773213952994]
We introduce a comprehensive error typology specifically designed for evaluating two distinct tasks in machine-generated patent texts.
We have also developed a benchmark, PatentEval, for systematically assessing language models in this context.
arXiv Detail & Related papers (2024-06-05T13:55:27Z)
- CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation [87.44350003888646]
Eval-Instruct can acquire pointwise grading critiques with pseudo references and revise these critiques via multi-path prompting.
CritiqueLLM is empirically shown to outperform ChatGPT and all the open-source baselines.
arXiv Detail & Related papers (2023-11-30T16:52:42Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with reasonable prompting and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
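LLM-based N-best correction of the kind HyPoradise benchmarks amounts to prompting a model with all hypotheses and asking for a single corrected transcription. A sketch in which the prompt format and the `ask_llm` stub are assumptions, not the benchmark's interface:

```python
# Build an error-correction prompt from an ASR N-best list. The hypotheses
# here are made-up examples; plug in a real LLM client in `ask_llm`.
n_best = [
    "i want to recognize speech",
    "i want to wreck a nice beach",
    "i want to recognise peach",
]

prompt = (
    "Below are the N-best hypotheses from a speech recognizer for one "
    "utterance. Output the most plausible transcription, fixing errors "
    "even if the correct words appear in none of the hypotheses.\n"
    + "\n".join(f"{i + 1}. {h}" for i, h in enumerate(n_best))
    + "\nTranscription:"
)
print(prompt)

def ask_llm(prompt: str) -> str:
    # Hypothetical hook for whichever LLM API you use.
    raise NotImplementedError("plug in your LLM client here")

# corrected = ask_llm(prompt)
```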
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
- A Survey on Sentence Embedding Models Performance for Patent Analysis [0.0]
We propose a standard library and dataset for assessing the accuracy of embedding models based on the PatentSBERTa approach.
Results show PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings have the best accuracy for computing sentence embeddings at the subclass level.
arXiv Detail & Related papers (2022-04-28T12:04:42Z)
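Comparing patent sentences with PatentSBERTa-style embeddings reduces to encoding and cosine similarity. A sketch using the public `AI-Growth-Lab/PatentSBERTa` checkpoint (any SBERT-style model can be substituted; the claims are made-up examples):

```python
# Encode patent sentences and compare them by cosine similarity;
# semantically related claims should score higher.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("AI-Growth-Lab/PatentSBERTa")

claims = [
    "A battery pack comprising a plurality of lithium-ion cells.",
    "An energy storage module containing multiple lithium-ion cells.",
    "A method for roasting coffee beans at a controlled temperature.",
]
emb = model.encode(claims, convert_to_tensor=True)

print(util.cos_sim(emb[0], emb[1]).item())  # related pair: high similarity
print(util.cos_sim(emb[0], emb[2]).item())  # unrelated pair: low similarity
```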