Creating a silver standard for patent simplification
- URL: http://arxiv.org/abs/2310.15689v1
- Date: Tue, 24 Oct 2023 10:00:56 GMT
- Title: Creating a silver standard for patent simplification
- Authors: Silvia Casola, Alberto Lavelli, Horacio Saggion
- Abstract summary: Patents are legal documents that aim at protecting inventions on the one hand and at making technical knowledge circulate on the other.
Their style -- a mix of legal, technical, and extremely vague language -- makes their content hard to access for humans and machines.
This paper proposes an approach to automatically simplify patent text through rephrasing.
- Score: 11.083371480030195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Patents are legal documents that aim at protecting inventions on the one hand
and at making technical knowledge circulate on the other. Their complex style
-- a mix of legal, technical, and extremely vague language -- makes their
content hard to access for humans and machines and poses substantial challenges
to the information retrieval community. This paper proposes an approach to
automatically simplify patent text through rephrasing. Since no in-domain
parallel simplification data exist, we propose a method to automatically
generate a large-scale silver standard for patent sentences. To obtain
candidates, we use a general-domain paraphrasing system; however, the process
is error-prone and difficult to control. Thus, we pair it with proper filters
and construct a cleaner corpus that can successfully be used to train a
simplification system. Human evaluation of the synthetic silver corpus shows
that it is considered grammatical, adequate, and contains simple sentences.
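The abstract does not say which filters the authors use. As a rough illustration only, the sketch below shows the general shape of a generate-then-filter step. It assumes paraphrase candidates have already been produced by some general-domain paraphraser (not shown), and keeps a candidate only if it passes an adequacy check and a simplicity check. The two proxies (word-set Jaccard overlap and a token-length ratio), the threshold values, and the names `filter_candidates` and `Pair` are hypothetical and are not the paper's filters.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds: illustrative values, not taken from the paper.
MIN_OVERLAP = 0.5        # minimum word-set Jaccard similarity to the source (adequacy proxy)
MAX_LENGTH_RATIO = 0.9   # candidate may be at most 90% as long as the source (simplicity proxy)


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Pair:
    """One (complex source, simpler target) pair for the silver corpus."""
    source: str
    target: str


def filter_candidates(source: str, candidates: list[str]) -> list[Pair]:
    """Keep paraphrase candidates that look both adequate and simpler than the source."""
    src_tokens = tokenize(source)
    src_set = set(src_tokens)
    kept: list[Pair] = []
    seen: set[str] = set()
    for cand in candidates:
        cand_tokens = tokenize(cand)
        key = " ".join(cand_tokens)
        # Drop empty outputs, verbatim copies of the source, and duplicate candidates.
        if not cand_tokens or cand_tokens == src_tokens or key in seen:
            continue
        seen.add(key)
        # Adequacy proxy: enough lexical overlap with the source sentence.
        if jaccard(src_set, set(cand_tokens)) < MIN_OVERLAP:
            continue
        # Simplicity proxy: the candidate must be noticeably shorter than the source.
        if len(cand_tokens) > MAX_LENGTH_RATIO * len(src_tokens):
            continue
        kept.append(Pair(source, cand))
    return kept


if __name__ == "__main__":
    source = "the device comprises a housing and a lid wherein the lid is attached to the housing"
    candidates = [
        "the device has a housing and a lid attached to the housing",  # shorter, overlapping: kept
        "a cat sat on the mat",  # little overlap with the source: dropped
    ]
    for pair in filter_candidates(source, candidates):
        print(pair.target)
```

Surface proxies like these are cheap but blunt; a real pipeline could swap in model-based scores (for example, a semantic similarity model for adequacy) while keeping the same keep-or-drop structure.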
Related papers
- Enhancing patent retrieval using automated patent summarization [1.067215284497015]
We present the application of recent extractive and abstractive summarization methods for generating concise, purpose-specific summaries of patent documents.
Experimental results show that summarization-based queries significantly improve prior-art retrieval effectiveness.
Date: 2025-07-22T09:14:44Z
- In-Context Watermarks for Large Language Models [71.29952527565749]
In-Context Watermarking (ICW) embeds watermarks into generated text solely through prompt engineering.
We investigate four ICW strategies at different levels of granularity, each paired with a tailored detection method.
Our experiments validate the feasibility of ICW as a model-agnostic, practical watermarking approach.
Date: 2025-05-22T17:24:51Z
- A Hybrid Architecture with Efficient Fine Tuning for Abstractive Patent Document Summarization [0.0]
This study proposes a system for efficiently creating abstractive summaries of patent records.
The procedure uses the LexRank graph-based algorithm to extract the most important sentences from the input patent texts.
Date: 2025-03-13T13:30:54Z
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
Date: 2024-11-20T17:23:40Z
- Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs [18.86788223751979]
We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases.
We introduce a graph-augmented approach to amplify the global contextual information of the patent phrases.
Date: 2024-03-24T18:59:38Z
- LLM-based Extraction of Contradictions from Patents [0.0]
This paper presents a method, based on prompt engineering, to extract TRIZ contradictions from patent texts.
Our results show that "off-the-shelf" GPT-4 is a serious alternative to existing approaches.
Date: 2024-03-21T09:36:36Z
- Elaborative Simplification as Implicit Questions Under Discussion [51.17933943734872]
This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework.
We show that explicitly modeling QUD provides essential understanding of elaborative simplification and how the elaborations connect with the rest of the discourse.
Date: 2023-05-17T17:26:16Z
- A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance [0.0]
Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement.
Recent advances in natural language processing technology offer a promising avenue for automating this process.
We propose a hybrid methodology that measures the similarity between patents by taking their semantic similarity into account.
Date: 2023-03-23T07:55:31Z
- Multi label classification of Artificial Intelligence related patents using Modified D2SBERT and Sentence Attention mechanism [0.0]
We present a method for classifying artificial intelligence-related patents published by the USPTO using natural language processing techniques and deep learning methods.
In our experiments, the method achieved the highest performance compared with other deep learning methods.
Date: 2023-03-03T12:27:24Z
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
Date: 2022-04-12T03:49:35Z
- Tracing Text Provenance via Context-Aware Lexical Substitution [81.49359106648735]
We propose a natural language watermarking scheme based on context-aware lexical substitution.
Under both objective and subjective metrics, our watermarking scheme preserves the semantic integrity of the original sentences well.
Date: 2021-12-15T04:27:33Z
- SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique that improves the performance of current state-of-the-art methods.
We combine the well-known patch loss with information gathered from a handwritten text recognition system trained in parallel.
This yields an enhanced local discriminator and results in more realistic, higher-quality generated handwritten words.
Date: 2021-05-21T18:34:21Z
- Controllable Text Simplification with Explicit Paraphrasing [88.02804405275785]
Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting.
Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously.
We propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles.
Date: 2020-10-21T13:44:40Z
- Identification, Tracking and Impact: Understanding the trade secret of catchphrases [8.343482692350094]
We propose an unsupervised method for the extraction of catchphrases from the abstracts of patents granted by the U.S. Patent and Trademark Office.
Our proposed system achieves substantial improvement, both in terms of precision and recall, against state-of-the-art techniques.
Date: 2020-07-20T06:11:25Z
This list is automatically generated from the titles and abstracts of the papers on this site.