Related papers: Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation

Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation

URL: http://arxiv.org/abs/2410.07009v1
Date: Wed, 9 Oct 2024 15:52:48 GMT
Title: Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation
Authors: Valentin Knappich, Simon Razniewski, Anna Hätty, Annemarie Friedrich,
Abstract summary: We present PAP2PAT, a new challenging benchmark of 1.8k patent-paper pairs with document outlines. Our experiments with current open-weight LLMs and outline-guided generation show that they can effectively use information from the paper but struggle with repetitions, likely due to the inherent repetitiveness of patent language.
Score: 13.242188189150987
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The patent domain is gaining attention in natural language processing research, offering practical applications in streamlining the patenting process and providing challenging benchmarks for large language models (LLMs). However, the generation of the description sections of patents, which constitute more than 90% of the patent document, has not been studied to date. We address this gap by introducing the task of outline-guided paper-to-patent generation, where an academic paper provides the technical specification of the invention and an outline conveys the desired patent structure. We present PAP2PAT, a new challenging benchmark of 1.8k patent-paper pairs with document outlines, collected using heuristics that reflect typical research lab practices. Our experiments with current open-weight LLMs and outline-guided chunk-based generation show that they can effectively use information from the paper but struggle with repetitions, likely due to the inherent repetitiveness of patent language. We release our data and code.

Related papers

LAQuer: Localized Attribution Queries in Content-grounded Generation [69.60308443863606]
Grounded text generation models often produce content that deviates from their source material, requiring user verification to ensure accuracy.<n>Existing attribution methods associate entire sentences with source documents, which can be overwhelming for users seeking to fact-check specific claims.<n>We introduce Localized Attribution Queries (LAQuer), a new task that localizes selected spans of generated output to their corresponding source spans, allowing fine-grained and user-directed attribution.
arXiv Detail & Related papers (2025-06-01T21:46:23Z)
AutoPatent: A Multi-Agent Framework for Automatic Patent Generation [16.862811929856313]
We introduce a novel and practical task known as Draft2Patent, along with its corresponding D2P benchmark, which challenges Large Language Models to generate full-length patents averaging 17K tokens based on initial drafts. We propose a multi-agent framework called AutoPatent which leverages the LLM-based planner agent, writer agents, and examiner agent with PGTree and RRAG to generate lengthy, intricate, and high-quality complete patent documents.
arXiv Detail & Related papers (2024-12-13T02:27:34Z)
PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions. We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models. We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression [91.23933111083389]
BRIEF (Bridging Retrieval and Inference through Evidence Fusion) is a lightweight approach that performs query-aware multi-hop reasoning. Based on our synthetic data built entirely by open-source models, BRIEF generates more concise summaries.
arXiv Detail & Related papers (2024-10-20T04:24:16Z)
O1 Replication Journey: A Strategic Progress Report -- Part 1 [52.062216849476776]
This paper introduces a pioneering approach to artificial intelligence research, embodied in our O1 Replication Journey. Our methodology addresses critical challenges in modern AI research, including the insularity of prolonged team-based projects. We propose the journey learning paradigm, which encourages models to learn not just shortcuts, but the complete exploration process.
arXiv Detail & Related papers (2024-10-08T15:13:01Z)
PATopics: An automatic framework to extract useful information from pharmaceutical patents documents [4.340983107526238]
PATopics is a framework specially designed to extract relevant information for Pharmaceutical patents. We extensively analyzed the framework using 4,832 pharmaceutical patents concerning 809 molecules patented by 478 companies.
arXiv Detail & Related papers (2024-08-12T19:18:51Z)
Natural Language Processing in Patents: A Survey [0.0]
Patents, encapsulating crucial technical and legal information, present a rich domain for natural language processing (NLP) applications. As NLP technologies evolve, large language models (LLMs) have demonstrated outstanding capabilities in general text processing and generation tasks. This paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently.
arXiv Detail & Related papers (2024-03-06T23:17:16Z)
PaECTER: Patent-level Representation Learning using Citation-informed Transformers [0.16785092703248325]
PaECTER is a publicly available, open-source document-level encoder specific for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain.
arXiv Detail & Related papers (2024-02-29T18:09:03Z)
LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation [74.7163199054881]
Large Language Models (LLMs) have demonstrated their capability in context understanding, logic reasoning and answer generation. We present a systematic study on the application of LLMs in the EDA field. We highlight the future research direction, focusing on applying LLMs in logic synthesis, physical design, multi-modal feature extraction and alignment of circuits.
arXiv Detail & Related papers (2023-12-28T15:09:14Z)
Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification. Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
arXiv Detail & Related papers (2023-12-01T19:53:23Z)
Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep opaque neural networks (DNNs) We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP) Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
PaperCard for Reporting Machine Assistance in Academic Writing [48.33722012818687]
ChatGPT, a question-answering system released by OpenAI in November 2022, has demonstrated a range of capabilities that could be utilised in producing academic papers. This raises critical questions surrounding the concept of authorship in academia. We propose a framework we name "PaperCard", a documentation for human authors to transparently declare the use of AI in their writing process.
arXiv Detail & Related papers (2023-10-07T14:28:04Z)
The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications [8.110699646062384]
We introduce the Harvard USPTO Patent dataset (HUPD) With more than 4.5 million patent documents, HUPD is two to three times larger than comparable corpora. By providing each application's metadata along with all of its text fields, the dataset enables researchers to perform new sets of NLP tasks.
arXiv Detail & Related papers (2022-07-08T17:57:15Z)
A Survey on Sentence Embedding Models Performance for Patent Analysis [0.0]
We propose a standard library and dataset for assessing the accuracy of embeddings models based on PatentSBERTa approach. Results show PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings have the best accuracy for computing sentence embeddings at the subclass level.
arXiv Detail & Related papers (2022-04-28T12:04:42Z)
Context-Tuning: Learning Contextualized Prompts for Natural Language Generation [52.835877179365525]
We propose a novel continuous prompting approach, called Context-Tuning, to fine-tuning PLMs for natural language generation. Firstly, the prompts are derived based on the input text, so that they can elicit useful knowledge from PLMs for generation. Secondly, to further enhance the relevance of the generated text to the inputs, we utilize continuous inverse prompting to refine the process of natural language generation.
arXiv Detail & Related papers (2022-01-21T12:35:28Z)
Patent Sentiment Analysis to Highlight Patent Paragraphs [0.0]
Given a patent document, identifying distinct semantic annotations is an interesting research aspect. In the process of manual patent analysis, to attain better readability, recognising the semantic information by marking paragraphs is in practice. This work assist patent practitioners in highlighting semantic information automatically and aid to create a sustainable and efficient patent analysis using the aptitude of Machine Learning.
arXiv Detail & Related papers (2021-11-06T13:28:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.