Seeing Through Green: Text-Based Classification and the Firm's Returns from Green Patents
- URL: http://arxiv.org/abs/2507.02287v1
- Date: Thu, 03 Jul 2025 03:51:33 GMT
- Title: Seeing Through Green: Text-Based Classification and the Firm's Returns from Green Patents
- Authors: Lapo Santarlasci, Armando Rungi, Antonio Zinilli,
- Abstract summary: This paper introduces Natural Language Processing for identifying true'' green patents from official supporting documents.<n>We start our training on about 12.4 million patents that had been classified as green from previous literature.<n>We find that true'' green patents represent about 20% of the total of patents classified as green from previous literature.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces Natural Language Processing for identifying ``true'' green patents from official supporting documents. We start our training on about 12.4 million patents that had been classified as green from previous literature. Thus, we train a simple neural network to enlarge a baseline dictionary through vector representations of expressions related to environmental technologies. After testing, we find that ``true'' green patents represent about 20\% of the total of patents classified as green from previous literature. We show heterogeneity by technological classes, and then check that `true' green patents are about 1\% less cited by following inventions. In the second part of the paper, we test the relationship between patenting and a dashboard of firm-level financial accounts in the European Union. After controlling for reverse causality, we show that holding at least one ``true'' green patent raises sales, market shares, and productivity. If we restrict the analysis to high-novelty ``true'' green patents, we find that they also yield higher profits. Our findings underscore the importance of using text analyses to gauge finer-grained patent classifications that are useful for policymaking in different domains.
Related papers
- Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center [49.85176045690678]
Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns.<n>Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models.<n>Four teams attempted to extract copyrighted content from GPT4DFCI across four domains.
arXiv Detail & Related papers (2025-06-26T23:11:49Z) - Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use [44.99833362998488]
This paper presents a domain-specific implementation of Retrieval-Augmented Generation tailored to the Fair Use Doctrine in U.S. copyright law.<n>Motivated by the increasing prevalence of DMCA takedowns and the lack of accessible legal support for content creators, we propose a structured approach that combines semantic search with legal knowledge graphs and court citation networks to improve retrieval quality and reasoning reliability.
arXiv Detail & Related papers (2025-05-04T15:53:49Z) - Imprinto: Enhancing Infrared Inkjet Watermarking for Human and Machine Perception [45.46101893448141]
Hybrid paper interfaces leverage augmented reality to combine the desired tangibility of paper documents with the affordances of interactive digital media.<n>We present Imprinto, an infrared inkjet watermarking technique that allows for invisible content embeddings only by using off-the-shelf IR inks and a camera.
arXiv Detail & Related papers (2025-02-24T12:11:33Z) - Intelligent System for Automated Molecular Patent Infringement Assessment [38.48937966447085]
PatentFinder is a novel multi-agent and tool-enhanced intelligence system that can accurately and comprehensively evaluate small molecules for patent infringement.<n>PatentFinder features five specialized agents that collaboratively analyze patent claims and molecular structures.<n>PatentFinder autonomously generates detailed and interpretable patent infringement reports, showcasing enhanced accuracy and improved interpretability.
arXiv Detail & Related papers (2024-12-10T12:14:38Z) - PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z) - Pap2Pat: Benchmarking Outline-Guided Long-Text Patent Generation with Patent-Paper Pairs [13.242188189150987]
We build PAP2PAT, an open benchmark for patent drafting consisting of 1.8k patent-paper pairs describing the same inventions.<n>Our evaluation using PAP2PAT and a human case study show that LLMs can effectively leverage information from the paper, but still struggle to provide the necessary level of detail.
arXiv Detail & Related papers (2024-10-09T15:52:48Z) - Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs [18.86788223751979]
We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases.
We introduce a graph-augmented approach to amplify the global contextual information of the patent phrases.
arXiv Detail & Related papers (2024-03-24T18:59:38Z) - PaECTER: Patent-level Representation Learning using Citation-informed
Transformers [0.16785092703248325]
PaECTER is a publicly available, open-source document-level encoder specific for patents.
We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents.
PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain.
arXiv Detail & Related papers (2024-02-29T18:09:03Z) - CausalCite: A Causal Formulation of Paper Citations [80.82622421055734]
CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings.
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
arXiv Detail & Related papers (2023-11-05T23:09:39Z) - Unveiling Black-boxes: Explainable Deep Learning Models for Patent
Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep opaque neural networks (DNNs)
We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP)
Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z) - Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification [26.85734804493925]
We propose an integrated framework that comprehensively considers the information on patents for patent classification.
We first present an IPC codes correlations learning module to derive their semantic representations.
Finally, we combine the contextual information of patent texts that contains the semantics of IPC codes, and assignees' sequential preferences to make predictions.
arXiv Detail & Related papers (2023-08-10T07:02:24Z) - DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose to extract deep intrinsic characteristics of the black-box model generated texts.
arXiv Detail & Related papers (2023-05-21T17:26:16Z) - Solar cell patent classification method based on keyword extraction and
deep neural network [0.0]
It can be said that the research value of solar cell patent analysis is very high.
Being able to accurately analyze and classify patent documents can reveal several important technical relationships.
Deep neural network-based solar cell patent classification model to classify power patents.
arXiv Detail & Related papers (2021-09-18T01:30:08Z) - Two tales of science technology linkage: Patent in-text versus
front-page references [8.731097761118972]
This paper explores how the value of a patent depends on the characteristics of the scientific papers that it builds on.
In-text referenced papers have a higher chance of being listed on the front-page of the same patent when they are moderately basic, less interdisciplinary, less novel, and more highly cited.
arXiv Detail & Related papers (2021-03-16T09:28:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.