Related papers: Gendered Words and Grant Rates: A Textual Analysis of Disparate Outcomes in the Patent System

Gendered Words and Grant Rates: A Textual Analysis of Disparate Outcomes in the Patent System

URL: http://arxiv.org/abs/2411.08526v2
Date: Wed, 18 Dec 2024 17:24:00 GMT
Title: Gendered Words and Grant Rates: A Textual Analysis of Disparate Outcomes in the Patent System
Authors: Deborah Gerhardt, Miriam Marcowitz-Bitton, W. Michael Schuster, Avshalom Elmalech, Omri Suissa, Moshe Mash,
Abstract summary: This article uses machine learning and natural language processing to extract hidden information from patent applications.<n>We find that inventor gender can often be identified from textual attributes - even without knowing the inventor's name.<n>Our study also investigates whether objective features of a patent application can predict if it will be granted.
Score: 1.53934570513443
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Text is a vehicle to convey information that reflects the writer's linguistic style and communicative patterns. By studying these attributes, we can discover latent insights about the author and their underlying message. This article uses such an approach to better understand patent applications and their inventors. While prior research focuses on patent metadata, we employ machine learning and natural language processing to extract hidden information from the words in patent applications. Through these methods, we find that inventor gender can often be identified from textual attributes - even without knowing the inventor's name. This ability to discern gender through text suggests that anonymized patent examination - often proposed as a solution to mitigate disparities in patent grant rates - may not fully address gendered outcomes in securing a patent. Our study also investigates whether objective features of a patent application can predict if it will be granted. Using a classifier algorithm, we correctly predicted whether a patent was granted over 60% of the time. Further analysis emphasized that writing style - like vocabulary and sentence complexity - disproportionately influenced grant predictions relative to other attributes such as inventor gender and subject matter keywords. Lastly, we examine whether women disproportionately invent in technological areas with higher rejection rates. Using a clustering algorithm, applications were allocated into groups with related subject matter. We found that 85% of female-dominated clusters have abnormally high rejection rates, compared to only 45% for male-dominated groupings. These findings highlight complex interactions between textual choices, gender, and success in securing a patent. They also raise questions about whether current proposals will be sufficient to achieve gender equity and efficiency in the patent system.

Related papers

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions. We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process. We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z)
PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions. We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models. We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
Comparing Complex Concepts with Transformers: Matching Patent Claims Against Natural Language Text [0.0]
Key capability in managing patent applications or a patent portfolio is comparing claims to other text, e.g. a patent specification. We test two new LLM-based approaches and find that both provide substantially better performance than previously published values. The ability to match dense information from one domain against much more distributed information expressed in a different vocabulary may also be useful beyond the intellectual property space.
arXiv Detail & Related papers (2024-07-14T22:31:07Z)
Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs [18.86788223751979]
We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases. We introduce a graph-augmented approach to amplify the global contextual information of the patent phrases.
arXiv Detail & Related papers (2024-03-24T18:59:38Z)
Natural Language Processing in Patents: A Survey [0.0]
Patents, encapsulating crucial technical and legal information, present a rich domain for natural language processing (NLP) applications. As NLP technologies evolve, large language models (LLMs) have demonstrated outstanding capabilities in general text processing and generation tasks. This paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently.
arXiv Detail & Related papers (2024-03-06T23:17:16Z)
PaECTER: Patent-level Representation Learning using Citation-informed Transformers [0.16785092703248325]
PaECTER is a publicly available, open-source document-level encoder specific for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain.
arXiv Detail & Related papers (2024-02-29T18:09:03Z)
Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep opaque neural networks (DNNs) We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP) Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance [0.0]
Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement. Recent advances in natural language processing technology offer a promising avenue for automating this process. We propose a hybrid methodology that takes into account similarity, measures the similarity between patents by considering the semantic similarity of patents.
arXiv Detail & Related papers (2023-03-23T07:55:31Z)
Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition [56.69875958980474]
This work considers approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations. We find that many methods that have been proposed can cause direct harm, like false rejection and unequal benefits from robustness training. We present a comparison of equality between two rejection-based defenses: randomized smoothing and neural rejection, finding randomized smoothing more equitable due to the sampling mechanism for minority groups.
arXiv Detail & Related papers (2023-02-17T16:19:26Z)
Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models. By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes. We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
A Survey on Sentence Embedding Models Performance for Patent Analysis [0.0]
We propose a standard library and dataset for assessing the accuracy of embeddings models based on PatentSBERTa approach. Results show PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings have the best accuracy for computing sentence embeddings at the subclass level.
arXiv Detail & Related papers (2022-04-28T12:04:42Z)
They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English. We show how a model can be trained to produce gender-neutral English with 1% word error rate with no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach [54.32266555843765]
This paper studies a model that protects the privacy of the individuals sensitive information while also allowing it to learn non-discriminatory predictors. The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints.
arXiv Detail & Related papers (2020-09-26T10:50:33Z)
Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms [68.85295025020942]
We propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a Search Engines to support gender stereotypes. GSR is the first specifically tailored measure for Information Retrieval, capable of quantifying representational harms.
arXiv Detail & Related papers (2020-09-02T20:45:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.