Patent Language Model Pretraining with ModernBERT
- URL: http://arxiv.org/abs/2509.14926v2
- Date: Tue, 21 Oct 2025 13:13:11 GMT
- Title: Patent Language Model Pretraining with ModernBERT
- Authors: Amirhossein Yousefiramandi, Ciaran Cooney
- Abstract summary: We pretrain 3 domain-specific masked language models for patents using the ModernBERT architecture and a curated corpus of over 60 million patent records. Our approach incorporates architectural optimizations, including FlashAttention, rotary embeddings, and GLU feed-forward layers. Our model, ModernBERT-base-PT, consistently outperforms the general-purpose ModernBERT baseline on three out of four datasets and achieves competitive performance with a baseline PatentBERT.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models such as BERT have become foundational in NLP, yet their performance degrades in specialized domains like patents, which contain long, technical, and legally structured text. Prior approaches to patent NLP have primarily relied on fine-tuning general-purpose models or domain-adapted variants pretrained with limited data. In this work, we pretrain three domain-specific masked language models for patents, using the ModernBERT architecture and a curated corpus of over 60 million patent records. Our approach incorporates architectural optimizations, including FlashAttention, rotary embeddings, and GLU feed-forward layers. We evaluate our models on four downstream patent classification tasks. Our model, ModernBERT-base-PT, consistently outperforms the general-purpose ModernBERT baseline on three out of four datasets and achieves competitive performance with a baseline PatentBERT. Additional experiments with ModernBERT-base-VX and Mosaic-BERT-large demonstrate that scaling the model size and customizing the tokenizer further enhance performance on selected tasks. Notably, all ModernBERT variants retain substantially faster inference, more than 3x that of PatentBERT, highlighting their suitability for time-sensitive applications. These results underscore the benefits of domain-specific pretraining and architectural improvements for patent-focused NLP tasks.
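Below is a minimal sketch of the masked-language-modeling setup described in the abstract, using the Hugging Face transformers API. The paper pretrains its models from scratch on a curated corpus of over 60 million patent records; this sketch instead shows the simpler, related setup of continued MLM training from the public ModernBERT-base checkpoint. The checkpoint name answerdotai/ModernBERT-base, the corpus file patent_abstracts.txt, and all hyperparameters here are illustrative assumptions, not the authors' configuration.

```python
# Sketch: continued masked-LM training of ModernBERT on patent text.
# Assumes transformers >= 4.48 (ModernBERT support) and a hypothetical
# local file of patent abstracts, one record per line.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "answerdotai/ModernBERT-base"  # general-purpose baseline checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical corpus file; replace with your own patent text source.
dataset = load_dataset("text", data_files={"train": "patent_abstracts.txt"})

def tokenize(batch):
    # ModernBERT handles long contexts; 1024 tokens is an illustrative cap.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="modernbert-patent-mlm",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
    bf16=True,  # ModernBERT's FlashAttention path benefits from bf16 hardware
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

A custom patent tokenizer, as in the ModernBERT-base-VX variant, would follow the same pipeline but implies pretraining from scratch rather than continuing from the public checkpoint, since the vocabulary and embedding matrix change.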
Related papers
- NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models [72.58372335140241]
Adversarial Prompt Tuning (AdvPT) introduced learnable text prompts to enhance adversarial robustness in Vision-Language Models (VLMs). We present the Neural Augmentor framework for Multi-modal Adversarial Prompt Tuning (NAP-Tuning). Our approach shows significant improvements over the strongest baselines under the challenging AutoAttack benchmark, outperforming them by 33.5% on ViT-B16 and 33.0% on ViT-B32 architectures.
arXiv Detail & Related papers (2025-06-15T03:34:23Z) - ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance [17.306542392779445]
Transformer-encoder models like DeBERTaV3 and ModernBERT introduce architectural advancements aimed at improving efficiency and performance. Although the authors of ModernBERT report improved performance over DeBERTaV3 on several benchmarks, the lack of disclosed training data and the absence of comparisons make it difficult to determine whether these gains are due to architectural improvements or differences in training data.
arXiv Detail & Related papers (2025-04-11T17:29:35Z) - NeoBERT: A Next-Generation BERT [9.256844523327192]
NeoBERT is a next-generation encoder that redefines the capabilities of bidirectional models. We release all code, data, checkpoints, and training scripts to accelerate research and real-world adoption.
arXiv Detail & Related papers (2025-02-26T22:00:22Z) - PatentGPT: A Large Language Model for Intellectual Property [26.31216865513109]
Large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks.
However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge.
We present for the first time a low-cost, standardized procedure for training IP-oriented LLMs, meeting the unique requirements of the IP domain.
arXiv Detail & Related papers (2024-04-28T17:36:43Z) - oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
arXiv Detail & Related papers (2023-03-30T01:37:19Z) - Sparse*BERT: Sparse Models Generalize To New tasks and Domains [79.42527716035879]
This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks.
We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text.
arXiv Detail & Related papers (2022-05-25T02:51:12Z) - Predicting Issue Types with seBERT [85.74803351913695]
seBERT is a model based on the BERT architecture but trained from scratch on software engineering data.
We fine-tuned this model for the NLBSE challenge for the task of issue type prediction.
Our model outperforms the baseline fastText for all three issue types in both recall and precision, achieving an overall F1-score of 85.7%.
arXiv Detail & Related papers (2022-05-03T06:47:13Z) - AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models [46.69439585453071]
We adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters. Specifically, we design the one-shot learning technique and the search space to provide an adaptive and efficient way of developing tiny PLMs.
We name our method AutoTinyBERT and evaluate its effectiveness on the GLUE and SQuAD benchmarks.
arXiv Detail & Related papers (2021-07-29T00:47:30Z) - AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants at different model capacities on various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)