Nested Named Entity Recognition in Plasma Physics Research Articles
- URL: http://arxiv.org/abs/2602.11163v1
- Date: Sat, 17 Jan 2026 08:59:03 GMT
- Title: Nested Named Entity Recognition in Plasma Physics Research Articles
- Authors: Muhammad Haris, Hans Höft, Markus M. Becker, Markus Stocker
- Abstract summary: We propose a lightweight approach to extract named entities from plasma physics research articles. First, we annotate a plasma physics corpus with 16 classes specifically designed for the nested NER task. Second, we evaluate an entity-specific model specialization approach, where independent BERT-CRF models are trained to recognize individual entity types in plasma physics text.
- Score: 1.6507722022407412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Named Entity Recognition (NER) is an important task in natural language processing that aims to identify and extract key entities from unstructured text. We present a novel application of NER in plasma physics research articles and address the challenges of extracting specialized entities from scientific text in this domain. Research articles in plasma physics often contain highly complex and context-rich content that must be extracted to enable, e.g., advanced search. We propose a lightweight approach based on encoder-transformers and conditional random fields to extract (nested) named entities from plasma physics research articles. First, we annotate a plasma physics corpus with 16 classes specifically designed for the nested NER task. Second, we evaluate an entity-specific model specialization approach, where independent BERT-CRF models are trained to recognize individual entity types in plasma physics text. Third, we integrate an optimization process to systematically fine-tune hyperparameters and enhance model performance. Our work contributes to the advancement of entity recognition in plasma physics and also provides a foundation to support researchers in navigating and analyzing scientific literature.
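The entity-specific setup described in the abstract can be illustrated with a minimal sketch. It assumes each independent model emits a flat B/I/O tag sequence for its own entity type, and that nested entities arise when spans from different models overlap. The type names (`SOURCE`, `CONDITION`, `GAS`), the example tokens, and the helper functions are hypothetical and are not taken from the paper; the models themselves are replaced by hard-coded tag sequences.

```python
from typing import Dict, List, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str], entity_type: str) -> List[Span]:
    """Convert one model's B/I/O tags into spans for its entity type."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity begins; close any span that is still open.
            if start is not None:
                spans.append((start, i, entity_type))
            start = i
        elif tag == "I":
            # Tolerate an "I" without a preceding "B" by opening a span.
            if start is None:
                start = i
        else:
            # "O" closes any open span.
            if start is not None:
                spans.append((start, i, entity_type))
                start = None
    if start is not None:
        spans.append((start, len(tags), entity_type))
    return spans


def merge_predictions(per_type_tags: Dict[str, List[str]]) -> List[Span]:
    """Union the spans of all per-type models; overlaps give nesting."""
    spans: List[Span] = []
    for entity_type, tags in per_type_tags.items():
        spans.extend(bio_to_spans(tags, entity_type))
    # Sort by start position, longer (outer) spans first.
    return sorted(spans, key=lambda s: (s[0], -s[1], s[2]))


# Hypothetical sentence fragment and hypothetical per-type model outputs.
tokens = ["atmospheric", "pressure", "plasma", "jet", "in", "argon"]
predictions = {
    "SOURCE": ["B", "I", "I", "I", "O", "O"],
    "CONDITION": ["B", "I", "O", "O", "O", "O"],
    "GAS": ["O", "O", "O", "O", "O", "B"],
}
nested = merge_predictions(predictions)
for start, end, entity_type in nested:
    print(entity_type, " ".join(tokens[start:end]))
```

Because each model labels only one type, a single flat tagger never has to assign two labels to the same token; in this sketch the nesting is recovered purely by overlaying the per-type spans.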
Related papers
- ProPhy: Progressive Physical Alignment for Dynamic World Simulation [55.456455952212416]
ProPhy is a Progressive Physical Alignment Framework that enables explicit physics-aware conditioning and anisotropic generation. We show that ProPhy produces more realistic, dynamic, and physically coherent results than existing state-of-the-art methods.
arXiv Detail & Related papers (2025-12-05T09:39:26Z) - Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects [59.51185639557874]
We introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry.
arXiv Detail & Related papers (2025-11-03T07:21:42Z) - Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark [49.42250115889234]
We present the first benchmark designed to test large language models (LLMs) on research-level reasoning tasks. CritPt consists of 71 composite research challenges designed to simulate full-scale research projects at the entry level. We find that while current state-of-the-art LLMs show early promise on isolated checkpoints, they remain far from being able to reliably solve full research-scale challenges.
arXiv Detail & Related papers (2025-09-30T17:34:03Z) - PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis [62.283499219361595]
PhysGaia is a physics-aware dataset specifically designed for Dynamic Novel View Synthesis (DyNVS). Our dataset provides complex dynamic scenarios with rich interactions among multiple objects. PhysGaia will significantly advance research in dynamic view synthesis, physics-based scene understanding, and deep learning models integrated with physical simulation.
arXiv Detail & Related papers (2025-06-03T12:19:18Z) - Physics-based AI methodology for Material Parameter Extraction from Optical Data [0.0]
The proposed model integrates classical optimization frameworks with a multi-scale object detection framework. We validate and analyze its performance on simulated transmission spectra at terahertz and infrared frequencies.
arXiv Detail & Related papers (2025-03-11T08:49:45Z) - Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models [8.320153035338418]
This paper explores ideas and provides a potential roadmap for the development and evaluation of physics-specific large-scale AI models. These models, based on foundation models such as Large Language Models (LLMs), are tailored to address the demands of physics research.
arXiv Detail & Related papers (2025-01-09T17:11:22Z) - PhysBERT: A Text Embedding Model for Physics Scientific Literature [0.0]
In this work, we introduce PhysBERT, the first physics-specific text embedding model.
Pre-trained on a curated corpus of 1.2 million arXiv physics papers and fine-tuned with supervised data, PhysBERT outperforms leading general-purpose models on physics-specific tasks.
arXiv Detail & Related papers (2024-08-18T19:18:12Z) - INDUS: Effective and Efficient Language Models for Scientific Applications [8.653859684720231]
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks.
We developed INDUS, a comprehensive suite of LLMs tailored for the closely-related domains of Earth science, biology, physics, heliophysics, planetary sciences and astrophysics.
We show that our models outperform both general-purpose (RoBERTa) and domain-specific (SCIBERT) encoders on new tasks as well as existing tasks in the domains of interest.
arXiv Detail & Related papers (2024-05-17T12:15:07Z) - Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to solve the dilemma caused by the nested phenomenon.
The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module.
Our model could learn better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
arXiv Detail & Related papers (2022-11-09T09:23:56Z) - Nested Named Entity Recognition as Holistic Structure Parsing [92.8397338250383]
This work models all nested NEs in a sentence as a holistic structure and proposes a holistic structure parsing algorithm that recovers all of them at once.
Experiments show that our model yields promising results on widely-used benchmarks which approach or even achieve state-of-the-art.
arXiv Detail & Related papers (2022-04-17T12:48:20Z) - Method and Dataset Entity Mining in Scientific Literature: A CNN + Bi-LSTM Model with Self-attention [21.93889297841459]
We propose a novel entity recognition model, called MDER, which is able to effectively extract the method and dataset entities from scientific papers.
We evaluate the proposed model on datasets constructed from the published papers of four research areas in computer science, i.e., NLP, CV, Data Mining and AI.
arXiv Detail & Related papers (2020-10-26T13:38:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.