PhysBERT: A Text Embedding Model for Physics Scientific Literature
- URL: http://arxiv.org/abs/2408.09574v1
- Date: Sun, 18 Aug 2024 19:18:12 GMT
- Title: PhysBERT: A Text Embedding Model for Physics Scientific Literature
- Authors: Thorsten Hellert, João Montenegro, Andrea Pollastro,
- Abstract summary: In this work, we introduce PhysBERT, the first physics-specific text embedding model.
Pre-trained on a curated corpus of 1.2 million arXiv physics papers and fine-tuned with supervised data, PhysBERT outperforms leading general-purpose models on physics-specific tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The specialized language and complex concepts in physics pose significant challenges for information extraction through Natural Language Processing (NLP). Central to effective NLP applications is the text embedding model, which converts text into dense vector representations for efficient information retrieval and semantic analysis. In this work, we introduce PhysBERT, the first physics-specific text embedding model. Pre-trained on a curated corpus of 1.2 million arXiv physics papers and fine-tuned with supervised data, PhysBERT outperforms leading general-purpose models on physics-specific tasks, and it is also effective when fine-tuned for specific physics subdomains.
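The abstract describes embedding models as converting text into dense vectors for information retrieval. A minimal sketch of that retrieval step follows, assuming the embeddings have already been computed. The vectors, document names, and helper functions here are hypothetical and made up for illustration; they are not PhysBERT outputs or part of its API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings" (hypothetical values, not model output).
# Real embedding models produce vectors with hundreds of dimensions.
doc_vecs = {
    "beam_dynamics": [0.9, 0.1, 0.0],
    "lattice_qcd": [0.1, 0.9, 0.2],
    "galaxy_survey": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # hypothetical embedding of a search query

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this sketch the query vector points mostly along the first axis, so the document whose vector does the same ranks first. Cosine similarity is one common choice for comparing embeddings; the abstract does not state which similarity measure the authors use.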
Related papers
- Astro-HEP-BERT: A bidirectional language model for studying the meanings of concepts in astrophysics and high energy physics [0.0]
The project demonstrates the effectiveness and feasibility of adapting a bidirectional transformer for applications in the history, philosophy, and sociology of science.
The entire training process was conducted using freely available code, pretrained weights, and text inputs, completed on a single MacBook Pro Laptop.
Preliminary evaluations indicate that Astro-HEP-BERT's CWEs perform comparably to domain-adapted BERT models trained from scratch on larger datasets.
arXiv Detail & Related papers (2024-11-22T11:59:15Z)
- Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts.
However, the capacity of these models to accurately represent intuitive physics remains largely unexplored.
We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv Detail & Related papers (2024-10-07T17:56:04Z)
- Transport-Embedded Neural Architecture: Redefining the Landscape of physics aware neural models in fluid mechanics [0.0]
A physical problem, the Taylor-Green vortex, defined on a bi-periodic domain, is used as a benchmark to evaluate the performance of both the standard physics-informed neural network and our model.
The results show that while the standard physics-informed neural network fails to predict the solution accurately and merely returns the initial condition for the entire time span, our model successfully captures the temporal changes in the physics.
arXiv Detail & Related papers (2024-10-05T10:32:51Z)
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z)
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)
- Human Trajectory Prediction via Neural Social Physics [63.62824628085961]
Trajectory prediction has been widely pursued in many fields, and many model-based and model-free methods have been explored.
We propose a new method combining both methodologies based on a new Neural Differential Equation model.
Our new model (Neural Social Physics or NSP) is a deep neural network within which we use an explicit physics model with learnable parameters.
arXiv Detail & Related papers (2022-07-21T12:11:18Z)
- Calibrating constitutive models with full-field data via physics informed neural networks [0.0]
We propose a physics-informed deep-learning framework for the discovery of model parameterizations given full-field displacement data.
We work with the weak form of the governing equations rather than the strong form to impose physical constraints upon the neural network predictions.
We demonstrate that informed machine learning is an enabling technology and may shift the paradigm of how full-field experimental data is utilized to calibrate models.
arXiv Detail & Related papers (2022-03-30T18:07:44Z)
- Physically Explainable CNN for SAR Image Classification [59.63879146724284]
In this paper, we propose a novel physics guided and injected neural network for SAR image classification.
The proposed framework comprises three parts: (1) generating physics guided signals using existing explainable models, (2) learning physics-aware features with physics guided network, and (3) injecting the physics-aware features adaptively to the conventional classification deep learning model for prediction.
The experimental results show that our proposed method substantially improves classification performance compared with its data-driven CNN counterpart.
arXiv Detail & Related papers (2021-10-27T03:30:18Z)
- Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling [86.9726984929758]
We focus on the integration of incomplete physics models into deep generative models.
We propose a VAE architecture in which a part of the latent space is grounded by physics.
We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv Detail & Related papers (2021-02-25T20:28:52Z)
- Physics-Guided Machine Learning for Scientific Discovery: An Application in Simulating Lake Temperature Profiles [8.689056739160593]
This paper proposes a physics-guided recurrent neural network model (PGRNN) that combines RNNs and physics-based models.
We show that a PGRNN can improve prediction accuracy over that of physics-based models, while generating outputs consistent with physical laws.
Although we present and evaluate this methodology in the context of modeling the dynamics of temperature in lakes, it is applicable more widely to a range of scientific and engineering disciplines.
arXiv Detail & Related papers (2020-01-28T15:44:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.