OmniJet-$α$: The first cross-task foundation model for particle physics
- URL: http://arxiv.org/abs/2403.05618v2
- Date: Sat, 7 Sep 2024 11:17:18 GMT
- Title: OmniJet-$α$: The first cross-task foundation model for particle physics
- Authors: Joschka Birk, Anna Hallin, Gregor Kasieczka
- Abstract summary: Foundation models are multi-dataset and multi-task machine learning methods that once pre-trained can be fine-tuned for a variety of downstream applications.
We report significant progress on this challenge on several fronts.
We demonstrate transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging) with our new OmniJet-$\alpha$ model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models are multi-dataset and multi-task machine learning methods that once pre-trained can be fine-tuned for a large variety of downstream applications. The successful development of such general-purpose models for physics data would be a major breakthrough as they could improve the achievable physics performance while at the same time drastically reduce the required amount of training time and data. We report significant progress on this challenge on several fronts. First, a comprehensive set of evaluation methods is introduced to judge the quality of an encoding from physics data into a representation suitable for the autoregressive generation of particle jets with transformer architectures (the common backbone of foundation models). These measures motivate the choice of a higher-fidelity tokenization compared to previous works. Finally, we demonstrate transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging) with our new OmniJet-$\alpha$ model. This is the first successful transfer between two different and actively studied classes of tasks and constitutes a major step in the building of foundation models for particle physics.
Related papers
- Open-Source High-Speed Flight Surrogate Modeling Framework [0.0]
High-speed flight vehicles, which travel much faster than the speed of sound, are crucial for national defense and space exploration.
Accurately predicting their behavior under numerous, varied flight conditions is challenging and often expensive.
The proposed approach involves creating smarter, more efficient machine learning models.
arXiv Detail & Related papers (2024-11-06T01:34:06Z)
- Is Tokenization Needed for Masked Particle Modelling? [8.79008927474707]
Masked particle modeling (MPM) is a self-supervised learning scheme for constructing expressive representations of unordered sets.
We improve MPM by addressing inefficiencies in the implementation and incorporating a more powerful decoder.
We show that these new methods outperform the tokenized learning objective from the original MPM on a new test bed for foundation models for jets.
arXiv Detail & Related papers (2024-09-19T09:12:29Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
Transferring the pretrained models to downstream tasks may encounter task discrepancy, because pretraining is formulated as an image classification or object discrimination task.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models [4.299997052226609]
Masked particle modeling (MPM) is a self-supervised method for learning generic, transferable, and reusable representations on unordered sets of inputs.
We study the efficacy of the method in samples of high energy jets at collider physics experiments.
arXiv Detail & Related papers (2024-01-24T15:46:32Z)
- Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior [32.74388989649232]
We study how pre-training could be used for scientific machine learning (SciML) applications.
We find that fine-tuning these models yields more performance gains as model size increases.
arXiv Detail & Related papers (2023-06-01T00:32:59Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Which priors matter? Benchmarking models for learning latent dynamics [70.88999063639146]
Several methods have been proposed to integrate priors from classical mechanics into machine learning models.
We take a sober look at the current capabilities of these models.
We find that the use of continuous and time-reversible dynamics benefits models of all classes.
arXiv Detail & Related papers (2021-11-09T23:48:21Z)
- Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling [86.9726984929758]
We focus on the integration of incomplete physics models into deep generative models.
We propose a VAE architecture in which a part of the latent space is grounded by physics.
We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv Detail & Related papers (2021-02-25T20:28:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.