Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics
- URL: http://arxiv.org/abs/2509.08461v2
- Date: Thu, 11 Sep 2025 13:03:04 GMT
- Title: Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics
- Authors: Dikshant Sagar, Kaiwen Yu, Alejandro Yankelevich, Jianming Bian, Pierre Baldi
- Abstract summary: This work explores the applications of Vision Language Models (VLMs) to the task of identifying neutrino interactions in pixelated detector data from high-energy physics experiments. We benchmark this model against a state-of-the-art convolutional neural network (CNN) architecture, similar to those used in the NOvA and DUNE experiments. We find that VLMs can outperform CNNs, while also providing greater flexibility in integrating auxiliary textual or semantic information and offering more interpretable, reasoning-based predictions.
- Score: 41.33501105382656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Large Language Models (LLMs) have demonstrated their remarkable capacity to process and reason over structured and unstructured data modalities beyond natural language. In this work, we explore the applications of Vision Language Models (VLMs), specifically a fine-tuned variant of LLaMa 3.2, to the task of identifying neutrino interactions in pixelated detector data from high-energy physics (HEP) experiments. We benchmark this model against a state-of-the-art convolutional neural network (CNN) architecture, similar to those used in the NOvA and DUNE experiments, which have achieved high efficiency and purity in classifying electron and muon neutrino events. Our evaluation considers both the classification performance and interpretability of the model predictions. We find that VLMs can outperform CNNs, while also providing greater flexibility in integrating auxiliary textual or semantic information and offering more interpretable, reasoning-based predictions. This work highlights the potential of VLMs as a general-purpose backbone for physics event classification, due to their high performance, interpretability, and generalizability, which opens new avenues for integrating multimodal reasoning in experimental neutrino physics.
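The "efficiency and purity" figures of merit mentioned for NOvA- and DUNE-style classifiers are the recall and precision of the signal selection. A minimal sketch of how they are computed (toy labels, not the paper's data or code):

```python
# Illustrative sketch: selection efficiency and purity for a binary
# neutrino-event classifier. Labels: 1 = signal (e.g. a nu_e event),
# 0 = background.

def efficiency_and_purity(y_true, y_pred):
    """Efficiency = TP / (TP + FN) (recall); purity = TP / (TP + FP) (precision)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    eff = tp / (tp + fn) if tp + fn else 0.0
    pur = tp / (tp + fp) if tp + fp else 0.0
    return eff, pur

# Toy example: 3 signal events selected, 1 signal missed, 1 background leaks in.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
eff, pur = efficiency_and_purity(y_true, y_pred)
```

A tighter selection cut typically trades efficiency for purity, which is why both numbers are reported together.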
Related papers
- Beyond Language Modeling: An Exploration of Multimodal Pretraining [125.34714978184638]
We provide empirical clarity through controlled, from-scratch pretraining experiments. We adopt the Transfusion framework, using next-token prediction for language and diffusion for vision. We demonstrate that the MoE architecture harmonizes this scaling asymmetry by providing the high model capacity required by language.
arXiv Detail & Related papers (2026-03-03T18:58:00Z) - Innovator-VL: A Multimodal Large Language Model for Scientific Discovery [84.15264653078826]
We present Innovator-VL, a scientific multimodal large language model designed to advance understanding and reasoning across diverse scientific domains. We show that principled training design and transparent methodology can yield strong scientific intelligence with substantially reduced data requirements.
arXiv Detail & Related papers (2026-01-27T08:12:18Z) - Fine-Tuning Vision-Language Models for Neutrino Event Analysis in High-Energy Physics Experiments [41.33501105382656]
We present a Vision-Language Model (VLM) for classifying neutrino interactions from pixelated detector images in high-energy physics experiments. We benchmark its performance against an established CNN baseline used in experiments like NOvA and DUNE, evaluating metrics such as classification accuracy, precision, recall, and AUC-ROC. Our results show that the VLM not only matches or exceeds CNN performance but also enables richer reasoning and better integration of auxiliary textual or semantic context.
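Of the metrics listed above, AUC-ROC is the least self-explanatory: it is the probability that a randomly chosen signal event receives a higher classifier score than a randomly chosen background event (ties counted as one half). An illustrative sketch, not the paper's evaluation code:

```python
# AUC-ROC via its rank interpretation: fraction of (signal, background)
# score pairs the classifier orders correctly.

def auc_roc(y_true, scores):
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scores: labels 1 = neutrino signal, 0 = background.
auc = auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Unlike accuracy, this quantity is independent of any particular score threshold, which makes it a useful single-number comparison between the VLM and the CNN baseline.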
arXiv Detail & Related papers (2025-08-26T19:12:28Z) - Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation [54.3628937181904]
Internal world models (WMs) enable agents to understand the world's state and predict transitions. Recent large Vision-Language Models (VLMs), such as OpenAI o3, GPT-4o and Gemini, exhibit potential as general-purpose WMs.
arXiv Detail & Related papers (2025-06-27T03:24:29Z) - Do We Really Need GNNs with Explicit Structural Modeling? MLPs Suffice for Language Model Representations [50.45261187796993]
Graph Neural Networks (GNNs) fail to fully utilize structural information, whereas Multi-Layer Perceptrons (MLPs) exhibit a surprising ability in structure-aware tasks. This paper introduces a comprehensive probing framework from an information-theoretic perspective.
arXiv Detail & Related papers (2025-06-26T18:10:28Z) - Leveraging neural network interatomic potentials for a foundation model of chemistry [2.66269503676104]
HackNIP is a two-stage pipeline that leverages pretrained neural network interatomic potentials (NIPs). It first extracts fixed-length feature vectors from NIP foundation models and then uses these embeddings to train shallow ML models. This study investigates whether such a hybridization approach, by "hacking" the NIP, can outperform end-to-end deep neural networks.
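The two-stage structure described above can be sketched in a few lines. Everything here is a stand-in (the feature extractor is a random placeholder, not the authors' NIP API); the point is only the shape of the pipeline: frozen embeddings in, shallow model out.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(structures):
    # Stage 1 stand-in: fixed-length embeddings that would come from a
    # pretrained NIP foundation model (hypothetical, for illustration).
    return rng.normal(size=(len(structures), 8))

structures = list(range(100))
X = extract_features(structures)

# Synthetic property to regress, linear in the embeddings plus noise.
true_w = rng.normal(size=8)
y = X @ true_w + 0.01 * rng.normal(size=100)

# Stage 2: a shallow model (ridge regression) on the frozen embeddings.
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(8), X.T @ y)
pred = X @ w
```

The appeal of this design is that stage 2 is cheap to retrain per property, while the expensive representation learning is amortized in the pretrained model.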
arXiv Detail & Related papers (2025-06-23T10:49:19Z) - Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning [30.781578037476347]
We introduce a novel approach to modeling transformer architectures using highly flexible non-autonomous neural ordinary differential equations (ODEs). Our proposed model parameterizes all weights of attention and feed-forward blocks through neural networks, expressing these weights as functions of a continuous layer index. Our neural ODE transformer demonstrates performance comparable to or better than vanilla transformers across various configurations and datasets.
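The core idea of weights as functions of a continuous layer index can be illustrated with a toy model (not the paper's implementation): instead of storing one weight matrix per layer, a small hypernetwork generates the block weights at any depth t, and the forward pass integrates through depth.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                    # width of the toy block
A = rng.normal(size=(d * d, 3)) * 0.1    # hypernetwork parameters (illustrative)

def weights_at(t):
    """Block weights as a smooth function of continuous layer index t."""
    basis = np.array([1.0, t, t * t])    # tiny polynomial hypernetwork
    return (A @ basis).reshape(d, d)

x = rng.normal(size=d)

# "Continuous-depth" forward pass: Euler steps through the layer index,
# analogous to integrating a non-autonomous ODE dx/dt = f(x, t).
steps = 8
for k in range(steps):
    t = k / steps
    x = x + (1.0 / steps) * np.tanh(weights_at(t) @ x)
```

Because depth is now a continuous variable, the number of integration steps can differ between training and inference, which is one source of the adaptivity mentioned in the title.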
arXiv Detail & Related papers (2025-03-03T09:12:14Z) - LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability [2.3945862743903916]
We propose a novel lightweight multi-dimensional attention network, called LMDA-Net.
By incorporating two novel attention modules designed specifically for EEG signals, LMDA-Net can effectively integrate features from multiple dimensions.
LMDA-Net outperforms other representative methods in terms of classification accuracy and prediction volatility.
arXiv Detail & Related papers (2023-03-29T02:35:02Z) - A transfer learning enhanced the physics-informed neural network model for vortex-induced vibration [0.0]
This paper proposes a transfer-learning-enhanced physics-informed neural network (PINN) model to study two-dimensional vortex-induced vibration (VIV).
The physics-informed neural network, when used in conjunction with the transfer learning method, enhances learning efficiency and keeps predictability in the target task by common characteristics knowledge from the source model without requiring a huge quantity of datasets.
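The transfer-learning mechanism described above (reuse source-task knowledge so the target task needs far less data) can be sketched with a toy linear model; everything here is illustrative, not the authors' PINN code:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_linear(X, y):
    # Least-squares fit used both for pretraining and fine-tuning.
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Source task: plentiful data, learn the shared structure.
Xs = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
ys = Xs @ w_true
w_source = fit_linear(Xs, ys)

# Target task: closely related (slightly shifted weights), few samples.
Xt = rng.normal(size=(4, 5))
yt = Xt @ (w_true + 0.05)

# Fine-tune only a small correction on top of the transferred weights,
# rather than learning the target model from scratch.
delta = fit_linear(Xt, yt - Xt @ w_source)
w_target = w_source + delta
```

With only 4 target samples a from-scratch 5-parameter fit would be underdetermined; transferring `w_source` first is what makes the small correction well behaved.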
arXiv Detail & Related papers (2021-12-29T08:20:23Z) - Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling [86.9726984929758]
We focus on the integration of incomplete physics models into deep generative models.
We propose a VAE architecture in which a part of the latent space is grounded by physics.
We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv Detail & Related papers (2021-02-25T20:28:52Z) - Phase Detection with Neural Networks: Interpreting the Black Box [58.720142291102135]
Neural networks (NNs) usually hinder any insight into the reasoning behind their predictions.
We demonstrate how influence functions can unravel the black box of NNs when trained to predict the phases of the one-dimensional extended spinless Fermi-Hubbard model at half-filling.
arXiv Detail & Related papers (2020-04-09T17:45:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.