TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics
- URL: http://arxiv.org/abs/2602.15084v1
- Date: Mon, 16 Feb 2026 12:26:07 GMT
- Title: TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics
- Authors: Tobia Boschi, Andrea Loreti, Nicola C. Amorisco, Rodrigo H. Ordonez-Hurtado, Cécile Rousseau, George K. Holt, Eszter Székely, Alexander Whittle, Samuel Jackson, Adriano Agnello, Stanislas Pamela, Alessandra Pascale, Robert Akers, Juan Bernabe Moreno, Vassil Alexandrov, Mykhaylo Zayats
- Abstract summary: TokaMind is an open-source foundation model framework for fusion plasma modeling. It is trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. We evaluate TokaMind on the recently introduced MAST benchmark TokaMark.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present TokaMind, an open-source foundation model framework for fusion plasma modeling, based on a Multi-Modal Transformer (MMT) and trained on heterogeneous tokamak diagnostics from the publicly available MAST dataset. TokaMind supports multiple data modalities (time-series, 2D profiles, and videos) with different sampling rates, robust missing-signal handling, and efficient task adaptation via selectively loading and freezing four model components. To represent multi-modal signals, we use a training-free Discrete Cosine Transform embedding (DCT3D) and provide a clean interface for alternative embeddings (e.g., Variational Autoencoders - VAEs). We evaluate TokaMind on the recently introduced MAST benchmark TokaMark, comparing training and embedding strategies. Our results show that fine-tuned TokaMind outperforms the benchmark baseline on all but one task, and that, for several tasks, lightweight fine-tuning yields better performance than training the same architecture from scratch under a matched epoch budget. These findings highlight the benefits of multi-modal pretraining for tokamak plasma dynamics and provide a practical, extensible foundation for future fusion modeling tasks. Training code and model weights will be made publicly available.
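The abstract does not spell out how the DCT3D embedding works; a minimal sketch of what a training-free 3D-DCT embedding could look like is below, keeping only the low-frequency coefficients so that time-series, 2D profiles, and video clips all map to fixed-length tokens. The function name `dct3d_embed` and the truncation scheme are assumptions, not the authors' implementation.

```python
# Minimal sketch of a training-free DCT embedding for multi-modal signals.
# Assumption: each signal is shaped into a (T, H, W) block and embedded by
# keeping the lowest-frequency DCT coefficients; NOT the authors' code.
import numpy as np
from scipy.fft import dctn

def dct3d_embed(x: np.ndarray, k: tuple = (8, 8, 8)) -> np.ndarray:
    """Embed a (T, H, W) signal block by truncating its 3D DCT-II spectrum.

    Time-series become (T, 1, 1) blocks, 2D profiles (1, H, W), and videos
    (T, H, W), so one embedding covers all three modalities in the abstract.
    """
    coeffs = dctn(x, type=2, norm="ortho")           # dense 3D spectrum
    kt, kh, kw = (min(ki, s) for ki, s in zip(k, x.shape))
    low = np.zeros(k)                                # fixed-size embedding
    low[:kt, :kh, :kw] = coeffs[:kt, :kh, :kw]       # keep low frequencies
    return low.ravel()                               # flat token for the MMT

# Example: a 2D profile and a short video map to same-length embeddings.
profile = np.random.rand(1, 64, 64)
video = np.random.rand(16, 32, 32)
assert dct3d_embed(profile).shape == dct3d_embed(video).shape == (512,)
```

Because the transform has no learned parameters, swapping in a trained embedding (e.g., a VAE) only requires replacing this function behind the same interface, which matches the "clean interface for alternative embeddings" described in the abstract.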
Related papers
- MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI [3.1920084309415007]
We propose MAFM^3, a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation.
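Neither this summary nor TokaMind's abstract specifies the adaptation mechanism, but modular adaptation of this kind (and TokaMind's "selectively loading and freezing four model components") typically reduces to freezing all but the task-relevant modules. A hypothetical PyTorch sketch, with illustrative component names:

```python
# Hypothetical sketch of modular task adaptation by selective freezing;
# component names ("embed", "encoder", "fusion", "head") are illustrative,
# not taken from either paper.
import torch.nn as nn

def adapt_for_task(model: nn.Module, trainable: set) -> nn.Module:
    """Freeze every top-level component except those named in `trainable`."""
    for name, module in model.named_children():
        requires_grad = name in trainable
        for p in module.parameters():
            p.requires_grad = requires_grad
    return model

# e.g. fine-tune only the fusion block and the task head:
# model = adapt_for_task(pretrained, trainable={"fusion", "head"})
```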
arXiv Detail & Related papers (2025-11-14T12:10:59Z)
- TSGym: Design Choices for Deep Multivariate Time-Series Forecasting [38.12202305030755]
This work bridges gaps by decomposing deep MTSF methods into their core, fine-grained components. We propose a novel automated solution called TSGym for MTSF tasks. Extensive experiments indicate that TSGym significantly outperforms existing state-of-the-art MTSF and AutoML methods.
arXiv Detail & Related papers (2025-09-21T12:49:31Z) - Sample-efficient Integration of New Modalities into Large Language Models [48.81776019848246]
Multimodal foundation models can process several modalities. We introduce SEMI, a method for sample-efficient modality integration into Large Language Models. We find that SEMI achieves a significant boost in sample efficiency during few-shot integration of new modalities.
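SEMI's method is not detailed in the summary; the standard recipe for integrating a new modality into an LLM is a small projector that maps a frozen encoder's features to pseudo-tokens in the LLM's embedding space. A generic sketch, with dimensions and names that are illustrative rather than the paper's:

```python
# Generic modality-integration sketch: project a frozen encoder's features
# into the LLM's token-embedding space as a short pseudo-token prefix.
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    def __init__(self, feat_dim: int, llm_dim: int, n_tokens: int = 8):
        super().__init__()
        self.n_tokens, self.llm_dim = n_tokens, llm_dim
        self.proj = nn.Linear(feat_dim, n_tokens * llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # (batch, feat_dim) -> (batch, n_tokens, llm_dim), prepended to text
        return self.proj(feats).view(-1, self.n_tokens, self.llm_dim)

# tokens = ModalityProjector(512, 4096)(encoder_output)
```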
arXiv Detail & Related papers (2025-09-04T18:41:59Z)
- OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging [124.91183814854126]
Model merging seeks to combine multiple expert models into a single model. We introduce a benchmark for model merging research that clearly divides the tasks for MLLM training and evaluation. We find that model merging offers a promising way to build improved MLLMs without requiring training data.
arXiv Detail & Related papers (2025-05-26T12:23:14Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data availability or any additional training.
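Based on the elect-mask-rescale recipe the name describes, a simplified numpy paraphrase might look as follows; the exact election and rescaling rules are assumptions and may differ from the paper.

```python
# Paraphrased sketch of an elect-mask-rescale merge over task vectors
# (differences from the pretrained weights); simplified from the
# EMR-Merging idea and possibly not faithful to the paper's details.
import numpy as np

def emr_merge(pre: np.ndarray, finetuned: list):
    taus = [w - pre for w in finetuned]                  # task vectors
    stack = np.stack(taus)
    sign = np.sign(stack.sum(axis=0))                    # elect a unified sign
    agree = (np.sign(stack) == sign) & (sign != 0)
    mag = np.where(agree, np.abs(stack), 0).max(axis=0)  # elected magnitude
    tau_uni = sign * mag                                 # unified task vector
    masks, scales = [], []
    for tau in taus:                                     # per-task mask/rescale
        m = (tau * tau_uni) > 0
        lam = np.abs(tau).sum() / max(np.abs(m * tau_uni).sum(), 1e-12)
        masks.append(m); scales.append(lam)
    # task-specific weights are recovered as: pre + lam_i * m_i * tau_uni
    return tau_uni, masks, scales
```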
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild [37.32217405723552]
We present an approach addressing the tasks of Expression (Expr) Recognition and Valence-Arousal (VA) Estimation.
We apply pre-trained models to the Aff-Wild2 database and extract the final hidden layers of the models as features.
After preprocessing and/or convolution to align the extracted features, different models are employed for modal fusion.
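A generic sketch of such a late-fusion pipeline, assuming per-modality features aligned with a 1x1 convolution and concatenated before a fusion head; the layer sizes are illustrative, not from the paper.

```python
# Late-fusion sketch: align each modality's features to a common width,
# pool over time, concatenate, and classify. Sizes are illustrative.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, feat_dims: list, d: int = 256, n_out: int = 8):
        super().__init__()
        self.align = nn.ModuleList(nn.Conv1d(f, d, kernel_size=1) for f in feat_dims)
        self.head = nn.Linear(d * len(feat_dims), n_out)

    def forward(self, feats: list) -> torch.Tensor:
        # each feats[i]: (batch, feat_dims[i], time); align, then mean-pool
        z = [a(f).mean(dim=-1) for a, f in zip(self.align, feats)]
        return self.head(torch.cat(z, dim=-1))
```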
arXiv Detail & Related papers (2024-03-22T09:00:24Z)
- FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis [0.7751705157998379]
The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP.
Model souping averages multiple fine-tuned models, aiming to improve performance on In-Domain (ID) tasks and enhance robustness against Out-of-Distribution (OOD) datasets.
We propose a hierarchical merging approach that involves local and global aggregation of models at various levels.
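The simplest form of souping is a uniform weight average; hierarchical souping, as described here, would apply the same operation first within local groups of models and then across groups. A minimal PyTorch sketch of the uniform case:

```python
# Minimal "uniform soup": average the weights of several models fine-tuned
# from the same initialization. Hierarchical souping repeats this locally,
# then globally, over groups of checkpoints.
import torch

def uniform_soup(state_dicts: list) -> dict:
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0)
            for k in keys}

# soup = uniform_soup([m.state_dict() for m in finetuned_models])
```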
arXiv Detail & Related papers (2024-03-20T06:48:48Z)
- MatFormer: Nested Transformer for Elastic Inference [91.45687988953435]
MatFormer is a novel Transformer architecture designed to provide elastic inference across diverse deployment constraints. MatFormer achieves this by incorporating a nested Feed Forward Network (FFN) block structure within a standard Transformer model. We show that an 850M decoder-only MatFormer language model (MatLM) allows us to extract multiple smaller models spanning from 582M to 850M parameters.
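The nesting can be pictured as smaller submodels reusing the first m hidden units of the full FFN, so one set of weights serves several model sizes. A simplified sketch, not the authors' implementation:

```python
# Sketch of a nested FFN in the MatFormer spirit: submodels slice the first
# `m` hidden units of the full FFN at inference time. Simplified.
import torch
import torch.nn as nn

class NestedFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor, m: int = None) -> torch.Tensor:
        m = m or self.w_in.out_features          # m < d_ff -> nested submodel
        h = torch.relu(x @ self.w_in.weight[:m].T + self.w_in.bias[:m])
        return h @ self.w_out.weight[:, :m].T + self.w_out.bias
```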
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
- An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution.
We conduct our study toward a novel goal: merging the vision, language, and cross-modal transformers of a modality-specific architecture.
We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
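The summary does not name the two metrics; simple stand-ins of the same flavor are the cosine similarity and L2 distance between flattened weight vectors, sketched below.

```python
# Stand-in weight-distance metrics between two checkpoints; the paper's
# actual metrics are not specified in the summary above.
import torch

def weight_distance(sd_a: dict, sd_b: dict):
    a = torch.cat([sd_a[k].flatten().float() for k in sorted(sd_a)])
    b = torch.cat([sd_b[k].flatten().float() for k in sorted(sd_b)])
    cos = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    l2 = (a - b).norm().item()
    return cos, l2
```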
arXiv Detail & Related papers (2023-04-28T15:43:21Z)