Related papers: Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs

URL: http://arxiv.org/abs/2510.05278v1
Date: Mon, 06 Oct 2025 18:46:50 GMT
Title: Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs
Authors: Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam
Abstract summary: We compare encoder-only and decoder-only models on cross-modal adaptation for time-dependent simulation tasks. We find that decoder-only models are far worse than encoder-only models when existing approaches are applied unmodified. We introduce two novel approaches, Parallel Flipping and Sequence Doubling, attempting to mimic bidirectionality in autoregressive models.
Score: 27.331524018411926
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large language models have shown great success on natural language tasks in recent years, but they have also shown great promise when adapted to new modalities, e.g., for scientific machine learning tasks. Even though decoder-only models are more popular within NLP and scale exceedingly well at generating natural language, most proposed approaches for cross-modal adaptation focus on encoder-only models, raising the question of how model architecture affects these approaches. In this paper, we therefore perform a series of ablation studies to answer this question, systematically comparing encoder-only and decoder-only models on cross-modal adaptation for time-dependent simulation tasks based on partial differential equations (PDEs). We find that decoder-only models are far worse than encoder-only models when existing approaches are applied unmodified. In contrast to several other domains, scaling decoder-only models also does not help. To harness the potential of decoder-only models in this context, we introduce two novel approaches, Parallel Flipping and Sequence Doubling, attempting to mimic bidirectionality in autoregressive models. Both our methods improve overall performance using decoder-only models for all tasks and all cross-modal adaptation methods, closing the gap to encoder-only model performance. We hope that our findings broaden the spectrum of models used on cross-modal adaptation tasks to further scientific ML.

Related papers

Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning [19.12792297140574]
Continual learning aims to incrementally train a model on a sequence of tasks while retaining performance on prior ones. Storing and replaying data is often infeasible due to privacy or security constraints. We propose Per-layer Model Inversion (PMI), inspired by faster convergence in single-layer optimization.
arXiv Detail & Related papers (2025-10-30T09:58:48Z)
Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution. We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z)
Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task [1.9107347888374506]
We study the scaling laws of decoder-only models on the multilingual and multidomain translation task. We show that the loss of decoder-only models can be estimated using a scaling law similar to the one discovered for large language models. We also show that scaling the depth and the width of a model leads to similar test loss improvements, but with different impact on the model's efficiency.
arXiv Detail & Related papers (2024-09-23T14:26:01Z)
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs). We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model. We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners [8.43854206194162]
We show that seq2seq models can be highly effective few-shot learners for a wide spectrum of applications. We propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach.
arXiv Detail & Related papers (2023-07-27T13:37:06Z)
eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [47.432215933099016]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank. Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models. A collection of pretrained encoders perceive diverse modalities (such as vision and language). We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization. We train models with over 5 billion parameters for more than 170 billion tokens. We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.