Related papers: Amortized Spectral Kernel Discovery via Prior-Data Fitted Network

URL: http://arxiv.org/abs/2601.21731v1
Date: Thu, 29 Jan 2026 13:51:26 GMT
Title: Amortized Spectral Kernel Discovery via Prior-Data Fitted Network
Authors: Kaustubh Sharma, Srijan Tiwari, Ojasva Nema, Parikshit Pareek,
Abstract summary: We introduce an interpretability-driven framework for amortized spectral discovery from pre-trained PFNs with decoupled attention. We propose decoder architectures that map PFN latents to explicit spectral density estimates and corresponding stationary kernels. This yields orders-of-magnitude reductions in inference time compared to optimization-based baselines.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prior-Data Fitted Networks (PFNs) enable efficient amortized inference but lack transparent access to their learned priors and kernels. This opacity hinders their use in downstream tasks, such as surrogate-based optimization, that require explicit covariance models. We introduce an interpretability-driven framework for amortized spectral discovery from pre-trained PFNs with decoupled attention. We perform a mechanistic analysis on a trained PFN that identifies the attention latent output as the key intermediary linking observed function data to spectral structure. Building on this insight, we propose decoder architectures that map PFN latents to explicit spectral density estimates and corresponding stationary kernels via Bochner's theorem. We study this pipeline in both single-realization and multi-realization regimes, contextualizing theoretical limits on spectral identifiability and proving consistency when multiple function samples are available. Empirically, the proposed decoders recover complex multi-peak spectral mixtures and produce explicit kernels that support Gaussian process regression with accuracy comparable to PFNs and optimization-based baselines, while requiring only a single forward pass. This yields orders-of-magnitude reductions in inference time compared to optimization-based baselines.
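The last step of this pipeline, turning a decoded spectral density into an explicit stationary kernel via Bochner's theorem and then using that kernel for Gaussian process regression, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' decoder: it assumes the decoder emits the parameters of a symmetric one-dimensional Gaussian spectral mixture, and the names `weights`, `means`, `stds` and every helper function below are hypothetical. For S(w) = sum_q (w_q / 2) [N(w; mu_q, sd_q^2) + N(w; -mu_q, sd_q^2)], Bochner's theorem gives k(tau) = integral of S(w) cos(w tau) dw = sum_q w_q exp(-sd_q^2 tau^2 / 2) cos(mu_q tau); the code checks that closed form against direct numerical integration of S.

```python
import numpy as np


def spectral_mixture_density(omega, weights, means, stds):
    """Symmetric Gaussian-mixture spectral density S(omega) (hypothetical decoder output)."""
    omega = np.asarray(omega, dtype=float)[..., None]
    w, mu, sd = (np.asarray(a, dtype=float) for a in (weights, means, stds))

    def gauss(center):
        return np.exp(-0.5 * ((omega - center) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

    # Mirror each component at -mu so S is symmetric and the kernel is real-valued.
    return np.sum(0.5 * w * (gauss(mu) + gauss(-mu)), axis=-1)


def spectral_mixture_kernel(tau, weights, means, stds):
    """Closed-form Bochner transform: k(tau) = sum_q w_q exp(-sd_q^2 tau^2 / 2) cos(mu_q tau)."""
    tau = np.asarray(tau, dtype=float)[..., None]
    w, mu, sd = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    return np.sum(w * np.exp(-0.5 * (sd * tau) ** 2) * np.cos(mu * tau), axis=-1)


def kernel_by_quadrature(tau, weights, means, stds, omega_max=50.0, n=20001):
    """Evaluate k(tau) = integral of S(omega) cos(omega tau) with the trapezoidal rule."""
    omega = np.linspace(-omega_max, omega_max, n)
    d_omega = omega[1] - omega[0]
    s = spectral_mixture_density(omega, weights, means, stds)
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    vals = s[None, :] * np.cos(np.outer(tau, omega))
    return d_omega * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))


def gp_posterior_mean(x_train, y_train, x_test, params, noise=1e-2):
    """GP regression mean using the explicit kernel obtained from the spectrum."""
    def k(a, b):
        return spectral_mixture_kernel(a[:, None] - b[None, :], *params)

    gram = k(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(gram, y_train)
    return k(x_test, x_train) @ alpha


params = ([1.0, 0.5], [2.0, 7.0], [0.3, 0.5])  # hypothetical decoder output
tau = np.linspace(0.0, 3.0, 7)
closed = spectral_mixture_kernel(tau, *params)
numeric = kernel_by_quadrature(tau, *params)
print(np.max(np.abs(closed - numeric)))

x_train = np.linspace(0.0, 5.0, 30)
y_train = np.sin(2.0 * x_train)
mean = gp_posterior_mean(x_train, y_train, np.linspace(0.0, 5.0, 100), params)
```

The closed form is what makes the decoded kernel cheap to use downstream: once the mixture parameters are known, no integral has to be evaluated at GP-fitting time.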

Related papers

Compact Circulant Layers with Spectral Priors
Critical applications in areas such as medicine, robotics, and autonomous systems require compact (i.e., memory-efficient) neural networks. We study compact spectral circulant and block-circulant-with-circulant-blocks (BCCB) layers.
arXiv (2026-02-25T14:48:25Z)
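Circulant layers are compact because an n x n circulant matrix is fully determined by its first column, and its product with a vector is a circular convolution that the FFT diagonalizes. A minimal NumPy sketch of that general idea, not the paper's spectral-prior parameterization (all function names are hypothetical):

```python
import numpy as np


def circulant_matvec(c, x):
    """Compute C @ x, where C is circulant with first column c, in O(n log n)."""
    # The DFT diagonalizes every circulant matrix, so C @ x is elementwise in frequency space.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))


def dense_circulant(c):
    """Explicit n x n matrix with C[i, j] = c[(i - j) mod n]; used only for checking."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return np.asarray(c)[idx]


def bccb_matvec(c2d, x2d):
    """BCCB product, written as a 2-D circular convolution and computed with the 2-D FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(c2d) * np.fft.fft2(x2d)))


rng = np.random.default_rng(0)
c = rng.standard_normal(8)  # 8 parameters instead of 64 for a dense 8 x 8 layer
x = rng.standard_normal(8)
print(np.allclose(circulant_matvec(c, x), dense_circulant(c) @ x))  # prints True
```

Storing the spectrum `np.fft.fft(c)` directly instead of `c` is one natural way to place a prior on the layer in the frequency domain; whether the paper does exactly this is not stated in the summary.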
Spectral Gating Networks
We introduce Spectral Gating Networks (SGN) to bring frequency-rich expressivity to feed-forward networks. SGN augments a standard activation pathway with a compact spectral pathway and learnable gates that allow the model to start from a stable base behavior. It consistently improves accuracy-efficiency trade-offs under comparable computational budgets.
arXiv (2026-02-07T20:00:49Z)
The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias induced by the point-wise loss function. We present a concrete solution that simultaneously achieves both principles via the DFT or DWT.
arXiv (2025-12-21T06:08:22Z)
Spectral Algorithms under Covariate Shift
Spectral algorithms leverage spectral regularization techniques to analyze and process data. We investigate the convergence behavior of spectral algorithms in situations where the distributions of training and test data may differ. We propose a novel weighted spectral algorithm with normalized weights that incorporates density ratio information into the learning process.
arXiv (2025-04-17T04:02:06Z)
Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning
This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL). Each agent constructs a nonparametric B-Map from its private data, operating on Q-functions represented in a reproducing kernel Hilbert space. A detailed performance analysis shows that the proposed DRL framework effectively approximates the performance of a centralized node.
arXiv (2025-03-20T14:39:21Z)
Variance-Reducing Couplings for Random Features
Random features (RFs) are a popular technique to scale up kernel methods in machine learning. We find couplings to improve RFs defined on both Euclidean and discrete input spaces. We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv (2024-05-26T12:25:09Z)
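Random Fourier features approximate a shift-invariant kernel by Monte Carlo sampling of its spectral density, and a coupling changes how the frequencies are drawn jointly while keeping each one's marginal distribution. The sketch below, assuming a Gaussian (RBF) kernel, compares i.i.d. frequencies with orthogonal random features, one well-known coupling; it is not taken from the paper, whose couplings may differ, and all function names are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 l^2))."""
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def iid_frequencies(num, dim, rng):
    """Independent draws from the RBF spectral density N(0, I), before lengthscale scaling."""
    return rng.standard_normal((num, dim))


def orthogonal_frequencies(num, dim, rng):
    """Orthogonal random features: exactly orthogonal rows within each block of `dim` rows,
    with chi-distributed norms so every row is still marginally N(0, I)."""
    blocks = []
    for _ in range(-(-num // dim)):  # ceil(num / dim) blocks
        q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
        q = q * np.sign(np.diag(r))  # sign fix makes q Haar-distributed
        norms = np.linalg.norm(rng.standard_normal((dim, dim)), axis=1)
        blocks.append(norms[:, None] * q)
    return np.vstack(blocks)[:num]


def random_fourier_features(X, freqs, lengthscale):
    """phi(x) = [cos(Wx), sin(Wx)] / sqrt(D), so phi(x) . phi(y) = mean_i cos(w_i . (x - y))."""
    proj = X @ (freqs / lengthscale).T
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(freqs.shape[0])


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
exact = rbf_kernel(X, X, lengthscale=1.5)
for sampler in (iid_frequencies, orthogonal_frequencies):
    phi = random_fourier_features(X, sampler(4000, 4, rng), lengthscale=1.5)
    print(sampler.__name__, np.max(np.abs(phi @ phi.T - exact)))
```

Both estimators are unbiased because each frequency row keeps the N(0, I) marginal; the coupling only affects the covariance between rows, and hence the variance of the Gram-matrix estimate.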
Rethinking Clustered Federated Learning in NOMA Enhanced Wireless Networks
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets. A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented. Solutions to the challenges posed by non-IID conditions are proposed, together with an analysis of their properties.
arXiv (2024-03-05T17:49:09Z)
Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv (2024-01-29T02:08:40Z)
Stable Nonconvex-Nonconcave Training via Linear Interpolation
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv (2023-10-20T12:45:12Z)
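The stabilizing effect of linear interpolation is easy to see on the bilinear game min_x max_y x*y, where plain simultaneous gradient descent-ascent spirals outward. The sketch below uses a Lookahead-style scheme chosen here as an illustration (the step size `eta`, inner-loop length `inner`, and interpolation weight `lam` are arbitrary assumptions, and the paper's exact algorithms may differ): it runs `inner` base steps and then moves only a fraction `lam` of the way toward where they ended. For eta = 0.1, inner = 20, lam = 0.5, each outer step shrinks the iterate by a factor of about 0.57, whereas each plain step grows it by sqrt(1 + eta^2).

```python
import numpy as np


def gda_step(z, eta):
    """One simultaneous gradient descent-ascent step on f(x, y) = x * y."""
    x, y = z
    return np.array([x - eta * y, y + eta * x])


def run_gda(z0, eta, steps):
    """Plain simultaneous GDA; on this game it diverges for any eta > 0."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = gda_step(z, eta)
    return z


def run_interpolated(z0, eta, inner, lam, outer):
    """Linear interpolation of the base operator: z <- (1 - lam) * z + lam * T^inner(z)."""
    z = np.array(z0, dtype=float)
    for _ in range(outer):
        z = (1.0 - lam) * z + lam * run_gda(z, eta, inner)
    return z


z0 = (1.0, 1.0)
plain = run_gda(z0, eta=0.1, steps=1000)
interp = run_interpolated(z0, eta=0.1, inner=20, lam=0.5, outer=50)
print(np.linalg.norm(plain), np.linalg.norm(interp))
```

Both runs use the same number of gradient evaluations (1000); only the interpolated one converges to the equilibrium at the origin.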
Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation
Spectral clustering is a theoretically grounded solution to unsupervised semantic segmentation, in which spectral embeddings for pixels are computed to construct distinct clusters. Current approaches still suffer from inefficiencies in spectral decomposition and inflexibility when applied to test data. This work addresses these issues by casting spectral clustering as a parametric approach that employs neural network-based eigenfunctions to produce spectral embeddings. In practice, the neural eigenfunctions are lightweight and take features from pre-trained models as inputs, improving training efficiency and unleashing the potential of pre-trained models for dense prediction.
arXiv (2023-04-06T03:14:15Z)
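For reference, the classical non-parametric spectral clustering pipeline that this work recasts in parametric form computes eigenvectors of a normalized affinity matrix and clusters their rows. A minimal NumPy sketch of the Ng-Jordan-Weiss variant, assuming a Gaussian affinity with bandwidth `sigma` and a small hand-written k-means (all names are hypothetical); the full eigendecomposition is an O(n^3) step and must be redone for new points, which is the kind of cost and inflexibility the summary refers to.

```python
import numpy as np


def spectral_embedding(X, k, sigma):
    """Top-k eigenvectors of D^{-1/2} W D^{-1/2}, with rows normalized to unit length."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-affinity
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(Z, k, iters=100):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(np.sum((Z[:, None, :] - centers[None]) ** 2, axis=-1), axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
labels = kmeans(spectral_embedding(X, k=2, sigma=1.0), k=2)
print(labels)
```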
Spectral Decomposition Representation for Reinforcement Learning
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy. A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv (2022-08-19T19:01:30Z)
Convolutional Spectral Kernel Learning
We build an interpretable convolutional spectral kernel network (CSKN) based on the inverse Fourier transform. We derive generalization error bounds and introduce two regularizers to improve performance. Experimental results on real-world datasets validate the effectiveness of the learning framework.
arXiv (2020-02-28T14:35:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.