Interpretable Tabular Foundation Models via In-Context Kernel Regression
- URL: http://arxiv.org/abs/2602.02162v1
- Date: Mon, 02 Feb 2026 14:37:10 GMT
- Title: Interpretable Tabular Foundation Models via In-Context Kernel Regression
- Authors: Ratmir Miftachov, Bruno Charron, Simon Valentin
- Abstract summary: KernelICL is a framework to enhance tabular foundation models with quantifiable sample-based interpretability.
We make explicit the sense in which in-context learning is akin to kernel regression, so that every prediction is a transparent weighted average of training labels.
On 55 TALENT benchmark datasets, KernelICL achieves performance on par with existing tabular foundation models.
- Score: 0.9285295512807727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular foundation models like TabPFN and TabICL achieve state-of-the-art performance through in-context learning, yet their architectures remain fundamentally opaque. We introduce KernelICL, a framework to enhance tabular foundation models with quantifiable sample-based interpretability. Building on the insight that in-context learning is akin to kernel regression, we make this mechanism explicit by replacing the final prediction layer with kernel functions (Gaussian, dot-product, kNN) so that every prediction is a transparent weighted average of training labels. We introduce a two-dimensional taxonomy that formally unifies standard kernel methods, modern neighbor-based approaches, and attention mechanisms under a single framework, and quantify inspectability via the perplexity of the weight distribution over training samples. On 55 TALENT benchmark datasets, KernelICL achieves performance on par with existing tabular foundation models, demonstrating that explicit kernel constraints on the final layer enable inspectable predictions without sacrificing performance.
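To make the abstract's mechanism concrete, below is a minimal NumPy sketch (not the authors' code) of a kernel-regression readout over in-context examples, together with the perplexity-based inspectability measure; the function names, defaults, and the assumption of precomputed embeddings are illustrative only.

```python
import numpy as np

def kernel_weights(z_query, Z_train, kind="gaussian", bandwidth=1.0, k=5):
    """Normalized weights over training samples for one query embedding."""
    if kind == "gaussian":
        d2 = np.sum((Z_train - z_query) ** 2, axis=1)
        logits = -d2 / (2.0 * bandwidth ** 2)
        w = np.exp(logits - logits.max())
    elif kind == "dot":
        logits = Z_train @ z_query
        w = np.exp(logits - logits.max())
    else:  # "knn": uniform weights on the k nearest neighbors
        d2 = np.sum((Z_train - z_query) ** 2, axis=1)
        w = np.zeros(len(Z_train))
        w[np.argsort(d2)[:k]] = 1.0
    return w / w.sum()

def predict(z_query, Z_train, y_train, **kw):
    """Transparent prediction: a weighted average of training labels."""
    w = kernel_weights(z_query, Z_train, **kw)
    return w @ y_train, w

def inspectability(w, eps=1e-12):
    """Perplexity of the weight distribution: the effective number of
    training samples behind a prediction (lower = easier to inspect)."""
    return float(np.exp(-np.sum(w * np.log(w + eps))))
```

On this reading, a sharply peaked weight distribution (small bandwidth or small k) yields low perplexity, so a prediction can be audited by inspecting just a handful of context samples.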
Related papers
- Cluster-Based Generalized Additive Models Informed by Random Fourier Features [19.409397281817288]
This work introduces a mixture of generalized additive models (GAMs) in which random Fourier feature (RFF) representations are leveraged to uncover locally adaptive structure in the data.
Numerical experiments on real-world regression benchmarks, including the California Housing, NASA Air Self-Noise, and Bike Sharing datasets, demonstrate improved predictive performance.
arXiv Detail & Related papers (2025-12-22T13:15:52Z)
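As a rough companion to this summary, here is the standard random Fourier feature map for a Gaussian kernel (following Rahimi and Recht); the feature dimension and bandwidth below are placeholders, not the paper's settings.

```python
import numpy as np

def rff_map(X, D=256, sigma=1.0, seed=0):
    """Map X (n x d) to D features so that
    phi(x) @ phi(y) ~ exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))  # spectral frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

A GAM component can then be fit with a linear model on top of rff_map(X), which is presumably what lets the mixture adapt locally.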
- Interpretable Kernel Representation Learning at Scale: A Unified Framework Utilizing Nyström Approximation [6.209111342262837]
We introduce KREPES -- a unified framework for kernel-based representation learning via Nyström approximation.
KREPES accommodates a wide range of unsupervised and self-supervised losses.
It enables principled interpretability of the learned representations, an immediate benefit over deep models.
arXiv Detail & Related papers (2025-09-29T08:45:40Z)
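For context, the Nyström approximation underlying KREPES can be sketched as follows; the RBF kernel and landmark count are illustrative assumptions, not KREPES's API.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, m=50, gamma=1.0, seed=0):
    """Low-rank features F with F @ F.T approximating the full Gram matrix."""
    idx = np.random.default_rng(seed).choice(len(X), size=m, replace=False)
    landmarks = X[idx]
    C = rbf_kernel(X, landmarks, gamma)          # n x m cross-kernel block
    W = rbf_kernel(landmarks, landmarks, gamma)  # m x m landmark block
    U, s, _ = np.linalg.svd(W)                   # W is symmetric PSD
    W_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T
    return C @ W_inv_sqrt
```

Because each feature is an explicit kernel evaluation against a named landmark sample, the learned representation stays attributable to concrete data points, which is the interpretability benefit the summary points to.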
- Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks.
NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets.
We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z)
- Initialization Schemes for Kolmogorov-Arnold Networks: An Empirical Study [9.450853542720909]
Kolmogorov-Arnold Networks (KANs) are a recently introduced neural architecture that replaces fixed nonlinearities with trainable activation functions.
This work proposes two theory-driven initialization schemes inspired by LeCun and Glorot, as well as an empirical power-law family with tunable exponents.
arXiv Detail & Related papers (2025-09-03T15:45:28Z)
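The paper's exact schemes are not reproduced here, but the three families of scale rules named in the summary can be sketched as follows (the power-law constants are placeholders):

```python
import numpy as np

def glorot_std(fan_in, fan_out):
    return np.sqrt(2.0 / (fan_in + fan_out))  # Glorot/Xavier variance scaling

def lecun_std(fan_in):
    return np.sqrt(1.0 / fan_in)              # LeCun variance scaling

def powerlaw_std(fan_in, alpha=0.5, c=1.0):
    return c * fan_in ** (-alpha)             # tunable power-law family

def init_coefficients(shape, std, seed=0):
    """Draw trainable activation coefficients at the chosen scale."""
    return np.random.default_rng(seed).normal(0.0, std, size=shape)
```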
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
Specifically, we propose a simple approach that extracts implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - Bayesian Exploration of Pre-trained Models for Low-shot Image Classification [14.211305168954594]
This work proposes a simple and effective probabilistic model ensemble framework based on Gaussian processes.
Prior knowledge is integrated by specifying the GP mean function with CLIP, together with a suitable kernel function.
We demonstrate that our method consistently outperforms competitive ensemble baselines in terms of predictive performance.
arXiv Detail & Related papers (2024-03-30T10:25:28Z)
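A minimal sketch of the mechanism described here, GP regression with an informative prior mean, where a generic mean_fn stands in for the CLIP-based mean:

```python
import numpy as np

def gp_posterior_mean(X, y, X_star, mean_fn, kernel_fn, noise=1e-2):
    """Posterior mean m(X*) + K(X*, X) (K(X, X) + noise I)^-1 (y - m(X)).
    Prior knowledge enters only through mean_fn and kernel_fn."""
    K = kernel_fn(X, X) + noise * np.eye(len(X))
    k_star = kernel_fn(X_star, X)
    return mean_fn(X_star) + k_star @ np.linalg.solve(K, y - mean_fn(X))
```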
- Unifying Self-Supervised Clustering and Energy-Based Models [9.3176264568834]
We establish a principled connection between self-supervised learning and generative models.
We show that our solution can be integrated into a neuro-symbolic framework to tackle a simple yet non-trivial instantiation of the symbol grounding problem.
arXiv Detail & Related papers (2023-12-30T04:46:16Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Tell Model Where to Attend: Improving Interpretability of Aspect-Based Sentiment Classification via Small Explanation Annotations [23.05672636220897]
We propose an Interpretation-Enhanced Gradient-based framework for ABSC via a small number of explanation annotations, namely IEGA.
Our framework is model-agnostic and task-agnostic, so it can be integrated into existing ABSC methods or applied to other tasks.
arXiv Detail & Related papers (2023-02-21T06:55:08Z)
- Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation [71.76008290101214]
DistillFlow is a knowledge distillation approach to learning optical flow.
It achieves state-of-the-art unsupervised learning performance on both KITTI and Sintel datasets.
Our models rank 1st among all monocular methods on the KITTI 2015 benchmark and outperform all published methods on the Sintel Final benchmark.
arXiv Detail & Related papers (2021-06-08T09:13:34Z)
- Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization.
We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
arXiv Detail & Related papers (2020-06-06T14:20:25Z)
- Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets.
arXiv Detail & Related papers (2020-01-23T02:37:10Z)
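To fix ideas, here is a minimal PyTorch sketch of the TEA objective as described above: a latent code is jointly optimized to reconstruct the high-dimensional target and to be predictable from the features. The linear maps and loss weight are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TEA(nn.Module):
    def __init__(self, x_dim, y_dim, z_dim=32):
        super().__init__()
        self.encode_y = nn.Linear(y_dim, z_dim)   # target -> latent
        self.decode_y = nn.Linear(z_dim, y_dim)   # latent -> target
        self.predict_z = nn.Linear(x_dim, z_dim)  # features -> latent

    def loss(self, x, y, lam=1.0):
        z = self.encode_y(y)
        recon = F.mse_loss(self.decode_y(z), y)   # latent is predictive of targets
        align = F.mse_loss(self.predict_z(x), z)  # latent is predictable from features
        return recon + lam * align

    def forward(self, x):
        # At test time the target is unavailable: predict the latent, then decode.
        return self.decode_y(self.predict_z(x))
```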
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.