In-Context Operator Learning on the Space of Probability Measures
- URL: http://arxiv.org/abs/2601.09979v1
- Date: Thu, 15 Jan 2026 01:44:10 GMT
- Title: In-Context Operator Learning on the Space of Probability Measures
- Authors: Frank Cole, Dixi Wang, Yineng Chen, Yulong Lu, Rongjie Lai
- Abstract summary: We introduce \emph{in-context operator learning} on probability measure spaces for optimal transport. We parameterize the solution operator and develop scaling-law theory in two regimes. Our numerical experiments on synthetic transports and generative-modeling benchmarks validate the framework.
- Score: 11.178575236157961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce \emph{in-context operator learning on probability measure spaces} for optimal transport (OT). The goal is to learn a single solution operator that maps a pair of distributions to the OT map, using only few-shot samples from each distribution as a prompt and \emph{without} gradient updates at inference. We parameterize the solution operator and develop scaling-law theory in two regimes. In the \emph{nonparametric} setting, when tasks concentrate on a low-intrinsic-dimension manifold of source--target pairs, we establish generalization bounds that quantify how in-context accuracy scales with prompt size, intrinsic task dimension, and model capacity. In the \emph{parametric} setting (e.g., Gaussian families), we give an explicit architecture that recovers the exact OT map in context and provide finite-sample excess-risk bounds. Our numerical experiments on synthetic transports and generative-modeling benchmarks validate the framework.
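In the parametric Gaussian setting mentioned in the abstract, the OT map has a closed form, which is the object the in-context operator must reproduce from prompt samples. As a minimal sketch of that closed-form target (assuming plain plug-in estimates from the prompts; this is the textbook Gaussian Monge map, not the paper's learned architecture):

```python
# Minimal sketch: closed-form OT map between Gaussians, with parameters
# estimated from few-shot prompt samples. This illustrates the target of
# the parametric setting, not the paper's in-context architecture.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_ot_map(X_src, X_tgt):
    """Monge map T(x) = m2 + A (x - m1) between fitted Gaussians,
    with A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}."""
    m1, m2 = X_src.mean(axis=0), X_tgt.mean(axis=0)
    S1 = np.cov(X_src, rowvar=False)
    S2 = np.cov(X_tgt, rowvar=False)
    S1h = np.real(sqrtm(S1))
    S1h_inv = np.linalg.inv(S1h)
    A = S1h_inv @ np.real(sqrtm(S1h @ S2 @ S1h)) @ S1h_inv
    return lambda x: m2 + (x - m1) @ A.T

rng = np.random.default_rng(0)
X_src = rng.normal(size=(64, 2))                              # source prompt
X_tgt = rng.normal(size=(64, 2)) @ np.diag([2.0, 0.5]) + 1.0  # target prompt
T = gaussian_ot_map(X_src, X_tgt)
mapped = T(X_src)  # few-shot estimate of the transported sample
```

As the prompt size grows, the plug-in estimate approaches the exact map; this is the regime the paper's finite-sample excess-risk bounds quantify.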
Related papers
- Riemannian Neural Optimal Transport [40.19067516813213]
Computational optimal transport (OT) offers a principled framework for generative modeling. Neural OT methods, which use neural networks to learn an OT map from data in an amortized way, can be evaluated out of sample after training. Existing approaches are tailored to Euclidean geometry.
arXiv Detail & Related papers (2026-02-03T14:09:35Z)
- Length-Aware Adversarial Training for Variable-Length Trajectories: Digital Twins for Mall Shopper Paths [4.841565047500658]
We study generative modeling of \emph{variable-length trajectories} -- sequences of visited locations/items with associated timestamps. Standard mini-batch training can be unstable when trajectory lengths are highly heterogeneous. We propose \textbf{length-aware sampling} (LAS), a simple strategy that groups trajectories by length and samples batches from a single length bucket; a minimal sketch of this bucketing follows this entry.
arXiv Detail & Related papers (2026-01-04T20:52:07Z)
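As a minimal sketch of the bucketing idea described above (exact-length buckets and uniform bucket choice are assumptions, not details from the paper):

```python
# Minimal sketch of length-aware sampling (LAS): group trajectories by
# length, then draw each mini-batch from a single length bucket so every
# batch is length-homogeneous. Bucket granularity is an assumption.
import random
from collections import defaultdict

def make_buckets(trajectories):
    """Group trajectory indices by exact sequence length."""
    buckets = defaultdict(list)
    for i, traj in enumerate(trajectories):
        buckets[len(traj)].append(i)
    return buckets

def sample_batch(trajectories, buckets, batch_size):
    """Pick one length bucket, then sample the whole batch from it."""
    length = random.choice(list(buckets))
    idxs = random.choices(buckets[length], k=batch_size)
    return [trajectories[i] for i in idxs]

trajs = [[0] * n for n in (3, 3, 5, 5, 5, 8)]  # toy variable-length data
buckets = make_buckets(trajs)
batch = sample_batch(trajs, buckets, batch_size=4)  # single-length batch
```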
- Adaptive Symmetrization of the KL Divergence [10.632997610787207]
Many tasks in machine learning can be described as or reduced to learning a probability distribution given a finite set of samples. A common approach is to minimize a statistical divergence between the (empirical) data distribution and a parameterized distribution, e.g., a normalizing flow (NF) or an energy-based model (EBM).
arXiv Detail & Related papers (2025-11-14T10:41:59Z)
- Neural Local Wasserstein Regression [16.52489456261937]
We study the estimation problem of distribution-on-distribution regression, where both predictors and responses are probability measures. Existing approaches typically rely on a global optimal transport map or tangent-space linearization. We propose a flexible nonparametric framework that models regression through locally defined transport maps in Wasserstein space.
arXiv Detail & Related papers (2025-11-13T21:54:18Z)
- In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning [51.56484100374058]
We introduce a principled risk decomposition that separates the total ICL risk into two components: Bayes Gap and Posterior Variance. For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts. The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty; a schematic form of the decomposition follows this entry.
arXiv Detail & Related papers (2025-10-13T03:42:31Z)
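Schematically, the decomposition named above can be written as follows (notation is illustrative; $f_{\mathrm{Bayes}}$ denotes the Bayes-optimal predictor):

```latex
% Total ICL risk split into a model-dependent gap and a
% model-independent floor (notation illustrative).
\mathcal{R}_{\mathrm{ICL}}(\hat f)
  = \underbrace{\mathcal{R}_{\mathrm{ICL}}(\hat f)
      - \mathcal{R}_{\mathrm{ICL}}(f_{\mathrm{Bayes}})}_{\text{Bayes Gap}}
  + \underbrace{\mathcal{R}_{\mathrm{ICL}}(f_{\mathrm{Bayes}})}_{\text{Posterior Variance}}
```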
- Test time training enhances in-context learning of nonlinear functions [51.56484100374058]
Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction. We investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time; a minimal sketch of this combination follows this entry.
arXiv Detail & Related papers (2025-09-30T03:56:44Z)
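A minimal sketch of the TTT-plus-ICL recipe described above, assuming a few SGD steps on a small designated head over the prompt pairs (step count, learning rate, and the choice of designated parameters are illustrative, not details from the paper):

```python
# Minimal sketch of test-time training on in-context examples: briefly
# fine-tune a designated head on the prompt's (x, y) pairs, then predict
# the query. Hyperparameters and the designated-parameter choice are
# assumptions.
import copy
import torch

def ttt_predict(features, head, prompt_x, prompt_y, query_x, steps=5, lr=1e-2):
    head = copy.deepcopy(head)  # leave the shared head untouched
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(head(features(prompt_x)), prompt_y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return head(features(query_x))

features = torch.nn.Linear(4, 8)  # toy stand-in for a pretrained backbone
head = torch.nn.Linear(8, 1)      # designated TTT parameters
x_ctx, y_ctx = torch.randn(16, 4), torch.randn(16, 1)
pred = ttt_predict(features, head, x_ctx, y_ctx, torch.randn(3, 4))
```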
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel, low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved; the merge step these coefficients parameterize is sketched after this entry. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
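The object MAP searches over is a vector of scaling coefficients; the merge step those coefficients parameterize is a weighted sum of per-task weights, sketched below (the quadratic surrogate that amortizes the Pareto front is not reproduced here):

```python
# Minimal sketch of the merge step parameterized by MAP's scaling
# coefficients: a coefficient-weighted sum of per-task model weights.
import numpy as np

def merge_models(state_dicts, coeffs):
    """Weighted sum of parameter arrays across task-specific models."""
    assert len(state_dicts) == len(coeffs)
    return {
        name: sum(c * sd[name] for c, sd in zip(coeffs, state_dicts))
        for name in state_dicts[0]
    }

# Toy example: two 'models' with one weight matrix each.
sd_a = {"w": np.ones((2, 2))}
sd_b = {"w": 3.0 * np.ones((2, 2))}
merged = merge_models([sd_a, sd_b], coeffs=[0.25, 0.75])  # w == 2.5 everywhere
```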
- Energy-Guided Continuous Entropic Barycenter Estimation for General Costs [95.33926437521046]
We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT.
arXiv Detail & Related papers (2023-10-02T11:24:36Z)
- ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models [69.50316788263433]
We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained vision-language models; a minimal adapter sketch follows this entry.
We quantify the calibration of embedding uncertainties in retrieval tasks and show that ProbVLM outperforms other methods.
We present a novel technique for visualizing the embedding distributions using a large-scale pre-trained latent diffusion model.
arXiv Detail & Related papers (2023-07-01T18:16:06Z)
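A minimal sketch of the probabilistic-adapter idea: a small head maps a frozen point embedding to distribution parameters. A diagonal Gaussian is used here purely for illustration; it is not necessarily ProbVLM's parameterization.

```python
# Minimal sketch of a probabilistic adapter over frozen embeddings:
# a small head turns a point embedding into distribution parameters.
# The diagonal-Gaussian family is a simplifying assumption.
import torch

class ProbAdapter(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mu = torch.nn.Linear(dim, dim)
        self.log_var = torch.nn.Linear(dim, dim)

    def forward(self, z):
        # z: frozen embedding from a pre-trained vision-language model
        return self.mu(z), self.log_var(z).exp()  # mean, per-dim variance

adapter = ProbAdapter(dim=512)
z = torch.randn(8, 512)   # stand-in for frozen image/text embeddings
mean, var = adapter(z)    # embedding distribution, usable for calibration
```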
- Asynchronously Trained Distributed Topographic Maps [0.0]
We present an algorithm that uses $N$ autonomous units to generate a feature map by distributed training.
Unit autonomy is achieved by sparse interaction in time and space through the combination of a distributed search and a cascade-driven weight-updating scheme.
arXiv Detail & Related papers (2023-01-20T01:15:56Z)
- Multi-Task Learning for Sparsity Pattern Heterogeneity: Statistical and Computational Perspectives [10.514866749547558]
We consider a problem in Multi-Task Learning (MTL) where multiple linear models are jointly trained on a collection of datasets.
A key novelty of our framework is that it allows the sparsity pattern of regression coefficients and the values of non-zero coefficients to differ across tasks.
Our methods encourage models to share information across tasks by separately encouraging (1) coefficient supports and/or (2) nonzero coefficient values to be similar; one plausible form of the resulting objective is sketched after this entry.
This allows models to borrow strength during variable selection even when non-zero coefficient values differ across tasks.
arXiv Detail & Related papers (2022-12-16T19:52:25Z)
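One plausible way to write the kind of objective described above, with separate penalties pulling supports and values together (the penalties shown are illustrative, not the paper's exact formulation):

```latex
% Per-task least squares plus two cross-task penalties: Omega_supp
% discourages disagreement in sparsity patterns, and the quadratic term
% shrinks coefficient values toward each other. Illustrative only.
\min_{\beta_1, \dots, \beta_K} \;
  \sum_{k=1}^{K} \lVert y_k - X_k \beta_k \rVert_2^2
  \; + \; \lambda_s \, \Omega_{\mathrm{supp}}(\beta_1, \dots, \beta_K)
  \; + \; \lambda_v \sum_{k < k'} \lVert \beta_k - \beta_{k'} \rVert_2^2
```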
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Learning Augmentation Distributions using Transformed Risk Minimization [47.236227685707526]
We propose a new \emph{Transformed Risk Minimization} (TRM) framework as an extension of classical risk minimization.
As a key application, we focus on learning augmentations to improve classification performance with a given class of predictors.
arXiv Detail & Related papers (2021-11-16T02:07:20Z)
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)