Implicit Models: Expressive Power Scales with Test-Time Compute
- URL: http://arxiv.org/abs/2510.03638v1
- Date: Sat, 04 Oct 2025 02:49:22 GMT
- Title: Implicit Models: Expressive Power Scales with Test-Time Compute
- Authors: Jialin Liu, Lisang Ding, Stanley Osher, Wotao Yin
- Abstract summary: Implicit models, an emerging model class, compute outputs by iterating a single parameter block to a fixed point. We study this gap through a nonparametric analysis of expressive power. We prove that for a broad class of implicit models, this process lets the model's expressive power scale with test-time compute.
- Score: 17.808479563949074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Implicit models, an emerging model class, compute outputs by iterating a single parameter block to a fixed point. This architecture realizes an infinite-depth, weight-tied network that trains with constant memory, significantly reducing memory needs for the same level of performance compared to explicit models. While it is empirically known that these compact models can often match or even exceed larger explicit networks by allocating more test-time compute, the underlying mechanism remains poorly understood. We study this gap through a nonparametric analysis of expressive power. We provide a strict mathematical characterization, showing that a simple and regular implicit operator can, through iteration, progressively express more complex mappings. We prove that for a broad class of implicit models, this process lets the model's expressive power scale with test-time compute, ultimately matching a much richer function class. The theory is validated across three domains: image reconstruction, scientific computing, and operations research, demonstrating that as test-time iterations increase, the complexity of the learned mapping rises, while the solution quality simultaneously improves and stabilizes.
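As a concrete illustration of the mechanism the abstract describes, the sketch below implements a weight-tied layer iterated to a fixed point, with test-time compute controlled by the iteration count. It is a minimal toy, assuming a tanh update and a spectrally rescaled weight matrix to guarantee a contraction; none of the names or constants come from the paper.

```python
# Minimal sketch of an implicit (fixed-point) layer; illustrative only.
import numpy as np

rng = np.random.default_rng(0)

class ImplicitLayer:
    """Weight-tied update z_{k+1} = tanh(W z_k + U x + b), iterated to a fixed point."""

    def __init__(self, dim, in_dim):
        W = rng.standard_normal((dim, dim))
        # Rescale to spectral norm 0.5 so the update is a contraction and
        # the iteration provably converges to a unique fixed point z*.
        self.W = 0.5 * W / np.linalg.norm(W, 2)
        self.U = rng.standard_normal((dim, in_dim)) / np.sqrt(in_dim)
        self.b = np.zeros(dim)

    def forward(self, x, num_iters, tol=1e-8):
        """More test-time iterations bring z closer to the fixed point z* = f(z*, x)."""
        z = np.zeros(self.W.shape[0])
        for k in range(num_iters):
            z_next = np.tanh(self.W @ z + self.U @ x + self.b)
            if np.linalg.norm(z_next - z) < tol:   # converged early
                return z_next, k + 1
            z = z_next
        return z, num_iters

layer = ImplicitLayer(dim=64, in_dim=16)
x = rng.standard_normal(16)
for n in (1, 5, 50):   # same weights, increasing test-time compute
    z, used = layer.forward(x, num_iters=n)
    print(f"requested {n:2d} iterations, used {used:2d}, ||z|| = {np.linalg.norm(z):.8f}")
```

Note that the same parameter block (`W`, `U`, `b`) serves every iteration, which is what makes the effective network infinitely deep yet trainable with constant memory via implicit differentiation.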
Related papers
- UniT: Unified Multimodal Chain-of-Thought Test-time Scaling [85.590774707406]
Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. We introduce UniT, a framework for multimodal test-time scaling that enables a single unified model to reason, verify, and refine across multiple rounds.
arXiv Detail & Related papers (2026-02-12T18:59:49Z) - Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls [54.57326125204404]
Language models are increasingly capable, yet they still fail at the seemingly simple task of multi-digit multiplication. We study why by reverse-engineering a model that successfully learns multiplication via 'implicit chain-of-thought'.
arXiv Detail & Related papers (2025-09-30T19:03:26Z) - DiffuMatch: Category-Agnostic Spectral Diffusion Priors for Robust Non-rigid Shape Matching [53.39693288324375]
We show that both in-network regularization and functional map training can be replaced with data-driven methods. We first train a generative model of functional maps in the spectral domain using score-based generative modeling. We then exploit the resulting model to promote the structural properties of ground truth functional maps on new shape collections.
arXiv Detail & Related papers (2025-07-31T16:44:54Z) - Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks [17.487761710665968]
We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows.
arXiv Detail & Related papers (2025-01-10T12:52:00Z) - Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
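As a toy illustration of the low-rank trick in the entry above, the snippet below shows how factoring a transition matrix as T = U V^T turns one O(N^2) inference step into two O(Nr) matrix-vector products. The HMM-style forward step and all dimensions are assumptions for illustration, not the paper's models.

```python
# Sketch: a rank-r factorization T = U @ V.T makes the transition
# matvec cheap; illustrative only, not the paper's parameterization.
import numpy as np

N, r = 2048, 32
rng = np.random.default_rng(1)
U = rng.random((N, r)) / r
V = rng.random((N, r)) / N
T = U @ V.T                        # dense N x N transition scores, rank r

p = rng.random(N)
p /= p.sum()                       # current state distribution

dense_step   = T.T @ p             # O(N^2) per inference step
lowrank_step = V @ (U.T @ p)       # O(N * r), identical result
print(np.allclose(dense_step, lowrank_step))   # True
```

- Closed-form Continuous-Depth Models [99.40335716948101]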
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z) - Fast Hierarchical Games for Image Explanations [78.16853337149871]
We present a model-agnostic explanation method for image classification based on a hierarchical extension of Shapley coefficients.
Unlike other Shapley-based explanation methods, h-Shap is scalable and can be computed without the need for approximation.
We compare our hierarchical approach with popular Shapley-based and non-Shapley-based methods on a synthetic dataset, a medical imaging scenario, and a general computer vision problem.
arXiv Detail & Related papers (2021-04-13T13:11:02Z) - Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections [11.580603875423408]
Sequential data such as time series, video, or text can be challenging to analyse.
At the heart of this is non-commutativity, in the sense that reordering the elements of a sequence can completely change its meaning.
We use a classical mathematical object -- the tensor algebra -- to capture such dependencies.
arXiv Detail & Related papers (2020-06-12T09:24:35Z)
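As a toy illustration of the non-commutativity the Seq2Tens entry above refers to, the snippet below builds level-1 and level-2 tensor features of a sequence: summing elements ignores order, while summing outer products over ordered pairs detects reordering. The construction is a simplified stand-in for signature-style features, not the paper's low-rank projection method.

```python
# Toy order-sensitive tensor features; simplified for illustration.
import numpy as np

def tensor_features(seq):
    """Level 1: sum of elements (order-invariant).
    Level 2: sum of outer products x_i (x) x_j over i < j (order-sensitive)."""
    level1 = seq.sum(axis=0)
    d = seq.shape[1]
    level2 = np.zeros((d, d))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            level2 += np.outer(seq[i], seq[j])
    return level1, level2

x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
l1_fwd, l2_fwd = tensor_features(x)
l1_rev, l2_rev = tensor_features(x[::-1])       # same elements, reversed order
print(np.allclose(l1_fwd, l1_rev))              # True: level 1 ignores order
print(np.allclose(l2_fwd, l2_rev))              # False: level 2 detects reordering
```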