Full-Span Log-Linear Model and Fast Learning Algorithm
- URL: http://arxiv.org/abs/2202.08472v1
- Date: Thu, 17 Feb 2022 06:51:59 GMT
- Title: Full-Span Log-Linear Model and Fast Learning Algorithm
- Authors: Kazuya Takabatake, Shotaro Akaho
- Abstract summary: The full-span log-linear (FSLL) model can be regarded as an $n$-th order Boltzmann machine.
The FSLL model has $|X|-1$ parameters and can represent arbitrary positive distributions of $X$.
Experiments demonstrated that the FSLL successfully learned six training datasets such that $|X|=2^{20}$ within one minute on a laptop PC.
- Score: 1.9798034349981162
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The full-span log-linear (FSLL) model introduced in this paper can be regarded as
an $n$-th order Boltzmann machine, where $n$ is the number of all variables in
the target system. Let $X=(X_0,\dots,X_{n-1})$ be a tuple of finite discrete random
variables that can take $|X|=|X_0|\cdots|X_{n-1}|$ different values. The FSLL
model has $|X|-1$ parameters and can represent arbitrary positive distributions
of $X$. The FSLL model is a "highest-order" Boltzmann machine; nevertheless, we
can compute the dual parameters of the model distribution, which play
important roles in exponential families, in $O(|X|\log|X|)$ time. Furthermore,
using properties of the dual parameters of the FSLL model, we can construct an
efficient learning algorithm. The FSLL model is limited to small probabilistic
models up to $|X|\approx2^{25}$; however, in this problem domain, the FSLL
model flexibly fits various true distributions underlying the training data
without any hyperparameter tuning. The experiments demonstrated that the FSLL
successfully learned six training datasets such that $|X|=2^{20}$ within one
minute on a laptop PC.
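As a concrete illustration of the $O(|X|\log|X|)$ claim, consider the binary case where every $|X_i|=2$ and the variables take values in $\{-1,+1\}$. The basis functions of the full-span model are then the Walsh functions $\phi_S(x)=\prod_{i\in S}x_i$, so mapping the natural parameters to log-probabilities and mapping the probability vector to the dual (expectation) parameters are both Hadamard transforms, each computable by the fast Walsh-Hadamard transform. The sketch below is a minimal illustration under these assumptions, not the paper's implementation; the names `fwht` and `dual_parameters` are ours.

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform; len(a) must be a power of two.
    Cost is O(|X| log |X|) for |X| = len(a)."""
    a = a.astype(float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y          # butterfly: pairwise sums
            a[i + h:i + 2 * h] = x - y  # butterfly: pairwise differences
        h *= 2
    return a

def dual_parameters(theta):
    """Map natural parameters theta (indexed by subsets S of variables,
    encoded as bitmasks) to dual parameters eta_S = E[phi_S(X)].
    theta[0] (the empty subset) only shifts the normalization, which is
    why the model has |X| - 1 free parameters."""
    log_p = fwht(theta)     # H @ theta: unnormalized log-probabilities
    log_p -= log_p.max()    # stabilize before exponentiating
    p = np.exp(log_p)
    p /= p.sum()            # normalize to the model distribution
    return fwht(p)          # H @ p: expectation parameters

# n = 20 binary variables gives |X| = 2**20, the size used in the experiments.
# (A vectorized or compiled transform would be used in practice.)
n = 20
rng = np.random.default_rng(0)
theta = rng.normal(scale=0.01, size=2 ** n)
eta = dual_parameters(theta)
assert abs(eta[0] - 1.0) < 1e-9  # eta for the empty subset is E[1] = 1
```

For larger (but fixed-size) alphabets, the Walsh functions would presumably be replaced by a tensor product of per-variable orthonormal bases; the same divide-and-conquer structure keeps the $O(|X|\log|X|)$ cost stated in the abstract.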
Related papers
- IT$^3$: Idempotent Test-Time Training [95.78053599609044]
This paper introduces Idempotent Test-Time Training (IT$^3$), a novel approach to addressing the challenge of distribution shift.
IT$^3$ is based on the universal property of idempotence.
We demonstrate the versatility of our approach across various tasks, including corrupted image classification.
arXiv Detail & Related papers (2024-10-05T15:39:51Z)
- Transformer In-Context Learning for Categorical Data [51.23121284812406]
We extend research on understanding Transformers through the lens of in-context learning with functional data by considering categorical outcomes, nonlinear underlying models, and nonlinear attention.
We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset.
arXiv Detail & Related papers (2024-05-27T15:03:21Z)
- Agnostic Active Learning of Single Index Models with Linear Sample Complexity [27.065175036001246]
We study active learning methods for single index models of the form $F(\mathbf{x}) = f(\langle \mathbf{w}, \mathbf{x} \rangle)$.
arXiv Detail & Related papers (2024-05-15T13:11:28Z)
- Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z)
- Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models [91.22420505636006]
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix.
It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a cost that is $m^2 d \log(1/\varepsilon)$.
Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv Detail & Related papers (2023-03-30T02:50:49Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization [96.26317627118912]
We study group distributionally robust optimization (GDRO).
Online learning techniques reduce the number of samples required in each round from $m$ to $1$ while keeping the same sample complexity.
A novel formulation of weighted GDRO allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- PFGM++: Unlocking the Potential of Physics-Inspired Generative Models [14.708385906024546]
We introduce a new family of physics-inspired generative models termed PFGM++.
These models realize generative trajectories for $N$ dimensional data by embedding paths in $N+D$ dimensional space.
We show that models with finite $D$ can be superior to previous state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-02-08T18:58:02Z)
- Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures [9.053430799456587]
We study the problem of learning nonparametric distributions in a finite mixture.
We establish tight bounds on the sample complexity for learning the component distributions in such models.
arXiv Detail & Related papers (2022-03-28T23:53:48Z)
- TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z)
- Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model [3.749193647980305]
This paper considers a Markov decision process (MDP) that admits a set of state-action features.
We show that a model-based approach (resp. $Q$-learning) provably learns an $\varepsilon$-optimal policy with high probability.
arXiv Detail & Related papers (2021-05-28T17:49:39Z)
- An Algorithm for Learning Smaller Representations of Models With Scarce Data [0.0]
We present a greedy algorithm for solving binary classification problems in situations where the dataset is too small or not fully representative.
It relies on a trained model with loose accuracy constraints, an iterative hyperparameter pruning procedure, and a function used to generate new data.
arXiv Detail & Related papers (2020-10-15T19:17:51Z)