The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for
Speed-Accuracy Optimization
- URL: http://arxiv.org/abs/2105.13636v2
- Date: Mon, 31 May 2021 01:44:50 GMT
- Authors: Taiki Miyagawa and Akinori F. Ebihara
- Abstract summary: We propose a model for multiclass classification of time series to make a prediction as early and as accurately as possible.
Our overall architecture for early classification, MSPRT-TANDEM, statistically significantly outperforms baseline models on four datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a model for multiclass classification of time series to make a
prediction as early and as accurately as possible. The matrix sequential
probability ratio test (MSPRT) is known to be asymptotically optimal for this
setting, but contains a critical assumption that hinders broad real-world
applications; the MSPRT requires the underlying probability density. To address
this problem, we propose to solve density ratio matrix estimation (DRME), a
novel type of density ratio estimation that consists of estimating matrices of
multiple density ratios with constraints and thus is more challenging than the
conventional density ratio estimation. We propose a log-sum-exp-type loss
function (LSEL) for solving DRME and prove the following: (i) the LSEL provides
the true density ratio matrix as the sample size of the training set increases
(consistency); (ii) it assigns larger gradients to harder classes (hard class
weighting effect); and (iii) it provides discriminative scores even on
class-imbalanced datasets (guess-aversion). Our overall architecture for early
classification, MSPRT-TANDEM, statistically significantly outperforms baseline
models on four datasets including action recognition, especially in the early
stage of sequential observations. Our code and datasets are publicly available
at: https://github.com/TaikiMiyagawa/MSPRT-TANDEM.
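The log-sum-exp-type loss described above can be sketched numerically. The following is a minimal NumPy illustration assuming a hypothetical `(N, K, K)` layout for the estimated log density ratio matrix; it is not the authors' implementation (see the linked repository for that):

```python
import numpy as np

def lsel_loss(log_ratios, labels):
    """Sketch of a log-sum-exp-type loss (LSEL) for density ratio matrix
    estimation.

    log_ratios: (N, K, K) array; log_ratios[i, k, l] is an estimate of
        log p_k(x_i) / p_l(x_i) (hypothetical layout, for illustration).
    labels: (N,) integer class labels in [0, K).
    Returns the mean over samples of log(1 + sum_{l != k} exp(-lambda_kl)).
    """
    N, K, _ = log_ratios.shape
    losses = np.empty(N)
    for i in range(N):
        k = labels[i]
        # Negative log-ratios of the true class against every other class.
        neg = -log_ratios[i, k, np.delete(np.arange(K), k)]
        # log(1 + sum_l exp(neg_l)), computed stably by shifting by the max.
        m = max(0.0, neg.max())
        losses[i] = m + np.log(np.exp(-m) + np.exp(neg - m).sum())
    return losses.mean()
```

When the estimated ratios for the true class are large and positive, the per-sample loss vanishes; when a wrong class dominates, the log-sum-exp term grows roughly linearly in the misestimated ratio, which is the intuition behind the hard class weighting effect.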
Related papers
- Selecting the Number of Communities for Weighted Degree-Corrected Stochastic Block Models [5.117940794592611]
We investigate how to select the number of communities for weighted networks without full likelihood modeling.
We propose a novel weighted degree-corrected stochastic block model (DCSBM), in which the mean adjacency matrix is modeled in the same way as in the standard DCSBM.
Our method for selecting the number of communities is based on a sequential testing framework; in each step, the weighted DCSBM is fitted via a spectral clustering method.
arXiv Detail & Related papers (2024-06-08T03:47:38Z)
- Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift [34.414261291690856]
A key challenge of modern machine learning systems is achieving Out-of-Distribution (OOD) generalization.
We show that classical Maximum Likelihood Estimation (MLE) using only source data achieves minimax optimality.
We illustrate the wide applicability of our framework by instantiating it to three concrete examples.
arXiv Detail & Related papers (2023-11-27T16:06:48Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
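The inverse-free idea, replacing an explicit matrix inversion with an iterative linear solver, can be illustrated with a plain conjugate gradient routine. This is only a sketch of that one ingredient; the paper's method additionally combines the solver with Monte Carlo sampling and backpropagation through the unrolled iterations:

```python
import numpy as np

def cg_solve(A, b, iters=100, tol=1e-12):
    """Conjugate gradient: solves A x = b for a symmetric positive-definite A
    without ever forming A^{-1}. Each iteration needs only matrix-vector
    products, which is what makes inverse-free estimation scalable."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:   # residual small enough: converged
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In the latent Gaussian setting, solves like this stand in for the covariance inversions inside the marginal-likelihood gradient, so the per-step cost scales with matrix-vector products rather than cubic factorizations.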
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Smooth densities and generative modeling with unsupervised random
forests [1.433758865948252]
An important application for density estimators is synthetic data generation.
We propose a new method based on unsupervised random forests for estimating smooth densities in arbitrary dimensions without parametric constraints.
We prove the consistency of our approach and demonstrate its advantages over existing tree-based density estimators.
arXiv Detail & Related papers (2022-05-19T09:50:25Z)
- Density Ratio Estimation via Infinitesimal Classification [85.08255198145304]
We propose DRE-infty, a divide-and-conquer approach that reduces density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
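The divide-and-conquer reduction rests on a telescoping identity: the log ratio between two densities equals the sum of log ratios between adjacent bridge distributions, each of which is easier to estimate. A toy illustration with analytic 1-D Gaussians standing in for learned bridge densities (illustrative only, not the paper's estimator):

```python
import numpy as np

def gauss_logpdf(x, mu, sigma):
    """Log density of a 1-D Gaussian N(mu, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def telescoped_log_ratio(x, mus, sigma=1.0):
    """log p(x)/q(x) via a chain of bridge distributions.

    mus: means of the bridges, from the numerator density (mus[0]) to the
    denominator density (mus[-1]). Each adjacent pair contributes one easy
    log-ratio term; the sum telescopes to the hard overall log ratio.
    """
    total = 0.0
    for m0, m1 in zip(mus[:-1], mus[1:]):
        total += gauss_logpdf(x, m0, sigma) - gauss_logpdf(x, m1, sigma)
    return total
```

In DRE-infty the discrete chain is pushed to a continuum of bridges, but the telescoping principle is the same: each intermediate pair of distributions overlaps heavily, so its ratio is well-conditioned to estimate.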
arXiv Detail & Related papers (2021-11-22T06:26:29Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Exact and Approximation Algorithms for Sparse PCA [1.7640556247739623]
This paper proposes two exact mixed-integer SDPs (MISDPs).
We then analyze the theoretical optimality gaps of their continuous relaxation values and prove that they are stronger than that of the state-of-the-art formulation.
Since off-the-shelf solvers generally have difficulty solving MISDPs, we approximate SPCA to arbitrary accuracy with a mixed-integer linear program (MILP) of similar size to the MISDPs.
arXiv Detail & Related papers (2020-08-28T02:07:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.