Scalable Kernel Logistic Regression with Nyström Approximation:
Theoretical Analysis and Application to Discrete Choice Modelling
- URL: http://arxiv.org/abs/2402.06763v1
- Date: Fri, 9 Feb 2024 19:52:31 GMT
- Title: Scalable Kernel Logistic Regression with Nyström Approximation:
Theoretical Analysis and Application to Discrete Choice Modelling
- Authors: José Ángel Martín-Baos, Ricardo García-Ródenas, Luis
Rodriguez-Benitez, Michel Bierlaire
- Abstract summary: This paper introduces the Nyström approximation for Kernel Logistic Regression (KLR) on large datasets.
Four landmark selection methods are tested, including basic uniform sampling, a k-means sampling strategy, and two non-uniform methods grounded in leverage scores.
The performance of these strategies is evaluated using large-scale transport mode choice datasets.
- Score: 1.208453901299241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of kernel-based Machine Learning (ML) techniques to discrete
choice modelling using large datasets often faces challenges due to memory
requirements and the considerable number of parameters involved in these
models. This complexity hampers the efficient training of large-scale models.
This paper addresses these scalability problems by introducing the Nyström
approximation for Kernel Logistic Regression (KLR) on large datasets. The study
begins by presenting a theoretical analysis in which: i) the set of KLR
solutions is characterised, ii) an upper bound on the solution of KLR under the
Nyström approximation is provided, and finally iii) a specialisation of the
optimisation algorithms to Nyström KLR is described. After this, the
Nyström KLR is computationally validated. Four landmark selection methods are
tested, including basic uniform sampling, a k-means sampling strategy, and two
non-uniform methods grounded in leverage scores. The performance of these
strategies is evaluated using large-scale transport mode choice datasets and is
compared with traditional methods such as Multinomial Logit (MNL) and
contemporary ML techniques. The study also assesses the efficiency of various
optimisation techniques for the proposed Nyström KLR model. The performance
of gradient descent, Momentum, Adam, and L-BFGS-B optimisation methods is
examined on these datasets. Among these strategies, the k-means Nyström KLR
approach emerges as a successful solution for applying KLR to large datasets,
particularly when combined with the L-BFGS-B and Adam optimisation methods. The
results highlight the ability of this strategy to handle datasets exceeding
200,000 observations while maintaining robust performance.
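For readers who want a concrete picture, the following is a minimal Python
sketch of a k-means Nyström KLR pipeline in the spirit of the abstract, built
from standard NumPy/scikit-learn components on a synthetic stand-in dataset.
The RBF kernel, its width gamma, the number of landmarks, and every other
hyperparameter here are illustrative assumptions, not the paper's
configuration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large multinomial (mode choice) dataset.
X, y = make_classification(n_samples=20_000, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_landmarks = 200  # m << n landmarks
gamma = 0.1        # RBF kernel width (assumed; tune in practice)

# k-means landmark selection: the m cluster centres act as landmarks.
landmarks = KMeans(n_clusters=n_landmarks, n_init=10,
                   random_state=0).fit(X_train).cluster_centers_

# Nystroem feature map Phi = K_nm @ K_mm^{-1/2}, so that Phi @ Phi.T ~ K.
K_mm = rbf_kernel(landmarks, landmarks, gamma=gamma)
U, s, _ = np.linalg.svd(K_mm)
K_mm_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T

def nystroem_features(X):
    return rbf_kernel(X, landmarks, gamma=gamma) @ K_mm_inv_sqrt

# Multinomial logistic regression on the approximate kernel features,
# fitted with L-BFGS (scikit-learn's default solver).
klr = LogisticRegression(solver="lbfgs", max_iter=1000)
klr.fit(nystroem_features(X_train), y_train)
print("test accuracy:", klr.score(nystroem_features(X_test), y_test))

Swapping the KMeans step for a uniform random subset of training rows gives
the basic uniform-sampling baseline, and drawing landmarks with probabilities
proportional to (approximate) leverage scores gives the non-uniform variants
mentioned above; the theoretical bound and the optimiser specialisations are
developed in the paper itself.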
Related papers
- NAN: A Training-Free Solution to Coefficient Estimation in Model Merging [61.36020737229637]
We show that the optimal merging weights should scale with the amount of task-specific information encoded in each model. We propose NAN, a simple yet effective method that estimates model merging coefficients via the inverse of parameter norm. NAN is training-free, plug-and-play, and applicable to a wide range of merging strategies.
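As a concrete illustration of that summary, here is a hedged sketch of one
plausible reading of the NAN rule, with each model's coefficient set to its
normalised inverse parameter norm; the flattened-weight representation and
toy models are assumptions for illustration, not the authors' implementation.

import numpy as np

def nan_merge(weight_vectors):
    """Merge flattened model weights with inverse-norm coefficients."""
    inv_norms = np.array([1.0 / np.linalg.norm(w) for w in weight_vectors])
    coeffs = inv_norms / inv_norms.sum()  # training-free merging weights
    return sum(c * w for c, w in zip(coeffs, weight_vectors))

# Toy usage: two task-specific "models" as flattened weight vectors.
rng = np.random.default_rng(0)
merged = nan_merge([rng.normal(size=1_000), 3.0 * rng.normal(size=1_000)])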
arXiv Detail & Related papers (2025-05-22T02:46:08Z) - Constructing Gaussian Processes via Samplets [0.0]
We examine recent convergence results to identify models with optimal convergence rates.
We propose a Samplet-based approach to efficiently construct and train Gaussian processes.
arXiv Detail & Related papers (2024-11-11T18:01:03Z) - On Sampling Strategies for Spectral Model Sharding [7.185534285278903]
In this work, we present two sampling strategies for such sharding.
The first produces unbiased estimators of the original weights, while the second aims to minimize the squared approximation error.
We demonstrate that both of these methods can lead to improved performance on various commonly used datasets.
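To make the first of those two strategies concrete, here is a hedged sketch
of an unbiased estimator of a weight matrix obtained by importance-sampling
terms of its SVD; sampling probabilities proportional to the singular values
are an assumption for illustration, and the paper's estimators may be
constructed differently.

import numpy as np

def sampled_shard(W, k, rng):
    """Unbiased estimate of W from k sampled SVD components."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    p = s / s.sum()                        # sampling probabilities
    idx = rng.choice(len(s), size=k, p=p)  # sample with replacement
    # Importance-sampling scaling makes the estimator unbiased:
    # E[(1/k) * sum_j s_j/p_j * u_j v_j^T] = sum_i s_i u_i v_i^T = W.
    return sum(s[i] / (k * p[i]) * np.outer(U[:, i], Vt[i]) for i in idx)

rng = np.random.default_rng(0)
W_hat = sampled_shard(rng.normal(size=(64, 32)), k=8, rng=rng)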
arXiv Detail & Related papers (2024-10-31T16:37:25Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning,
and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Learning to Select Pivotal Samples for Meta Re-weighting [12.73177872962048]
We study how to learn to identify, from a large and imperfect training set, such a meta sample set that is subsequently cleaned and used to optimize performance.
We propose two clustering methods within our learning framework: a Representation-based clustering method (RBC) and a Gradient-based clustering method (GBC).
arXiv Detail & Related papers (2023-02-09T03:04:40Z) - A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data [0.0]
We introduce an innovative algorithm that evaluates cluster effects for new data points, thereby increasing the robustness and precision of this model.
The inferential and predictive efficacy of this approach is further illustrated through its application in student scoring and protein expression.
arXiv Detail & Related papers (2023-02-06T23:34:51Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Exploiting Temporal Structures of Cyclostationary Signals for
Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS).
We focus on cyclostationary signals, which are particularly suitable in a variety of application domains.
We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z) - A Manifold Proximal Linear Method for Sparse Spectral Clustering with
Application to Single-Cell RNA Sequencing Data Analysis [9.643152256249884]
This paper considers the widely adopted SSC model as an optimization model with a nonsmooth and nonconvex objective.
We propose a new method (ManPL) that solves the original SSC problem.
Convergence results for the proposed methods are established.
arXiv Detail & Related papers (2020-07-18T22:05:00Z) - Clustering Binary Data by Application of Combinatorial Optimization
Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhood-search and population-behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)