Memory-Efficient Factorization Machines via Binarizing both Data and
Model Coefficients
- URL: http://arxiv.org/abs/2108.07421v1
- Date: Tue, 17 Aug 2021 03:30:52 GMT
- Title: Memory-Efficient Factorization Machines via Binarizing both Data and
Model Coefficients
- Authors: Yu Geng and Liang Lan
- Abstract summary: Subspace Encoding Factorization Machine (SEFM) has been proposed to overcome the expressiveness limitation of Factorization Machines (FM).
We propose a new method called Binarized FM which constrains the model parameters to be binary values.
Our proposed method achieves comparable accuracy with SEFM but with much less memory cost.
- Score: 9.692334398809457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Factorization Machines (FM), a general predictor that can efficiently model
feature interactions in linear time, was originally proposed for collaborative
recommendation, and has been broadly used for regression, classification and
ranking tasks. Subspace Encoding Factorization Machine (SEFM) has been proposed
recently to overcome the expressiveness limitation of Factorization Machines
(FM) by applying explicit nonlinear feature mapping for both individual
features and feature interactions through one-hot encoding to each input
feature. Despite the effectiveness of SEFM, it increases the memory cost of FM
by $b$ times, where $b$ is the number of bins when applying one-hot encoding on
each input feature. To reduce the memory cost of SEFM, we propose a new method
called Binarized FM, which constrains the model parameters to be binary values
(i.e., $+1$ or $-1$). Each parameter value can then be efficiently stored in one
bit. Our proposed method can significantly reduce the memory cost of the SEFM
model. In addition, we propose a new algorithm to effectively and efficiently
learn the proposed FM under binary constraints using the Straight-Through
Estimator (STE) with Adaptive Gradient Descent (Adagrad). Finally, we evaluate the
performance of our proposed method on eight different classification datasets.
Our experimental results demonstrate that our proposed method achieves
accuracy comparable to SEFM at a much lower memory cost.
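To make the training recipe concrete, below is a minimal NumPy sketch of the idea as described in the abstract: real-valued parameters are binarized to $\pm 1$ in the forward pass, gradients flow straight through the sign function (STE), and Adagrad scales the updates. This is not the authors' implementation; the toy data, squared-error loss, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-encoded data (a stand-in for SEFM's one-hot bins).
n, d, k = 200, 30, 8                      # samples, features, latent factors
X = rng.integers(0, 2, size=(n, d)).astype(float)
y = rng.choice([-1.0, 1.0], size=n)

# Real-valued parameters; only their signs enter the forward pass.
w0 = 0.0                                  # bias kept real for simplicity
w = rng.normal(0.0, 0.1, d)
V = rng.normal(0.0, 0.1, (d, k))
Gw0, Gw, GV = 0.0, np.zeros(d), np.zeros((d, k))  # Adagrad accumulators
lr, eps = 0.1, 1e-8

def sign(a):
    return np.where(a >= 0.0, 1.0, -1.0)  # binarize to {+1, -1}

def fm_scores(X, w0, wb, Vb):
    # FM prediction via the linear-time pairwise-interaction identity:
    # sum_{i<j} <v_i,v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    s1 = X @ Vb
    s2 = (X ** 2) @ (Vb ** 2)
    return w0 + X @ wb + 0.5 * (s1 ** 2 - s2).sum(axis=1)

for epoch in range(100):
    wb, Vb = sign(w), sign(V)
    resid = (fm_scores(X, w0, wb, Vb) - y) / n     # dL/dscore, squared loss
    g_w0 = resid.sum()                             # grads w.r.t. binarized params
    g_w = X.T @ resid
    s1 = X @ Vb
    g_V = X.T @ (resid[:, None] * s1) - Vb * ((X ** 2).T @ resid)[:, None]
    # STE: pretend d(sign)/d(param) = 1, then take Adagrad-scaled steps.
    # (a common STE variant also zeroes gradients where |param| > 1; omitted here)
    Gw0 += g_w0 ** 2; w0 -= lr * g_w0 / (np.sqrt(Gw0) + eps)
    Gw += g_w ** 2;   w -= lr * g_w / (np.sqrt(Gw) + eps)
    GV += g_V ** 2;   V -= lr * g_V / (np.sqrt(GV) + eps)

# Deployment: each binary parameter packs into a single bit.
packed = np.packbits((sign(V) > 0).reshape(-1).astype(np.uint8))
print("float64 bytes:", V.size * 8, "-> packed bytes:", packed.size)
```

The final two lines illustrate the one-bit-per-parameter storage claim: in this toy setup, 240 float64 coefficients collapse to 30 packed bytes.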
Related papers
- Joint Transmit and Pinching Beamforming for PASS: Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed.
It consists of multiple waveguides, which are equipped with numerous low-cost pinching antennas (PAs).
The positions of the PAs can be reconfigured to span both the large-scale path and the spatial domain.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
- SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization [0.5755004576310332]
SMMF is a memory-efficient optimizer that reduces the memory requirement of widely used adaptive learning-rate optimizers, such as Adam, by up to 96%.
We conduct a regret bound analysis of SMMF, which shows that it converges similarly to non-memory-efficient adaptive learning-rate optimizers, such as AdamNC.
In our experiment, SMMF takes up to 96% less memory compared to state-of-the-art memory-efficient optimizers, e.g., Adafactor, CAME, and SM3, while achieving comparable model performance (the factorization idea is sketched after this list).
arXiv Detail & Related papers (2024-12-12T03:14:50Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
- Fine-Tuning Language Models with Just Forward Passes [92.04219196752007]
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a large amount of memory.
We propose a memory-efficient zeroth-order optimizer (MeZO) that operates in-place, thereby fine-tuning LMs with the same memory footprint as inference (sketched after this list).
arXiv Detail & Related papers (2023-05-27T02:28:10Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- On Computing the Hyperparameter of Extreme Learning Machines: Algorithm and Application to Computational PDEs, and Comparison with Classical and High-Order Finite Elements [0.0]
We consider the use of extreme learning machines (ELM) for computational partial differential equations (PDE).
In ELM, the hidden-layer coefficients of the neural network are assigned random values generated on $[-R_m,R_m]$ and then fixed.
We present a method for computing the optimal value of $R_m$ based on the differential evolution algorithm.
arXiv Detail & Related papers (2021-10-27T02:05:26Z)
- Joint Majorization-Minimization for Nonnegative Matrix Factorization with the $\beta$-divergence [4.468952886990851]
This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $\beta$-divergence objective function (the classical updates are sketched after this list).
We report experimental results using diverse datasets: face images, an audio spectrogram, hyperspectral data and song play counts.
arXiv Detail & Related papers (2021-06-29T09:58:21Z)
- Factorization Machines with Regularization for Sparse Feature Interactions [13.593781209611112]
Factorization machines (FMs) are machine learning predictive models based on second-order feature interactions.
We present a new regularization scheme for feature interaction selection in FMs.
For feature interaction selection, our proposed regularizer makes the feature interaction matrix sparse without the restrictions on sparsity patterns imposed by existing methods.
arXiv Detail & Related papers (2020-10-19T05:00:40Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivatives with finite differences (sketched after this list).
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- DS-FACTO: Doubly Separable Factorization Machines [4.281959480566438]
Factorization Machines (FM) are a powerful class of models that incorporate higher-order interactions among features to add more expressive power to linear models.
Despite using a low-rank representation for the pairwise features, the memory overheads of using factorization machines on large-scale real-world datasets can be prohibitively high.
Traditional algorithms for FM, which work on a single machine, are not equipped to handle this scale; therefore, using a distributed algorithm to parallelize computation across a cluster is inevitable.
arXiv Detail & Related papers (2020-04-29T03:36:28Z)
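For the SMMF entry above: the summary does not spell out the square-matricization itself, so the sketch below illustrates the general memory-saving idea with an Adafactor-style rank-1 (row/column) factorization of the squared-gradient accumulator instead; the function name and hyperparameters are invented for illustration.

```python
import numpy as np

def factored_second_moment(G, R, C, beta2=0.999, eps=1e-30):
    """Update row/column accumulators instead of a full (m, n) second moment.

    R: (m,) EMA of row sums of G**2;  C: (n,) EMA of column sums of G**2.
    The full second moment is approximated by the rank-1 reconstruction
    outer(R, C) / R.sum(), as in Adafactor. Persistent state is O(m + n);
    the full matrix only appears transiently inside the step.
    """
    G2 = G * G + eps
    R = beta2 * R + (1.0 - beta2) * G2.sum(axis=1)
    C = beta2 * C + (1.0 - beta2) * G2.sum(axis=0)
    return R, C, np.outer(R, C) / R.sum()

# Usage: one preconditioned step on a toy weight matrix.
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
R, C = np.zeros(64), np.zeros(32)
G = rng.normal(size=W.shape)                 # stand-in gradient
R, C, V_hat = factored_second_moment(G, R, C)
W -= 0.01 * G / np.sqrt(V_hat + 1e-8)        # Adam-style update
# State: 64 + 32 floats instead of 64 * 32 for the unfactored moment.
```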
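For the MeZO entry: the core trick is a two-point (SPSA-style) gradient estimate whose random perturbation is regenerated from a seed rather than stored. A minimal sketch under that reading, with a toy quadratic loss standing in for a language model:

```python
import numpy as np

def mezo_step(params, loss_fn, lr=1e-2, eps=1e-3, seed=0):
    """One SPSA-style zeroth-order step, performed in place.

    The Gaussian perturbation z is regenerated from `seed` each time it is
    needed instead of being stored, so peak memory stays at the inference
    footprint (no gradients, no saved perturbation).
    """
    def perturb(scale):
        rng = np.random.default_rng(seed)
        for p in params:
            p += scale * eps * rng.standard_normal(p.shape)

    perturb(+1.0)
    loss_plus = loss_fn(params)        # loss at theta + eps * z
    perturb(-2.0)
    loss_minus = loss_fn(params)       # loss at theta - eps * z
    perturb(+1.0)                      # restore theta (up to fp rounding)
    g = (loss_plus - loss_minus) / (2.0 * eps)  # projected gradient estimate

    rng = np.random.default_rng(seed)
    for p in params:
        p -= lr * g * rng.standard_normal(p.shape)

# Usage on a toy quadratic; the loss drifts down to a small noise floor.
theta = [np.ones(8)]
for step in range(500):
    mezo_step(theta, lambda ps: float((ps[0] ** 2).sum()), seed=step)
print(float((theta[0] ** 2).sum()))
```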
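For the $\beta$-divergence NMF entry: the joint majorization-minimization updates themselves are not given in the summary, so this sketch shows the classical alternating multiplicative updates for the $\beta$-divergence that the paper improves upon ($\beta = 1$, i.e., KL, as an illustrative default).

```python
import numpy as np

def nmf_beta_mu(V, rank, beta=1.0, n_iter=200, seed=0):
    """Nonnegative matrix factorization V ~ W @ H via the standard
    multiplicative updates for the beta-divergence (beta=2: Frobenius,
    beta=1: Kullback-Leibler, beta=0: Itakura-Saito)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 1e-3
    H = rng.random((rank, n)) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)
        WH = W @ H + eps
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
    return W, H

# Usage on a random nonnegative matrix.
rng = np.random.default_rng(2)
V = rng.random((40, 30))
W, H = nmf_beta_mu(V, rank=5, beta=1.0)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative error
```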
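For the finite-difference score matching entry: the building block is approximating directional derivatives from function evaluations alone, with no gradient computation. A minimal central-difference sketch:

```python
import numpy as np

def dir_deriv_fd(f, x, v, eps=1e-4):
    """Central-difference estimate of the first directional derivative,
    v . grad f(x), from two function evaluations (no gradients)."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

def dir_deriv2_fd(f, x, v, eps=1e-4):
    """Second directional derivative, v^T H(x) v, from three evaluations."""
    return (f(x + eps * v) - 2.0 * f(x) + f(x - eps * v)) / eps ** 2

# Usage: f(x) = ||x||^2 has gradient 2x and Hessian 2I.
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -0.7])
print(dir_deriv_fd(lambda z: float(z @ z), x, v), 2.0 * v @ x)   # ~ -0.5
print(dir_deriv2_fd(lambda z: float(z @ z), x, v), 2.0 * v @ v)  # ~ 1.18
```

The two evaluations of `f` per estimate are independent, which is what allows the parallel execution mentioned in the summary.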