Memory-Efficient Factorization Machines via Binarizing both Data and
Model Coefficients
- URL: http://arxiv.org/abs/2108.07421v1
- Date: Tue, 17 Aug 2021 03:30:52 GMT
- Title: Memory-Efficient Factorization Machines via Binarizing both Data and
Model Coefficients
- Authors: Yu Geng and Liang Lan
- Abstract summary: Subspace Encoding Factorization Machine (SEFM) has been proposed to overcome the expressiveness limitation of Factorization Machines (FM).
We propose a new method called Binarized FM which constrains the model parameters to be binary values.
Our proposed method achieves comparable accuracy with SEFM but with much less memory cost.
- Score: 9.692334398809457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Factorization Machines (FM), a general predictor that can efficiently model
feature interactions in linear time, was originally proposed for collaborative
recommendation, and has been broadly used for regression, classification and
ranking tasks. Subspace Encoding Factorization Machine (SEFM) has been proposed
recently to overcome the expressiveness limitation of Factorization Machines
(FM) by applying explicit nonlinear feature mapping for both individual
features and feature interactions through one-hot encoding to each input
feature. Despite the effectiveness of SEFM, it increases the memory cost of FM
by $b$ times, where $b$ is the number of bins when applying one-hot encoding on
each input feature. To reduce the memory cost of SEFM, we propose a new method
called Binarized FM, which constrains the model parameters to be binary values
(i.e., $+1$ or $-1$). Each parameter value can then be efficiently stored in one
bit. Our proposed method can significantly reduce the memory cost of the SEFM
model. In addition, we propose a new algorithm to effectively and efficiently
learn the proposed FM under binary constraints using the Straight-Through
Estimator (STE) with Adaptive Gradient Descent (Adagrad). Finally, we evaluate the
performance of our proposed method on eight different classification datasets.
Our experimental results demonstrate that our proposed method achieves
accuracy comparable to SEFM at a much lower memory cost.
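To make the training recipe concrete, below is a minimal NumPy sketch of the idea as described in the abstract: real-valued parameters are binarized to $\pm 1$ in the forward pass, gradients flow straight through the sign function (STE), and Adagrad scales the updates. This is not the authors' implementation; the toy data, squared-error loss, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-encoded data (a stand-in for SEFM's one-hot bins).
n, d, k = 200, 30, 8                      # samples, features, latent factors
X = rng.integers(0, 2, size=(n, d)).astype(float)
y = rng.choice([-1.0, 1.0], size=n)

# Real-valued parameters; only their signs enter the forward pass.
w0 = 0.0                                  # bias kept real for simplicity
w = rng.normal(0.0, 0.1, d)
V = rng.normal(0.0, 0.1, (d, k))
Gw0, Gw, GV = 0.0, np.zeros(d), np.zeros((d, k))  # Adagrad accumulators
lr, eps = 0.1, 1e-8

def sign(a):
    return np.where(a >= 0.0, 1.0, -1.0)  # binarize to {+1, -1}

def fm_scores(X, w0, wb, Vb):
    # FM prediction via the linear-time pairwise-interaction identity:
    # sum_{i<j} <v_i,v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    s1 = X @ Vb
    s2 = (X ** 2) @ (Vb ** 2)
    return w0 + X @ wb + 0.5 * (s1 ** 2 - s2).sum(axis=1)

for epoch in range(100):
    wb, Vb = sign(w), sign(V)
    resid = (fm_scores(X, w0, wb, Vb) - y) / n     # dL/dscore, squared loss
    g_w0 = resid.sum()                             # grads w.r.t. binarized params
    g_w = X.T @ resid
    s1 = X @ Vb
    g_V = X.T @ (resid[:, None] * s1) - Vb * ((X ** 2).T @ resid)[:, None]
    # STE: pretend d(sign)/d(param) = 1, then take Adagrad-scaled steps.
    # (a common STE variant also zeroes gradients where |param| > 1; omitted here)
    Gw0 += g_w0 ** 2; w0 -= lr * g_w0 / (np.sqrt(Gw0) + eps)
    Gw += g_w ** 2;   w -= lr * g_w / (np.sqrt(Gw) + eps)
    GV += g_V ** 2;   V -= lr * g_V / (np.sqrt(GV) + eps)

# Deployment: each binary parameter packs into a single bit.
packed = np.packbits((sign(V) > 0).reshape(-1).astype(np.uint8))
print("float64 bytes:", V.size * 8, "-> packed bytes:", packed.size)
```

The final two lines illustrate the one-bit-per-parameter storage claim: in this toy setup, 240 float64 coefficients collapse to 30 packed bytes.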
Related papers
- Joint Transmit and Pinching Beamforming for PASS: Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed.
It consists of multiple waveguides, which are equipped with numerous low-cost pinching antennas (PAs).
The positions of the PAs can be reconfigured to span both the large-scale path and the spatial domain.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
- SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization [0.5755004576310332]
SMMF is a memory-efficient optimizer that reduces the memory requirement of widely used adaptive learning-rate optimizers, such as Adam, by up to 96%.
We conduct a regret bound analysis of SMMF, which shows that it converges similarly to non-memory-efficient adaptive learning-rate optimizers, such as AdamNC.
In our experiment, SMMF takes up to 96% less memory compared to state-of-the-art memory-efficient optimizers, e.g., Adafactor, CAME, and SM3, while achieving comparable model performance (the factorization idea is sketched after this list).
arXiv Detail & Related papers (2024-12-12T03:14:50Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
- Fine-Tuning Language Models with Just Forward Passes [92.04219196752007]
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a large amount of memory.
We propose a memory-efficient zeroth-order optimizer (MeZO) that operates in-place, thereby fine-tuning LMs with the same memory footprint as inference (sketched after this list).
arXiv Detail & Related papers (2023-05-27T02:28:10Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- On Computing the Hyperparameter of Extreme Learning Machines: Algorithm and Application to Computational PDEs, and Comparison with Classical and High-Order Finite Elements [0.0]
We consider the use of extreme learning machines (ELM) for computational partial differential equations (PDE).
In ELM, the hidden-layer coefficients of the neural network are assigned random values generated on $[-R_m,R_m]$ and then fixed.
We present a method for computing the optimal value of $R_m$ based on the differential evolution algorithm.
arXiv Detail & Related papers (2021-10-27T02:05:26Z)
- Joint Majorization-Minimization for Nonnegative Matrix Factorization with the $\beta$-divergence [4.468952886990851]
This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $\beta$-divergence objective function (the classical updates are sketched after this list).
We report experimental results using diverse datasets: face images, an audio spectrogram, hyperspectral data and song play counts.
arXiv Detail & Related papers (2021-06-29T09:58:21Z)
- Factorization Machines with Regularization for Sparse Feature Interactions [13.593781209611112]
Factorization machines (FMs) are machine learning predictive models based on second-order feature interactions.
We present a new regularization scheme for feature interaction selection in FMs.
For feature interaction selection, our proposed regularizer makes the feature interaction matrix sparse without the restrictions on sparsity patterns imposed by existing methods.
arXiv Detail & Related papers (2020-10-19T05:00:40Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivatives with finite differences (sketched after this list).
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- DS-FACTO: Doubly Separable Factorization Machines [4.281959480566438]
Factorization Machines (FM) are a powerful class of models that incorporate higher-order interactions among features to add more expressive power to linear models.
Despite using a low-rank representation for the pairwise features, the memory overheads of using factorization machines on large-scale real-world datasets can be prohibitively high.
Traditional algorithms for FM, which work on a single machine, are not equipped to handle this scale; therefore, using a distributed algorithm to parallelize computation across a cluster is inevitable.
arXiv Detail & Related papers (2020-04-29T03:36:28Z)
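For the SMMF entry above: the summary does not spell out the square-matricization itself, so the sketch below illustrates the general memory-saving idea with an Adafactor-style rank-1 (row/column) factorization of the squared-gradient accumulator instead; the function name and hyperparameters are invented for illustration.

```python
import numpy as np

def factored_second_moment(G, R, C, beta2=0.999, eps=1e-30):
    """Update row/column accumulators instead of a full (m, n) second moment.

    R: (m,) EMA of row sums of G**2;  C: (n,) EMA of column sums of G**2.
    The full second moment is approximated by the rank-1 reconstruction
    outer(R, C) / R.sum(), as in Adafactor. Persistent state is O(m + n);
    the full matrix only appears transiently inside the step.
    """
    G2 = G * G + eps
    R = beta2 * R + (1.0 - beta2) * G2.sum(axis=1)
    C = beta2 * C + (1.0 - beta2) * G2.sum(axis=0)
    return R, C, np.outer(R, C) / R.sum()

# Usage: one preconditioned step on a toy weight matrix.
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
R, C = np.zeros(64), np.zeros(32)
G = rng.normal(size=W.shape)                 # stand-in gradient
R, C, V_hat = factored_second_moment(G, R, C)
W -= 0.01 * G / np.sqrt(V_hat + 1e-8)        # Adam-style update
# State: 64 + 32 floats instead of 64 * 32 for the unfactored moment.
```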
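For the MeZO entry: the core trick is a two-point (SPSA-style) gradient estimate whose random perturbation is regenerated from a seed rather than stored. A minimal sketch under that reading, with a toy quadratic loss standing in for a language model:

```python
import numpy as np

def mezo_step(params, loss_fn, lr=1e-2, eps=1e-3, seed=0):
    """One SPSA-style zeroth-order step, performed in place.

    The Gaussian perturbation z is regenerated from `seed` each time it is
    needed instead of being stored, so peak memory stays at the inference
    footprint (no gradients, no saved perturbation).
    """
    def perturb(scale):
        rng = np.random.default_rng(seed)
        for p in params:
            p += scale * eps * rng.standard_normal(p.shape)

    perturb(+1.0)
    loss_plus = loss_fn(params)        # loss at theta + eps * z
    perturb(-2.0)
    loss_minus = loss_fn(params)       # loss at theta - eps * z
    perturb(+1.0)                      # restore theta (up to fp rounding)
    g = (loss_plus - loss_minus) / (2.0 * eps)  # projected gradient estimate

    rng = np.random.default_rng(seed)
    for p in params:
        p -= lr * g * rng.standard_normal(p.shape)

# Usage on a toy quadratic; the loss drifts down to a small noise floor.
theta = [np.ones(8)]
for step in range(500):
    mezo_step(theta, lambda ps: float((ps[0] ** 2).sum()), seed=step)
print(float((theta[0] ** 2).sum()))
```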
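For the $\beta$-divergence NMF entry: the joint majorization-minimization updates themselves are not given in the summary, so this sketch shows the classical alternating multiplicative updates for the $\beta$-divergence that the paper improves upon ($\beta = 1$, i.e., KL, as an illustrative default).

```python
import numpy as np

def nmf_beta_mu(V, rank, beta=1.0, n_iter=200, seed=0):
    """Nonnegative matrix factorization V ~ W @ H via the standard
    multiplicative updates for the beta-divergence (beta=2: Frobenius,
    beta=1: Kullback-Leibler, beta=0: Itakura-Saito)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 1e-3
    H = rng.random((rank, n)) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)
        WH = W @ H + eps
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
    return W, H

# Usage on a random nonnegative matrix.
rng = np.random.default_rng(2)
V = rng.random((40, 30))
W, H = nmf_beta_mu(V, rank=5, beta=1.0)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative error
```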
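For the finite-difference score matching entry: the building block is approximating directional derivatives from function evaluations alone, with no gradient computation. A minimal central-difference sketch:

```python
import numpy as np

def dir_deriv_fd(f, x, v, eps=1e-4):
    """Central-difference estimate of the first directional derivative,
    v . grad f(x), from two function evaluations (no gradients)."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

def dir_deriv2_fd(f, x, v, eps=1e-4):
    """Second directional derivative, v^T H(x) v, from three evaluations."""
    return (f(x + eps * v) - 2.0 * f(x) + f(x - eps * v)) / eps ** 2

# Usage: f(x) = ||x||^2 has gradient 2x and Hessian 2I.
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -0.7])
print(dir_deriv_fd(lambda z: float(z @ z), x, v), 2.0 * v @ x)   # ~ -0.5
print(dir_deriv2_fd(lambda z: float(z @ z), x, v), 2.0 * v @ v)  # ~ 1.18
```

The two evaluations of `f` per estimate are independent, which is what allows the parallel execution mentioned in the summary.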