Related papers: MARM: Unlocking the Future of Recommendation Systems through Memory Augmentation and Scalable Complexity

URL: http://arxiv.org/abs/2411.09425v2
Date: Mon, 10 Feb 2025 15:17:49 GMT
Title: MARM: Unlocking the Future of Recommendation Systems through Memory Augmentation and Scalable Complexity
Authors: Xiao Lv, Jiangxia Cao, Shijie Guan, Xiaoyou Zhou, Zhiguang Qi, Yaqiang Zang, Ming Li, Ben Wang, Kun Gai, Guorui Zhou
Abstract summary: We propose MARM (Memory Augmented Recommendation Model), which successfully explores a new cache scaling law. For a RecSys model, computational complexity (FLOPs) is a more expensive factor than model parameters and requires careful control.
Score: 18.865266475439135
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scaling laws have guided language model design in recent years. However, the scaling laws of NLP cannot be directly applied to RecSys, for the following reasons: (1) The amount of training samples and model parameters is typically not the bottleneck for the model. Our recommendation system can generate over 50 billion user samples daily, and such a massive amount of training data can easily allow our model parameters to exceed 200 billion, surpassing many LLMs (about 100B). (2) To ensure the stability and robustness of the recommendation system, it is essential to control computational complexity (FLOPs) carefully. Given these differences from LLMs, we conclude that, for a RecSys model, computational complexity (FLOPs) is a more expensive factor than model parameters and requires careful control. In this paper, we propose our milestone work, MARM (Memory Augmented Recommendation Model), which successfully explores a new cache scaling law.

Related papers

Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models [62.3458061002951]
We introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N,D)$, Farseer achieves a significantly better fit to empirical data than prior laws. Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities.
arXiv Detail & Related papers (2025-06-12T17:59:23Z)
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order [38.99428012275441]
Fine-tuning Large Language Models (LLMs) is essential for adapting pre-trained models to downstream tasks. Traditional first-order algorithms incur prohibitive memory and computational costs that scale poorly with model size. We propose zero-order (ZO) optimization methods as a memory- and compute-efficient alternative.
arXiv Detail & Related papers (2025-06-04T20:27:17Z)
Scalable Complexity Control Facilitates Reasoning Ability of LLMs [41.607173110806265]
We show that model complexity control can improve the scaling law of large language models consistently over varying model sizes and data sizes. Results indicate that complexity control is a promising direction for the continual advancement of LLMs.
arXiv Detail & Related papers (2025-05-29T02:42:20Z)
LatentLLM: Attention-Aware Joint Tensor Compression [50.33925662486034]
Large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources. We propose a new framework to convert such LLMs/LMMs into a reduced-dimension latent structure.
arXiv Detail & Related papers (2025-05-23T22:39:54Z)
FineGates: LLMs Finetuning with Compression using Stochastic Gates [7.093692674858257]
Large Language Models (LLMs) present significant challenges for full finetuning due to the high computational demands. Lightweight finetuning techniques have been proposed, like learning low-rank adapter layers. We propose an adaptor model based on gates that simultaneously sparsify the frozen base model with task-specific adaptation.
arXiv Detail & Related papers (2024-12-17T14:33:05Z)
Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality. We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies [85.57899012821211]
Small Language Models (SLMs) are a resource-efficient alternative to Large Language Models (LLMs). We introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants. We also introduce the MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K.
arXiv Detail & Related papers (2024-04-09T15:36:50Z)
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases [46.997172696192195]
This paper addresses the need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment.
arXiv Detail & Related papers (2024-02-22T18:58:55Z)
Induced Model Matching: How Restricted Models Can Help Larger Ones [1.7676816383911753]
We consider scenarios where a very accurate predictive model using restricted features is available at the time of training of a larger, full-featured model. How can the restricted model be useful to the full model? We propose an approach for transferring the knowledge of the restricted model to the full model, by aligning the full model's context-restricted performance with that of the restricted model.
arXiv Detail & Related papers (2024-02-19T20:21:09Z)
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performance of a supervised LLM. We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K, which significantly outperforms the supervised fine-tuning (SFT) accuracy of 35.9%.
arXiv Detail & Related papers (2023-08-03T15:34:01Z)
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models. We present three findings across 4 NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z)
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales [65.01417261415833]
We present an approach to predict the pre-training loss based on our observations that Maximal Update Parametrization (muP) enables accurate fitting of scaling laws. With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B. Our goal with nanoLM is to empower researchers with limited resources to reach meaningful conclusions on large models.
arXiv Detail & Related papers (2023-04-14T00:45:01Z)
Revisiting minimum description length complexity in overparameterized models [38.21167656112762]
We provide an extensive theoretical characterization of MDL-COMP for linear models and kernel methods. For kernel methods, we show that MDL-COMP informs minimax in-sample error, and can decrease as the dimensionality of the input increases. We also prove that MDL-COMP bounds the in-sample mean squared error (MSE).
arXiv Detail & Related papers (2020-06-17T22:45:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.