Training-free Adjustable Polynomial Graph Filtering for Ultra-fast Multimodal Recommendation
- URL: http://arxiv.org/abs/2503.04406v2
- Date: Tue, 16 Sep 2025 06:35:48 GMT
- Title: Training-free Adjustable Polynomial Graph Filtering for Ultra-fast Multimodal Recommendation
- Authors: Yu-Seung Roh, Joo-Young Kim, Jin-Duk Park, Won-Yong Shin
- Abstract summary: MultiModal-Graph Filtering (MM-GF) is a training-free method for efficient and accurate multimodal recommendations. MM-GF not only improves recommendation accuracy by up to 22.25% compared to the best competitor but also dramatically reduces computational costs, achieving a runtime of less than 10 seconds.
- Score: 14.344218807527234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal recommender systems improve the performance of canonical recommender systems with no item features by utilizing diverse content types such as text, images, and videos, while alleviating inherent sparsity of user-item interactions and accelerating user engagement. However, current neural network-based models often incur significant computational overhead due to the complex training process required to learn and integrate information from multiple modalities. To address this challenge, we propose MultiModal-Graph Filtering (MM-GF), a training-free method grounded in graph filtering (GF) for efficient and accurate multimodal recommendations. Specifically, MM-GF first constructs multiple similarity graphs for two distinct modalities as well as user-item interaction data. Then, MM-GF optimally fuses these multimodal signals using a polynomial graph filter that allows for precise control of the frequency response by adjusting frequency bounds. Furthermore, the filter coefficients are treated as hyperparameters, enabling flexible and data-driven adaptation. Extensive experiments on real-world benchmark datasets demonstrate that MM-GF not only improves recommendation accuracy by up to 22.25% compared to the best competitor but also dramatically reduces computational costs, achieving a runtime of less than 10 seconds.
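The abstract describes the pipeline only at a high level: similarity graphs for each modality and for the interactions, then fusion through a polynomial graph filter whose coefficients are hyperparameters. The following is a minimal NumPy sketch of that general recipe, not the authors' MM-GF implementation. The assumptions are ours: the interaction graph is the degree-normalized item-item matrix R_n^T R_n with R_n = D_u^{-1/2} R D_i^{-1/2}, each modality graph is a symmetrically normalized cosine kNN graph over item features, each graph gets its own polynomial h(P) = sum_j a_j P^j, and fusion is a weighted sum of the filtered graphs. The function names, the kNN size, the coefficients, and the fusion weights are all hypothetical, and the frequency-bound adjustment mentioned in the abstract is not modeled.

```python
import numpy as np

# Hypothetical sketch of training-free multimodal graph filtering; not MM-GF itself.


def inv_sqrt(deg: np.ndarray) -> np.ndarray:
    """Elementwise deg^{-1/2}, mapping zero degrees to 0 instead of inf."""
    out = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    out[nz] = deg[nz] ** -0.5
    return out


def interaction_graph(R: np.ndarray) -> np.ndarray:
    """Item-item graph from a binary user-item matrix R (users x items):
    P = R_n^T R_n with R_n = D_u^{-1/2} R D_i^{-1/2}."""
    Rn = inv_sqrt(R.sum(axis=1))[:, None] * R * inv_sqrt(R.sum(axis=0))[None, :]
    return Rn.T @ Rn


def feature_graph(F: np.ndarray, k: int) -> np.ndarray:
    """Item-item kNN graph from modality features F (items x dim), using
    non-negative cosine similarity followed by symmetric normalization."""
    Fn = F / np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)
    S = np.clip(Fn @ Fn.T, 0.0, None)
    np.fill_diagonal(S, 0.0)
    top = np.argsort(-S, axis=1)[:, :k]      # k most similar items per row
    keep = np.zeros(S.shape, dtype=bool)
    np.put_along_axis(keep, top, True, axis=1)
    S = np.where(keep | keep.T, S, 0.0)      # symmetric kNN sparsification
    d = inv_sqrt(S.sum(axis=1))
    return d[:, None] * S * d[None, :]


def poly_filter(P: np.ndarray, coeffs) -> np.ndarray:
    """h(P) = sum_j coeffs[j] * P^j, built by repeated multiplication."""
    out = np.zeros(P.shape)
    power = np.eye(P.shape[0])
    for a in coeffs:
        out += a * power
        power = power @ P
    return out


def mm_scores(R, graphs, coeff_sets, weights):
    """Hypothetical fusion: weighted sum of per-graph polynomial filters,
    applied to each user's interaction row. Returns users x items scores."""
    H = sum(w * poly_filter(G, c) for G, c, w in zip(graphs, coeff_sets, weights))
    return R @ H


# Toy usage with random data (hypothetical sizes and hyperparameters).
rng = np.random.default_rng(0)
R = (rng.random((6, 8)) < 0.4).astype(float)    # toy interactions
text = rng.normal(size=(8, 16))                 # toy text embeddings
image = rng.normal(size=(8, 32))                # toy image embeddings
graphs = [interaction_graph(R), feature_graph(text, 3), feature_graph(image, 3)]
coeffs = [[0.0, 1.0, 0.5], [0.0, 1.0], [0.0, 1.0]]   # untuned placeholders
scores = mm_scores(R, graphs, coeffs, weights=[1.0, 0.3, 0.3])
scores[R > 0] = -np.inf                         # do not re-recommend seen items
print(scores.shape)  # (6, 8)
```

Nothing in the sketch is trained: every step is a matrix product, so tuning reduces to searching over the polynomial coefficients and fusion weights, which is the sense in which the abstract treats filter coefficients as hyperparameters. Dense matrices keep the example short; realistic item counts would call for sparse storage.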
Related papers
- PRISM: Performer RS-IMLE for Single-pass Multisensory Imitation Learning [51.24484551729328]
We introduce PRISM, a single-pass policy based on a batch-global rejection-sampling variant of IMLE. PRISM couples a temporal multisensory encoder with a linear-attention generator using a Performer architecture. We demonstrate the efficacy of PRISM on a diverse real-world hardware suite, including loco-manipulation using a Unitree Go2 with a 7-DoF arm D1 and tabletop manipulation with a UR5 manipulator.
(2026-02-02T17:57:37Z)
- FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model. We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
(2025-11-26T08:36:33Z)
- Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning [49.04912820721943]
Supervised fine-tuning (SFT) is computationally expensive and sometimes suffers from overfitting or bias amplification. This work studies the online batch selection family that dynamically scores and filters samples during the training process. We develop UDS (Utility-Diversity Sampling), a framework for efficient online batch selection in SFT.
(2025-10-19T15:32:01Z)
- VFP: Variational Flow-Matching Policy for Multi-Modal Robot Manipulation [0.0]
Variational Flow-Matching Policy captures both task-level and trajectory-level multi-modality. VFP achieves a 49% relative improvement in task success rate over standard flow-based baselines.
(2025-08-03T07:23:02Z)
- FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
(2025-07-07T04:09:45Z)
- Learning-Based Multiuser Scheduling in MIMO-OFDM Systems with Hybrid Beamforming [1.4272256806865102]
We investigate the multiuser scheduling problem in multiple-input multiple-output (MIMO) systems using orthogonal frequency division multiplexing (OFDM) and hybrid beamforming. To conduct scheduling, we propose solutions such as greedy and sorting algorithms, followed by a machine learning (ML) approach.
(2025-06-09T21:59:05Z)
- Gated Multimodal Graph Learning for Personalized Recommendation [9.466822984141086]
Multimodal recommendation has emerged as a promising solution to alleviate the cold-start and sparsity problems in collaborative filtering. We propose RLMultimodalRec, a lightweight and modular recommendation framework that combines graph-based user modeling with adaptive multimodal item encoding.
(2025-05-30T16:57:17Z)
- M2Rec: Multi-scale Mamba for Efficient Sequential Recommendation [35.508076394809784]
M2Rec is a novel sequential recommendation framework that integrates multi-scale Mamba with Fourier analysis, Large Language Models, and adaptive gating. Experiments demonstrate that M2Rec achieves state-of-the-art performance, improving Hit Rate@10 by 3.2% over existing Mamba-based models.
(2025-05-07T14:14:29Z)
- MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching [54.740256498985026]
Keypoint detection and description methods often struggle with multimodal data. We propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching.
(2025-01-20T06:56:30Z)
- Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples. We introduce a multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality. We propose a simple yet effective Test-time Adaptive Cross-modal (TACC) technique to mitigate training bias.
(2024-10-29T19:28:41Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses the existing parallelism schemes. Our results demonstrate up to a 52.4% improvement in prefill throughput compared to existing parallel inference methods.
(2024-10-16T05:17:49Z)
- M$^{2}$M: Learning controllable Multi of experts and multi-scale operators are the Partial Differential Equations need [43.534771810528305]
This paper introduces a framework of multi-scale and multi-expert (M$^{2}$M) neural operators to simulate and learn PDEs efficiently.
We employ a divide-and-conquer strategy to train a multi-expert gated network for the dynamic router policy.
Our method incorporates a controllable prior gating mechanism that determines the selection rights of experts, enhancing the model's efficiency.
(2024-10-01T15:42:09Z)
- Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation [27.243116376164906]
We introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec).
Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions.
We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets.
(2024-09-25T05:12:07Z)
- Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima [54.06000767038741]
We analyze multimodal recommender systems from the novel perspective of flat local minima.
We propose a concise yet effective gradient strategy called Mirror Gradient (MG).
We find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models.
(2024-02-17T12:27:30Z)
- Frequency-aware Graph Signal Processing for Collaborative Filtering [26.317108637430664]
We propose a frequency-aware graph signal processing method (FaGSP) for collaborative filtering.
Firstly, we design a Cascaded Filter Module, consisting of an ideal high-pass filter and an ideal low-pass filter.
Then, we devise a Parallel Filter Module, consisting of two low-pass filters that can easily capture the hierarchy of neighborhood.
(2024-02-13T12:53:18Z)
- Neural Graph Collaborative Filtering Using Variational Inference [19.80976833118502]
We introduce variational embedding collaborative filtering (GVECF) as a novel framework to incorporate representations learned through a variational graph auto-encoder.
Our proposed method achieves up to 13.78% improvement in the recall over the test data.
(2023-11-20T15:01:33Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
(2023-09-02T11:01:16Z)
- Efficient Multimodal Fusion via Interactive Prompting [62.08292938484994]
Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.
We propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers.
(2023-04-13T07:31:51Z)
- Learning-Based Adaptive User Selection in Millimeter Wave Hybrid Beamforming Systems [5.657669046936923]
We consider a multi-user hybrid beamforming system, where the multiplexing gain is limited by the small number of radio frequency (RF) chains employed at the base station (BS).
To allow greater freedom for maximizing the multiplexing gain, it is better if the BS selects and serves some of the users at each scheduling instant, rather than serving all the users all the time.
We propose a machine learning (ML)-based user selection algorithm to provide an efficient trade-off between the PF performance and the time.
(2023-02-16T11:46:36Z)
- Broad Recommender System: An Efficient Nonlinear Collaborative Filtering Approach [56.12815715932561]
We propose a new broad recommender system called Broad Collaborative Filtering (BroadCF).
Instead of Deep Neural Networks (DNNs), Broad Learning System (BLS) is used as a mapping function to learn the complex nonlinear relationships between users and items.
Extensive experiments conducted on seven benchmark datasets have confirmed the effectiveness of the proposed BroadCF algorithm.
(2022-04-20T01:25:08Z)
- Dynamic Multimodal Fusion [8.530680502975095]
Dynamic multimodal fusion (DynMM) is a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference.
Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach.
(2022-03-31T21:35:13Z)
- Sparse Fusion for Multimodal Transformers [7.98117428941095]
We present Sparse Fusion Transformers (SFT), a novel multimodal fusion method for transformers.
Key to our idea is a sparse-pooling block that reduces unimodal token sets prior to cross-modality modeling.
State-of-the-art performance is obtained on multiple benchmarks under similar experiment conditions, while reporting up to six-fold reduction in computational cost and memory requirements.
(2021-11-23T16:43:49Z)
- Fast Variational AutoEncoder with Inverted Multi-Index for Collaborative Filtering [59.349057602266]
Variational AutoEncoder (VAE) has been extended as a representative nonlinear method for collaborative filtering.
We propose to decompose the inner-product-based softmax probability based on the inverted multi-index.
FastVAE can outperform the state-of-the-art baselines in terms of both sampling quality and efficiency.
(2021-09-13T08:31:59Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
(2021-07-08T18:01:02Z)
- JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, a multi-task Bayesian optimization (MBO) algorithm that sidesteps limitations by querying additional data.
We show that it achieves no-regret under conditions analogous to GP-UCB.
Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
(2021-06-02T05:03:38Z)