Related papers: FITMM: Adaptive Frequency-Aware Multimodal Recommendation via Information-Theoretic Representation Learning

URL: http://arxiv.org/abs/2601.22498v1
Date: Fri, 30 Jan 2026 03:16:54 GMT
Title: FITMM: Adaptive Frequency-Aware Multimodal Recommendation via Information-Theoretic Representation Learning
Authors: Wei Yang, Rui Zhong, Yiqun Chen, Shixuan Li, Heng Ping, Chi Lu, Peng Jiang,
Abstract summary: We propose a Frequency-aware Information-Theoretic framework for multimodal recommendation. FITMM constructs graph-enhanced item representations, performs modality-wise spectral decomposition, and forms lightweight within-band multimodal components. Experiments on three real-world datasets demonstrate that FITMM consistently and significantly outperforms advanced baselines.
Score: 14.873780184982003
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal recommendation aims to enhance user preference modeling by leveraging rich item content such as images and text. Yet dominant systems fuse modalities in the spatial domain, obscuring the frequency structure of signals and amplifying misalignment and redundancy. We adopt a spectral information-theoretic view and show that, under an orthogonal transform that approximately block-diagonalizes bandwise covariances, the Gaussian Information Bottleneck objective decouples across frequency bands, providing a principled basis for a separate-then-fuse paradigm. Building on this foundation, we propose FITMM, a Frequency-aware Information-Theoretic framework for multimodal recommendation. FITMM constructs graph-enhanced item representations, performs modality-wise spectral decomposition to obtain orthogonal bands, and forms lightweight within-band multimodal components. A residual, task-adaptive gate aggregates bands into the final representation. To control redundancy and improve generalization, we regularize training with a frequency-domain IB term that allocates capacity across bands (Wiener-like shrinkage with shut-off of weak bands). We further introduce a cross-modal spectral consistency loss that aligns modalities within each band. The model is jointly optimized with the standard recommendation loss. Extensive experiments on three real-world datasets demonstrate that FITMM consistently and significantly outperforms advanced baselines.
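The abstract describes the separate-then-fuse pipeline only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not FITMM's implementation. Assumptions: the orthogonal transform is an orthonormal DCT-II along the feature dimension; bands are equal-width contiguous coefficient ranges; the Wiener-like shrinkage is the positive-part gain max(0, 1 - noise_var / P_k), where P_k is the mean squared coefficient in band k and noise_var is a hypothetical noise level; within-band fusion is a plain average over modalities; the consistency term is a pairwise mean squared gap; and the gate is a softmax over hypothetical gate_logits added to a residual path. The helpers `dct_matrix`, `band_components`, `wiener_gain`, and `fitmm_like_fusion` are hypothetical names. Graph enhancement and the recommendation loss are omitted. Because the transform is orthonormal, each row's band components are mutually orthogonal and sum back to the input; with block-diagonal Gaussian covariances, log-determinants (and hence Gaussian mutual information terms) likewise split into per-band sums, which is the intuition behind treating bands separately.

```python
import numpy as np


def dct_matrix(d: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C of shape (d, d), so C @ C.T == I."""
    k = np.arange(d)[:, None]  # frequency index (rows)
    n = np.arange(d)[None, :]  # feature index (columns)
    C = np.sqrt(2.0 / d) * np.cos(np.pi * (2 * n + 1) * k / (2 * d))
    C[0, :] /= np.sqrt(2.0)    # rescale the DC row for orthonormality
    return C


def band_components(X: np.ndarray, num_bands: int):
    """Split rows of X into `num_bands` orthogonal components that sum to X.

    Assumption: bands are equal-width contiguous DCT coefficient ranges.
    Returns (component, power) pairs; power is the mean squared coefficient
    inside the band.
    """
    d = X.shape[1]
    C = dct_matrix(d)
    coeffs = X @ C.T  # row-wise forward DCT
    edges = np.linspace(0, d, num_bands + 1).astype(int)
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(coeffs)
        masked[:, lo:hi] = coeffs[:, lo:hi]  # keep only this band
        power = float(np.mean(coeffs[:, lo:hi] ** 2))
        out.append((masked @ C, power))      # row-wise inverse DCT
    return out


def wiener_gain(power: float, noise_var: float) -> float:
    """Positive-part Wiener-like gain: shuts a band off (0) if power <= noise_var."""
    return max(0.0, 1.0 - noise_var / power) if power > 0 else 0.0


def fitmm_like_fusion(modalities, num_bands, noise_var, gate_logits):
    """Hypothetical separate-then-fuse step over a list of (n, d) modality matrices."""
    per_mod = [band_components(X, num_bands) for X in modalities]
    fused_bands, consistency = [], 0.0
    for k in range(num_bands):
        # Shrink each modality's band-k component by its own gain.
        comps = [wiener_gain(p[k][1], noise_var) * p[k][0] for p in per_mod]
        fused_bands.append(np.mean(comps, axis=0))  # lightweight within-band fusion
        # Cross-modal consistency: mean squared gap between every modality pair.
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                consistency += float(np.mean((comps[i] - comps[j]) ** 2))
    gate = np.exp(gate_logits - np.max(gate_logits))
    gate = gate / gate.sum()                        # softmax weights over bands
    residual = np.mean(modalities, axis=0)          # residual path
    final = residual + sum(g * b for g, b in zip(gate, fused_bands))
    return final, fused_bands, consistency


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual = rng.normal(size=(5, 16))
    text = visual + 0.1 * rng.normal(size=(5, 16))  # a slightly misaligned modality
    final, bands, loss = fitmm_like_fusion(
        [visual, text], num_bands=4, noise_var=0.05, gate_logits=np.zeros(4)
    )
    print(final.shape)  # (5, 16)
    print(loss)
```

Averaging within a band is the simplest "lightweight" fusion one could pick; a learned per-band projection and a learned, input-dependent gate would be natural replacements when training end-to-end with the recommendation loss.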

Related papers

Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise [19.34603871517906]
Multiview datasets are common in scientific and engineering applications, yet existing fusion methods offer limited theoretical guarantees. We propose Generalized Robust Adaptive-Bandwidth Multiview Diffusion Maps (GRAB-MDM), a new kernel-based diffusion geometry framework for integrating multiple noisy data sources.
arXiv Detail & Related papers (2026-02-11T05:01:10Z)
UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting [90.47915032778366]
We propose UniDiff, a unified diffusion framework for multimodal time series forecasting. At its core lies a unified and parallel fusion module, where a single cross-attention mechanism integrates structural information from timestamps and semantic context from texts. Experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-08T05:36:14Z)
A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations [2.312232949770907]
Rolling-element bearings are among the most frequent causes of machinery failure. Existing approaches often suffer from poor generalization, lack of robustness, high data demands, and limited interpretability.
arXiv Detail & Related papers (2025-12-07T07:38:36Z)
Structured Spectral Reasoning for Frequency-Adaptive Multimodal Recommendation [13.886659472425393]
Multimodal recommendation aims to integrate collaborative signals with heterogeneous content such as visual and textual information. These issues are often exacerbated by naive fusion or shallow modeling strategies, leading to degraded generalization and poor robustness. We propose a Structured Spectral Reasoning framework for frequency-aware multimodal recommendation.
arXiv Detail & Related papers (2025-12-01T07:39:28Z)
FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model. We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z)
Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation [60.9960601057956]
We introduce the Frequency-Aware Audio-Visual Segmentation (FAVS) framework, which consists of two key modules. The FAVS framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2025-09-23T12:33:48Z)
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [57.577843653775]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing [92.61216319417208]
We propose a novel frequency domain-based diffusion model to fully exploit the beneficial knowledge in unpaired clear data. Inspired by the strong generative ability of Diffusion Models (DMs), we tackle the dehazing task from the perspective of frequency domain reconstruction.
arXiv Detail & Related papers (2025-07-02T01:22:46Z)
Robust Spectral Fuzzy Clustering of Multivariate Time Series with Applications to Electroencephalogram [6.62414474989199]
We introduce a fuzzy clustering framework in the spectral domain to extract frequency-specific monotonic relationships across variables. Our method takes advantage of dominant frequency-based cross-regional connectivity patterns to improve clustering accuracy. As a flagship application, we analyze electroencephalogram recordings, where our approach uncovers frequency- and connectivity-specific markers of latent cognitive states.
arXiv Detail & Related papers (2025-06-28T12:02:01Z)
Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing [47.24147617685829]
Face Anti-Spoofing (FAS) is essential for the security of facial recognition systems in diverse scenarios. We introduce the Multimodal Denoising and Alignment (MMDA) framework. By leveraging the zero-shot generalization capability of CLIP, the MMDA framework effectively suppresses noise in multimodal data.
arXiv Detail & Related papers (2025-05-14T15:36:44Z)
Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification [25.27495694566081]
We propose an auxiliary content-aware balanced decoder (CBD) to optimize the encoding quality in the spectrum space within the masked modeling scheme. CBD iterates over a series of fundamental blocks, and thanks to two tailored units, each block can progressively refine the masked representation.
arXiv Detail & Related papers (2024-12-17T14:12:20Z)
Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities. The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem. By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts. Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
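The discrete-wavelet sub-band split mentioned in the UFAFormer summary can be illustrated with a one-level 2-D Haar transform. This is a minimal NumPy sketch, assuming the Haar wavelet and a single decomposition level; the summary does not state which wavelet or how many levels UFAFormer uses, and `haar_dwt2` is a hypothetical helper, not a library function.

```python
import numpy as np


def haar_dwt2(img: np.ndarray):
    """One-level orthonormal 2-D Haar DWT of an (H, W) array with even H and W.

    Returns four (H/2, W/2) sub-bands: one low-frequency approximation (LL)
    and three detail bands. Sub-band naming conventions differ between
    libraries; here the first letter refers to the filter applied along rows.
    """
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("height and width must be even")
    s = np.sqrt(2.0)
    # Filter along each row: pairwise sums (low-pass) and differences (high-pass).
    lo = (img[:, 0::2] + img[:, 1::2]) / s
    hi = (img[:, 0::2] - img[:, 1::2]) / s
    # Filter the same way along each column of both halves.
    LL = (lo[0::2] + lo[1::2]) / s
    LH = (lo[0::2] - lo[1::2]) / s
    HL = (hi[0::2] + hi[1::2]) / s
    HH = (hi[0::2] - hi[1::2]) / s
    return LL, LH, HL, HH


if __name__ == "__main__":
    img = np.arange(16, dtype=float).reshape(4, 4)
    for name, band in zip(("LL", "LH", "HL", "HH"), haar_dwt2(img)):
        print(name, band.shape)
```

Because the transform is orthonormal, the total energy of the four sub-bands equals that of the input, and a constant image yields all-zero detail bands; high-frequency artifacts such as forgery traces would show up mainly in the three detail bands.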

This list is automatically generated from the titles and abstracts of the papers on this site.