The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval
- URL: http://arxiv.org/abs/2504.14788v1
- Date: Mon, 21 Apr 2025 01:10:59 GMT
- Title: The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval
- Authors: Junchen Fu, Xuri Ge, Xin Xin, Haitao Yu, Yue Feng, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose
- Abstract summary: We propose the first EReL@MIR workshop at the Web Conference 2025, inviting participants to explore novel solutions. This workshop aims to provide a platform for both academic and industry researchers to engage in discussions, share insights, and foster collaboration.
- Score: 49.587042083937426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal representation learning has garnered significant attention in the AI community, largely due to the success of large pre-trained multimodal foundation models such as LLaMA, GPT, Mistral, and CLIP. These models have achieved remarkable performance across various multimodal information retrieval (MIR) tasks, including web search, cross-modal retrieval, and recommender systems. However, due to their enormous parameter sizes, significant efficiency challenges emerge across the training, deployment, and inference stages when adapting these models' representations for IR tasks. These challenges present substantial obstacles to the practical adaptation of foundation models for representation learning in information retrieval. To address these pressing issues, we propose organizing the first EReL@MIR workshop at the Web Conference 2025, inviting participants to explore novel solutions, emerging problems and challenges, and efficiency evaluation metrics and benchmarks. This workshop aims to provide a platform for both academic and industry researchers to engage in discussions, share insights, and foster collaboration toward achieving efficient and effective representation learning for multimodal information retrieval in the era of large foundation models.
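The efficiency concern is easiest to see at inference time: if item embeddings are precomputed offline with a frozen foundation-model encoder, online retrieval reduces to a similarity search. Below is a minimal sketch of that pattern in NumPy; the random arrays stand in for embeddings from whatever encoder (e.g., CLIP) is actually used, and the function names are illustrative, not taken from the workshop paper.

```python
import numpy as np

def build_index(item_embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize item embeddings once, offline, so queries only need a matmul."""
    norms = np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    return item_embeddings / np.clip(norms, 1e-12, None)

def retrieve(query_embedding: np.ndarray, index: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the top-k items by cosine similarity."""
    q = query_embedding / max(np.linalg.norm(query_embedding), 1e-12)
    scores = index @ q                      # one matrix-vector product per query
    return np.argsort(-scores)[:k]

# Usage: embeddings would come from a frozen multimodal encoder such as CLIP.
items = np.random.randn(100_000, 512).astype(np.float32)   # precomputed item embeddings
index = build_index(items)
query = np.random.randn(512).astype(np.float32)             # encoded user query
print(retrieve(query, index, k=5))
```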
Related papers
- Feature Fusion Revisited: Multimodal CTR Prediction for MMCTR Challenge [4.3058911704400415]
The EReL@MIR workshop provided a valuable opportunity to experiment with various approaches aimed at improving the efficiency of multimodal representation learning.
Our team was honored to receive the Task 2 Winner award (Multimodal CTR Prediction).
arXiv Detail & Related papers (2025-04-26T16:04:33Z)
- Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) [66.51642638034822]
Reasoning is central to human intelligence, enabling structured problem-solving across diverse tasks.
Recent advances in large language models (LLMs) have greatly enhanced their reasoning abilities in arithmetic, commonsense, and symbolic domains.
This paper offers a concise yet insightful overview of reasoning techniques in both textual and multimodal LLMs.
arXiv Detail & Related papers (2025-04-04T04:04:56Z)
- A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications [7.414857515253022]
We introduce the basic design of MoE, including gating functions, expert networks, routing mechanisms, training strategies, and system design.
We then explore the algorithm design of MoE in important machine learning paradigms such as continual learning, meta-learning, multi-task learning, and reinforcement learning.
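To make the gating and routing components named above concrete, here is a minimal sketch of a sparsely gated MoE layer with top-k softmax routing; the layer sizes and the dense routing loop are illustrative choices, not the survey's or any specific system's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal mixture-of-experts layer: a softmax gate routes each token to k experts."""
    def __init__(self, d_model: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)          # gating function
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model)) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (tokens, d_model)
        logits = self.gate(x)                                 # (tokens, num_experts)
        weights, chosen = torch.topk(F.softmax(logits, dim=-1), self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                            # dense loop for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route 16 token representations of width 64 through the layer.
layer = TopKMoE(d_model=64)
print(layer(torch.randn(16, 64)).shape)     # torch.Size([16, 64])
```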
arXiv Detail & Related papers (2025-03-10T10:08:55Z)
- LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant [63.28378110792787]
We introduce LamRA, a versatile framework designed to empower Large Multimodal Models with sophisticated retrieval and reranking capabilities.
For retrieval, we adopt a two-stage training strategy comprising language-only pre-training and multimodal instruction tuning.
For reranking, we employ joint training for both pointwise and listwise reranking, offering two distinct ways to further boost the retrieval performance.
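To illustrate the pointwise/listwise distinction mentioned above, here is a hedged sketch of the two loss formulations on top of generic reranker scores; the scoring model itself is abstracted away and nothing here is taken from LamRA's actual code.

```python
import torch
import torch.nn.functional as F

def pointwise_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Each (query, candidate) pair is judged independently as relevant / not relevant."""
    return F.binary_cross_entropy_with_logits(scores, labels.float())

def listwise_loss(scores: torch.Tensor, target_index: torch.Tensor) -> torch.Tensor:
    """The whole candidate list is scored jointly; the positive must outrank the rest."""
    return F.cross_entropy(scores, target_index)

# Usage with toy scores standing in for a reranker's outputs.
pair_scores = torch.randn(8)                 # 8 independent (query, candidate) pairs
pair_labels = torch.tensor([1, 0, 0, 1, 0, 1, 0, 0])
list_scores = torch.randn(4, 10)             # 4 queries, 10 candidates each
positives = torch.tensor([0, 3, 7, 2])       # index of the relevant candidate per query
print(pointwise_loss(pair_scores, pair_labels), listwise_loss(list_scores, positives))
```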
arXiv Detail & Related papers (2024-12-02T17:10:16Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
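As a generic illustration of retrieval augmentation as a plugin (not RA-BLIP's specific architecture), the sketch below retrieves the most similar documents by embedding similarity and prepends them to the prompt; all names, dimensions, and the prompt format are placeholders.

```python
import numpy as np

def retrieve_context(query_emb: np.ndarray, doc_embs: np.ndarray, docs: list, k: int = 3) -> list:
    """Pick the k most similar documents to condition the generator on."""
    sims = doc_embs @ query_emb / (np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def build_prompt(question: str, retrieved: list) -> str:
    """Retrieval-augmented prompt: evidence first, then the question."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(retrieved))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

docs = ["caption of image A", "caption of image B", "caption of image C"]
doc_embs = np.random.randn(3, 256)        # stand-in for multimodal document embeddings
query_emb = np.random.randn(256)          # stand-in for the encoded question (+ image)
print(build_prompt("What is shown in the photo?", retrieve_context(query_emb, doc_embs, docs)))
```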
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities.
It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z)
- NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks.
Their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored.
We propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation.
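A minimal sketch of the general pattern, assuming a small trainable projection that maps frozen vision-encoder features into the LLM's embedding space and mean-pools the two modalities into one item vector for item-to-item similarity; the dimensions and fusion choice are illustrative assumptions, not NoteLLM-2's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalItemEncoder(nn.Module):
    """Project image features into the text-embedding space and pool to one item vector."""
    def __init__(self, vision_dim: int = 768, text_dim: int = 1024):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, text_dim)    # small trainable connector

    def forward(self, text_emb: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.stack([text_emb, self.vision_proj(image_feat)], dim=1)   # (batch, 2, text_dim)
        item = fused.mean(dim=1)                                               # simple mean pooling
        return F.normalize(item, dim=-1)                                       # unit vectors for I2I similarity

# Usage: item-to-item scores are cosine similarities between fused item embeddings.
enc = MultimodalItemEncoder()
text_emb = torch.randn(4, 1024)     # stand-in for pooled LLM text embeddings
image_feat = torch.randn(4, 768)    # stand-in for frozen vision-encoder features
items = enc(text_emb, image_feat)
print(items @ items.T)              # 4x4 item-to-item similarity matrix
```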
arXiv Detail & Related papers (2024-05-27T03:24:01Z)
- A Survey on Efficient Inference for Large Language Models [25.572035747669275]
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks.
The substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios.
This paper presents a comprehensive survey of the existing literature on efficient LLM inference.
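One representative technique covered by such surveys is key-value caching during autoregressive decoding, which avoids re-encoding the prefix at every step at the cost of memory that grows with sequence length. The toy single-head sketch below is illustrative only and not tied to the survey or any specific model.

```python
import torch
import torch.nn.functional as F

def decode_step(x_new: torch.Tensor, w_q, w_k, w_v, cache: dict) -> torch.Tensor:
    """One decoding step with a KV cache: only the new token's K/V are computed."""
    q = x_new @ w_q
    cache["k"] = torch.cat([cache["k"], x_new @ w_k], dim=0)   # append, do not recompute prefix
    cache["v"] = torch.cat([cache["v"], x_new @ w_v], dim=0)
    attn = F.softmax(q @ cache["k"].T / cache["k"].shape[-1] ** 0.5, dim=-1)
    return attn @ cache["v"]

d = 64
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
cache = {"k": torch.zeros(0, d), "v": torch.zeros(0, d)}
for _ in range(5):                                   # generate 5 tokens one at a time
    out = decode_step(torch.randn(1, d), w_q, w_k, w_v, cache)
print(cache["k"].shape)                              # torch.Size([5, 64]); memory grows with length
```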
arXiv Detail & Related papers (2024-04-22T15:53:08Z)
- Attribution Regularization for Multimodal Paradigms [7.1262539590168705]
Multimodal machine learning can integrate information from multiple modalities to enhance learning and decision-making processes.
It is commonly observed that unimodal models outperform multimodal models, despite the latter having access to richer information.
This research project proposes a novel regularization term that encourages multimodal models to effectively utilize information from all modalities when making decisions.
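One way such a regularizer could look, as a hedged sketch rather than the paper's exact formulation, is a penalty on the imbalance between the gradient magnitudes each modality's input receives, so that no modality is ignored.

```python
import torch

def attribution_balance_penalty(loss: torch.Tensor, modality_inputs: list) -> torch.Tensor:
    """Penalize imbalance between the gradient magnitude each modality receives.

    This is one illustrative form of an attribution-style regularizer, not the
    paper's definition: modalities whose inputs barely influence the loss create
    a large imbalance, pushing the model to use every modality.
    """
    grads = torch.autograd.grad(loss, modality_inputs, create_graph=True)
    norms = torch.stack([g.norm() for g in grads])
    return norms.std() / (norms.mean() + 1e-8)        # 0 when all modalities matter equally

# Usage with a toy two-modality linear model.
image = torch.randn(8, 32, requires_grad=True)
text = torch.randn(8, 32, requires_grad=True)
w = torch.randn(64, 1, requires_grad=True)
pred = torch.cat([image, text], dim=1) @ w
task_loss = ((pred - torch.randn(8, 1)) ** 2).mean()
total_loss = task_loss + 0.1 * attribution_balance_penalty(task_loss, [image, text])
total_loss.backward()
```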
arXiv Detail & Related papers (2024-04-02T23:05:56Z)
- Multi-Tower Multi-Interest Recommendation with User Representation Repel [0.9867914513513453]
We propose a novel multi-tower multi-interest framework with user representation repel.
Experimental results across multiple large-scale industrial datasets demonstrate the effectiveness and generalizability of our proposed framework.
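One plausible form of the "repel" idea is a penalty on pairwise similarity among a user's interest vectors, keeping the towers from collapsing onto a single representation; the sketch below is an assumed formulation, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def repel_loss(interests: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise cosine similarity among a user's interest vectors.

    interests: (batch, num_interests, dim). A high value means the interest
    towers have collapsed onto the same representation; minimizing it keeps
    them spread apart (the 'repel' effect).
    """
    z = F.normalize(interests, dim=-1)
    sim = torch.matmul(z, z.transpose(1, 2))                 # (batch, k, k) cosine similarities
    k = interests.shape[1]
    off_diag = sim - torch.eye(k, device=sim.device)         # ignore self-similarity
    return off_diag.pow(2).sum(dim=(1, 2)).mean() / (k * (k - 1))

# Usage: 32 users, 4 interest towers, 64-dimensional embeddings.
interests = torch.randn(32, 4, 64, requires_grad=True)
print(repel_loss(interests))
```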
arXiv Detail & Related papers (2024-03-08T07:36:14Z)
- High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning [112.51498431119616]
This paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities.
A single model, HighMMT, scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas.
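The general pattern behind a single model covering many modalities is a set of per-modality input adapters feeding one shared encoder; the sketch below illustrates only that pattern and does not reproduce HighMMT's actual parameter-sharing scheme, layer sizes, or task heads.

```python
import torch
import torch.nn as nn

class SharedMultimodalEncoder(nn.Module):
    """One shared transformer processes tokens from any registered modality."""
    def __init__(self, d_model: int = 128, modality_dims: dict = None):
        super().__init__()
        modality_dims = modality_dims or {"text": 300, "image": 512, "audio": 64}
        self.adapters = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in modality_dims.items()}
        )
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)   # parameters shared across modalities

    def forward(self, inputs: dict) -> torch.Tensor:
        # Map each modality's raw features into the shared token space, then concatenate.
        tokens = torch.cat([self.adapters[m](x) for m, x in inputs.items()], dim=1)
        return self.encoder(tokens).mean(dim=1)                     # one fused representation per example

# Usage: a batch of 2 examples with three modalities of different widths and lengths.
model = SharedMultimodalEncoder()
out = model({"text": torch.randn(2, 10, 300),
             "image": torch.randn(2, 49, 512),
             "audio": torch.randn(2, 20, 64)})
print(out.shape)    # torch.Size([2, 128])
```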
arXiv Detail & Related papers (2022-03-02T18:56:20Z)