Leveraging Large Language Models for Multimodal Search
- URL: http://arxiv.org/abs/2404.15790v1
- Date: Wed, 24 Apr 2024 10:30:42 GMT
- Title: Leveraging Large Language Models for Multimodal Search
- Authors: Oriol Barbany, Michael Huang, Xinliang Zhu, Arnab Dhua
- Abstract summary: This paper introduces a novel multimodal search model that achieves a new performance milestone on the Fashion200K dataset.
We also propose a novel search interface integrating Large Language Models (LLMs) to facilitate natural language interaction.
- Score: 0.6249768559720121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries. The problem becomes harder with the large variability of natural language text queries, which may contain ambiguous, implicit, and irrelevant information. Addressing these issues may require systems with enhanced matching capabilities, reasoning abilities, and context-aware query parsing and rewriting. This paper introduces a novel multimodal search model that achieves a new performance milestone on the Fashion200K dataset. Additionally, we propose a novel search interface integrating Large Language Models (LLMs) to facilitate natural language interaction. This interface routes queries to search systems while conversationally engaging with users and considering previous searches. When coupled with our multimodal search model, it heralds a new era of shopping assistants capable of offering human-like interaction and enhancing the overall search experience.
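The routing behavior described in the abstract can be pictured with a short sketch. Everything below (the prompt wording and the llm and search_backend callables) is an assumption for illustration, not the authors' implementation:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the LLM-based routing interface described above.
ROUTER_PROMPT = (
    "Given the conversation so far and the user's latest message, answer "
    "SEARCH if the message expresses a product-search intent, and CHAT "
    "otherwise.\n\nHistory:\n{history}\nUser: {message}\nDecision:"
)

@dataclass
class ShoppingAssistant:
    history: list = field(default_factory=list)

    def handle(self, message: str, llm, search_backend) -> str:
        decision = llm(ROUTER_PROMPT.format(
            history="\n".join(self.history), message=message)).strip()
        if decision.startswith("SEARCH"):
            # Route to the multimodal search system; the backend is assumed
            # to accept a free-text query (optionally paired with an image).
            reply = search_backend(message)
        else:
            # Otherwise stay conversational, keeping prior turns as context.
            reply = llm(f"Reply conversationally to: {message}")
        self.history += [f"User: {message}", f"Assistant: {reply}"]
        return reply
```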
Related papers
- MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs).
We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.
We propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers.
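A crude sketch of what modality-aware hard negative mining might look like, inferred only from the summary above; the selection rule and signatures are assumptions, not the paper's actual procedure:

```python
def mine_modality_aware_negatives(scores, modalities, positive_ids,
                                  target_modality, k=4):
    """Pick hard negatives for one query: the highest-scoring non-positive
    candidates, preferring ones whose modality differs from the target so
    training penalizes the retriever's modality bias.

    scores:     {candidate_id: similarity under the current MLLM retriever}
    modalities: {candidate_id: "text" | "image" | "interleaved"}
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    negatives = [c for c in ranked if c not in positive_ids]
    biased = [c for c in negatives if modalities[c] != target_modality]
    rest = [c for c in negatives if modalities[c] == target_modality]
    return (biased + rest)[:k]
```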
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
- A Survey of Conversational Search [44.09402706387407]
We explore the recent advancements and potential future directions in conversational search.
We highlight the integration of large language models (LLMs) in enhancing these systems.
We provide insights into real-world applications and robust evaluations of current conversational search systems.
arXiv Detail & Related papers (2024-10-21T01:54:46Z)
- Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express [3.8973445113342433]
Building a scalable multi-modal search system requires fine-tuning several components.
We address considerations such as embedding model selection, the roles of embeddings in matching and ranking, and the balance between dense and sparse embeddings.
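One common way to balance dense and sparse signals is weighted score fusion over a shared candidate pool; the min-max normalization and the alpha weight below are illustrative assumptions, not the production formula used in Adobe Express:

```python
def fuse_scores(dense, sparse, alpha=0.7):
    """Interpolate dense and sparse retrieval scores for a candidate pool.

    dense, sparse: {doc_id: raw score}; each side is min-max normalized
    first so the two systems are comparable before mixing.
    """
    def normalize(s):
        lo, hi = min(s.values()), max(s.values())
        return {d: (v - lo) / ((hi - lo) or 1.0) for d, v in s.items()}

    dense, sparse = normalize(dense), normalize(sparse)
    pool = set(dense) | set(sparse)
    return {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
            for d in pool}

# e.g. fuse_scores({"a": 9.1, "b": 7.4}, {"b": 12.0, "c": 3.0})
```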
arXiv Detail & Related papers (2024-08-26T23:52:27Z)
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and strengthen session-search modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
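As a hedged illustration of the idea, the sketch below generates query variants with three stand-in strategies (dropping, replacing, and reordering terms); the paper's actual strategies may differ:

```python
import random

def augment_query(query, vocab, n=3, seed=0):
    """Generate altered variants of the current query as supplemental
    training examples; difficulty varies with how much a variant departs
    from the original."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        terms = query.split()
        strategy = rng.choice(("drop", "replace", "shuffle"))
        if strategy == "drop" and len(terms) > 1:
            terms.pop(rng.randrange(len(terms)))    # easy: remove a term
        elif strategy == "replace":
            terms[rng.randrange(len(terms))] = rng.choice(vocab)  # harder
        else:
            rng.shuffle(terms)                      # word-order perturbation
        variants.append(" ".join(terms))
    return variants
```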
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback [9.461978375200102]
The proposed interface supports automatic and interactive query generation over a monolingual or multilingual document collection.
It enables users to refine the queries generated by different LLMs and to provide feedback on the retrieved documents or passages, and it can incorporate that feedback into prompts to generate more effective queries.
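A minimal sketch of folding user feedback back into the prompt, assuming a generic llm callable and an invented prompt template; the actual interface's prompt-modification logic is not reproduced here:

```python
def refine_query(llm, base_query, feedback):
    """Fold relevance feedback back into the prompt so the LLM can emit a
    more effective query. `feedback` is a list of (snippet, is_relevant)."""
    marks = "\n".join(f"- [{'+' if rel else '-'}] {doc}"
                      for doc, rel in feedback)
    prompt = (
        "Rewrite the search query below so it better matches the results "
        "marked [+] and avoids those marked [-].\n"
        f"Query: {base_query}\nFeedback:\n{marks}\nImproved query:"
    )
    return llm(prompt).strip()
```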
arXiv Detail & Related papers (2023-11-19T04:42:24Z)
- Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called the large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM).
All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts.
This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
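The unification idea can be sketched as a single generation function whose behavior is selected purely by a natural-language prompt; the task names and templates below are assumptions:

```python
# Every stack component becomes one text-generation call that differs only
# in its natural-language prompt; `llm` is any text-in/text-out callable.
TASK_PROMPTS = {
    "rewrite": "Rewrite this query for web search: {query}",
    "rank":    "Query: {query}\nDocument: {doc}\nRelevant? Answer yes or no:",
    "snippet": "Summarize this document for the query '{query}':\n{doc}",
}

def run_search_task(llm, task, **fields):
    return llm(TASK_PROMPTS[task].format(**fields))

# e.g. run_search_task(llm, "rank", query="red dress", doc="...")
```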
arXiv Detail & Related papers (2023-10-23T05:52:09Z)
- Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search [27.42717207107]
Cross-modal sponsored search displays multi-modal advertisements (ads) when consumers look for desired products via natural language queries in search engines.
The ability to align ads-specific information in both images and texts is crucial for accurate and flexible sponsored search.
We propose a simple alignment network for explicitly mapping fine-grained visual parts in ads images to the corresponding text.
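A rough stand-in for scoring fine-grained alignment between ad-image regions and text tokens, using a late-interaction style maximum-similarity match; this is an assumption for illustration, not the paper's alignment network:

```python
import numpy as np

def alignment_score(region_embs, token_embs):
    """Match each ad-image region to its best text token and average the
    cosine similarities.

    region_embs: (R, d) array of visual-part embeddings
    token_embs:  (T, d) array of text-token embeddings
    """
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    t = token_embs / np.linalg.norm(token_embs, axis=1, keepdims=True)
    sim = r @ t.T                      # (R, T) cosine similarity matrix
    return float(sim.max(axis=1).mean())
```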
arXiv Detail & Related papers (2023-09-28T03:43:57Z)
- SSP: Self-Supervised Post-training for Conversational Search [63.28684982954115]
We propose Self-Supervised Post-training (SSP), a new post-training paradigm with three self-supervised tasks to efficiently initialize the conversational search model.
To verify the effectiveness of our proposed method, we apply the conversational encoder post-trained with SSP to the conversational search task using two benchmark datasets: CAsT-19 and CAsT-20.
arXiv Detail & Related papers (2023-07-02T13:36:36Z)
- RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks [86.6139619721343]
We propose to find better receptive field combinations through a global-to-local search scheme.
Our search scheme exploits a global search to find coarse combinations and a local search to refine them into the final receptive field combinations.
Our RF-Next models, which plug receptive field search into various architectures, boost performance on many tasks.
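A toy version of a global-to-local search over a single layer's dilation rate, assuming an evaluate function that returns a validation score; the real RF-Next search operates over combinations across layers and is more elaborate:

```python
def search_dilation(evaluate, rates=(1, 2, 4, 8, 16), steps=2):
    """Coarse global sweep over a wide grid of dilation rates, then local
    refinement around the best rate. `evaluate` maps a rate to a
    validation score."""
    best = max(rates, key=evaluate)            # global (coarse) search
    for _ in range(steps):                     # local refinement
        best = max({max(1, best - 1), best, best + 1}, key=evaluate)
    return best
```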
arXiv Detail & Related papers (2022-06-14T06:56:26Z)
- A Multi-Perspective Architecture for Semantic Code Search [58.73778219645548]
We propose a novel multi-perspective cross-lingual neural framework for code-text matching.
Our experiments on the CoNaLa dataset show that our proposed model yields better performance than previous approaches.
arXiv Detail & Related papers (2020-05-06T04:46:11Z)