Leveraging Large Language Models for Multimodal Search
- URL: http://arxiv.org/abs/2404.15790v1
- Date: Wed, 24 Apr 2024 10:30:42 GMT
- Title: Leveraging Large Language Models for Multimodal Search
- Authors: Oriol Barbany, Michael Huang, Xinliang Zhu, Arnab Dhua
- Abstract summary: This paper introduces a novel multimodal search model that achieves a new performance milestone on the Fashion200K dataset.
We also propose a novel search interface integrating Large Language Models (LLMs) to facilitate natural language interaction.
- Score: 0.6249768559720121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries. The problem becomes harder with the large variability of natural language text queries, which may contain ambiguous, implicit, and irrelevant information. Addressing these issues may require systems with enhanced matching capabilities, reasoning abilities, and context-aware query parsing and rewriting. This paper introduces a novel multimodal search model that achieves a new performance milestone on the Fashion200K dataset. Additionally, we propose a novel search interface integrating Large Language Models (LLMs) to facilitate natural language interaction. This interface routes queries to search systems while conversationally engaging with users and considering previous searches. When coupled with our multimodal search model, it heralds a new era of shopping assistants capable of offering human-like interaction and enhancing the overall search experience.
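A minimal sketch of the routing idea described in the abstract, not the authors' implementation: an LLM decides per turn whether the user's message is a search request (rewritten into a self-contained query) or conversational chit-chat. The names call_llm and search_backend are hypothetical stubs standing in for a chat-completion API and the multimodal search model.

    # Hedged sketch: LLM-as-router between conversation and search backend.
    import json

    ROUTER_PROMPT = """You are a shopping assistant. Given the chat history and
    the latest user message, reply with JSON:
    {"action": "search" or "chat", "query": "<self-contained rewritten query>"}"""

    def call_llm(system: str, messages: list[dict]) -> str:
        # Placeholder: wire this to an actual LLM chat-completion endpoint.
        return json.dumps({"action": "search", "query": "black sleeveless dress"})

    def search_backend(query: str) -> list[str]:
        # Placeholder for the multimodal search model.
        return [f"product matching '{query}'"]

    def handle_turn(history: list[dict], user_msg: str) -> str:
        turn = history + [{"role": "user", "content": user_msg}]
        decision = json.loads(call_llm(ROUTER_PROMPT, turn))
        if decision["action"] == "search":
            results = search_backend(decision["query"])
            return f"Here is what I found: {results[0]}"
        return "Happy to help; tell me more about what you are looking for."

    print(handle_turn([], "same dress but without sleeves"))

Rewriting the query before routing is what lets the backend handle context-dependent follow-ups ("same dress but without sleeves") as standalone searches.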
Related papers
- CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters.
Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework.
Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on the CoIR benchmark.
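The "common retrieval framework" amounts to: embed the query, embed the candidates, rank by similarity, regardless of the underlying code task. A toy illustration follows; the hash-based embed function is a stand-in for the actual CodeXEmbed encoder.

    # Toy sketch of embedding-based code retrieval (illustrative only).
    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        # Stand-in encoder: deterministic random vector per string.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(dim)
        return v / np.linalg.norm(v)

    def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
        q = embed(query)
        scores = [float(q @ embed(doc)) for doc in corpus]
        order = np.argsort(scores)[::-1][:k]  # highest cosine similarity first
        return [corpus[i] for i in order]

    snippets = ["def add(a, b): return a + b",
                "def read_file(p): return open(p).read()"]
    print(retrieve("function that sums two numbers", snippets))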
arXiv Detail & Related papers (2024-11-19T16:54:45Z)
- Leveraging LLMs to Enable Natural Language Search on Go-to-market Platforms [0.23301643766310368]
We implement and evaluate a solution for the Zoominfo product for sellers, which prompts Large Language Models with the user's natural language query.
The intermediary search fields offer numerous advantages for each query, including the elimination of syntax errors.
Comprehensive experiments with closed, open source, and fine-tuned LLM models were conducted to demonstrate the efficacy of our approach.
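A sketch of the "intermediary search fields" idea: the LLM emits structured fields rather than raw query syntax, so the output can be validated before it ever reaches the search engine, which is how syntax errors get eliminated. The field names and llm_parse stub below are assumptions, not the product's actual schema.

    # Hedged sketch: natural language -> validated structured search fields.
    import json

    ALLOWED_FIELDS = {"industry", "employee_count_min", "location", "job_title"}

    def llm_parse(nl_query: str) -> str:
        # Placeholder for an LLM call mapping natural language to fields.
        return json.dumps({"industry": "software", "employee_count_min": 500})

    def build_query(nl_query: str) -> dict:
        fields = json.loads(llm_parse(nl_query))
        unknown = set(fields) - ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"LLM produced unsupported fields: {unknown}")
        return fields  # safe to translate into the engine's native syntax

    print(build_query("software companies with at least 500 employees"))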
arXiv Detail & Related papers (2024-11-07T03:58:38Z)
- MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs).
We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.
We propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers.
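One illustrative reading of modality-aware hard negative mining: among highly scored wrong candidates, oversample those from the modality the retriever is biased toward, so training explicitly penalizes the bias. The sampling ratio and function below are assumptions, not the paper's exact procedure.

    # Hedged sketch of modality-aware hard negative mining.
    def mine_hard_negatives(candidates, positive_id, target_modality,
                            k=4, bias_share=0.75):
        # candidates: list of (doc_id, modality, score) under the current model
        negs = sorted((c for c in candidates if c[0] != positive_id),
                      key=lambda c: c[2], reverse=True)
        biased = [c for c in negs if c[1] != target_modality]  # wrong modality
        same = [c for c in negs if c[1] == target_modality]
        n_biased = int(k * bias_share)  # oversample the biased modality
        return biased[:n_biased] + same[:k - n_biased]

    cands = [("d1", "text", 0.9), ("d2", "image", 0.8),
             ("d3", "text", 0.7), ("d4", "image", 0.6)]
    print(mine_hard_negatives(cands, positive_id="d2", target_modality="image"))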
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
- Designing Interfaces for Multimodal Vector Search Applications [0.08192907805418582]
Multimodal vector search offers a new paradigm for information retrieval by exposing functionality that is not possible in traditional lexical search engines.
We present implementations and design patterns which better allow users to express their information needs and effectively interact with these systems.
arXiv Detail & Related papers (2024-09-18T01:23:26Z)
- Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express [3.8973445113342433]
Building a scalable multi-modal search system requires fine-tuning several components.
We address considerations such as embedding model selection, the roles of embeddings in matching and ranking, and the balance between dense and sparse embeddings.
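The dense/sparse balance is commonly realized as a weighted score fusion; a minimal sketch follows. The toy scorers and the alpha weight are assumptions for illustration, not Adobe's implementation.

    # Hedged sketch: fusing dense (semantic) and sparse (lexical) scores.
    import numpy as np

    def dense_score(q_vec: np.ndarray, d_vec: np.ndarray) -> float:
        # Cosine similarity between query and document embeddings.
        return float(q_vec @ d_vec /
                     (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))

    def sparse_score(query: str, doc: str) -> float:
        # Toy lexical overlap standing in for BM25-style matching.
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)

    def hybrid_score(q_vec, d_vec, query, doc, alpha=0.7):
        return alpha * dense_score(q_vec, d_vec) + \
               (1 - alpha) * sparse_score(query, doc)

    rng = np.random.default_rng(0)
    qv, dv = rng.standard_normal(8), rng.standard_normal(8)
    print(hybrid_score(qv, dv, "red birthday card", "birthday card template"))

Alpha near 1 favors semantic matching; alpha near 0 favors exact term matching, which matters for queries containing product names or SKUs.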
arXiv Detail & Related papers (2024-08-26T23:52:27Z)
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and improve session modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
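An illustrative take on those strategies: perturb the current query of a session to synthesize extra (context, query) training pairs of varying difficulty. The specific perturbations below are assumptions, not the paper's exact strategies.

    # Hedged sketch of query-oriented session augmentation.
    import random

    def augment_session(history: list[str], query: str, rng: random.Random):
        terms = query.split()
        variants = []
        if len(terms) > 1:
            drop = rng.randrange(len(terms))  # easier: delete one term
            variants.append(" ".join(t for i, t in enumerate(terms) if i != drop))
            # medium: shuffle term order
            variants.append(" ".join(rng.sample(terms, len(terms))))
        if history:
            # harder: contaminate the query with a term from session history
            variants.append(f"{rng.choice(history).split()[0]} {query}")
        return [(history, v) for v in variants]

    rng = random.Random(42)
    print(augment_session(["summer dresses"], "red midi dress", rng))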
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM).
All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts.
This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
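A minimal sketch of this framing: each stack component (query rewriting, ranking, answer generation) becomes a natural-language prompt to the same autoregressive model. The prompt templates and generate stub are hypothetical.

    # Hedged sketch: one LLM serving multiple search-stack tasks via prompts.
    PROMPTS = {
        "rewrite": "Rewrite the query for better recall: {query}",
        "rank": "Order these documents by relevance to '{query}':\n{docs}",
        "answer": "Answer '{query}' using only these documents:\n{docs}",
    }

    def generate(prompt: str) -> str:
        # Placeholder for an actual autoregressive LLM call.
        return f"<LLM output for: {prompt[:40]}...>"

    def run_task(task: str, **kwargs) -> str:
        return generate(PROMPTS[task].format(**kwargs))

    print(run_task("rewrite", query="cheap flights nyc"))
    print(run_task("rank", query="cheap flights nyc", docs="d1\nd2"))

Customizing a task then means editing a prompt template rather than retraining or re-engineering a dedicated component.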
arXiv Detail & Related papers (2023-10-23T05:52:09Z)
- Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search [27.42717207107]
Cross-modal sponsored search displays multi-modal advertisements (ads) when consumers look for desired products via natural language queries in search engines.
The ability to align ads-specific information in both images and texts is crucial for accurate and flexible sponsored search.
We propose a simple alignment network for explicitly mapping fine-grained visual parts in ads images to the corresponding text.
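A minimal sketch of fine-grained alignment in this spirit: score each text token against every visual region and aggregate the best matches. This mirrors the idea of mapping visual parts to words; it is not the paper's exact network.

    # Hedged sketch of token-to-region alignment scoring.
    import numpy as np

    def alignment_score(region_feats: np.ndarray, token_feats: np.ndarray) -> float:
        # region_feats: (R, D) visual part embeddings; token_feats: (T, D)
        r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
        t = token_feats / np.linalg.norm(token_feats, axis=1, keepdims=True)
        sim = t @ r.T                          # (T, R) token-region similarities
        return float(sim.max(axis=1).mean())   # each token keeps its best region

    rng = np.random.default_rng(0)
    regions = rng.standard_normal((6, 32))   # e.g., detected visual parts
    tokens = rng.standard_normal((4, 32))    # e.g., ad text token embeddings
    print(alignment_score(regions, tokens))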
arXiv Detail & Related papers (2023-09-28T03:43:57Z)
- RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks [86.6139619721343]
We propose to find better receptive field combinations through a global-to-local search scheme.
Our search scheme exploits both global search to find the coarse combinations and local search to get the refined receptive field combinations.
Our RF-Next models, plugging receptive field search to various models, boost the performance on many tasks.
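A toy version of the global-to-local scheme: coarsely scan candidate dilation rates, then refine around the best coarse hit. The evaluate function is a stand-in for validation performance of a model using that receptive field; the step sizes are illustrative.

    # Hedged sketch of global-to-local receptive field search.
    def evaluate(dilation: int) -> float:
        return -(dilation - 5) ** 2  # toy objective peaking at dilation 5

    def global_to_local_search(lo=1, hi=16, coarse_step=4):
        coarse = range(lo, hi + 1, coarse_step)            # global stage
        best = max(coarse, key=evaluate)
        local = range(max(lo, best - coarse_step + 1),     # local stage
                      min(hi, best + coarse_step - 1) + 1)
        return max(local, key=evaluate)

    print(global_to_local_search())  # -> 5

The coarse pass keeps the search cheap over a wide range; the local pass recovers combinations the coarse grid would otherwise miss.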
arXiv Detail & Related papers (2022-06-14T06:56:26Z)