Related papers: Designing Interfaces for Multimodal Vector Search Applications

Designing Interfaces for Multimodal Vector Search Applications

URL: http://arxiv.org/abs/2409.11629v1
Date: Wed, 18 Sep 2024 01:23:26 GMT
Title: Designing Interfaces for Multimodal Vector Search Applications
Authors: Owen Pendrigh Elliott, Tom Hamer, Jesse Clark,
Abstract summary: Multimodal vector search offers a new paradigm for information retrieval by exposing numerous pieces of functionality which are not possible in traditional lexical search engines. We present implementations and design patterns which better allow users to express their information needs and effectively interact with these systems.
Score: 0.08192907805418582
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal vector search offers a new paradigm for information retrieval by exposing numerous pieces of functionality which are not possible in traditional lexical search engines. While multimodal vector search can be treated as a drop in replacement for these traditional systems, the experience can be significantly enhanced by leveraging the unique capabilities of multimodal search. Central to any information retrieval system is a user who expresses an information need, traditional user interfaces with a single search bar allow users to interact with lexical search systems effectively however are not necessarily optimal for multimodal vector search. In this paper we explore novel capabilities of multimodal vector search applications utilising CLIP models and present implementations and design patterns which better allow users to express their information needs and effectively interact with these systems in an information retrieval context.

Related papers

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification [60.38841251693781]
We propose a novel framework to generate robust multi-modal object ReIDs. Our framework uses Modal Prefixes and InverseNet to integrate multi-modal information with semantic guidance from inverted text. Experiments on three multi-modal object ReID benchmarks demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2025-03-13T13:00:31Z)
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs) We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks. We propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers.
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express [3.8973445113342433]
Building a scalable multi-modal search system requires fine-tuning several components. We address considerations such as embedding model selection, the roles of embeddings in matching and ranking, and the balance between dense and sparse embeddings.
arXiv Detail & Related papers (2024-08-26T23:52:27Z)
An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models [21.892975397847316]
We present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph index. One notable aspect of MQA is its utilization of contrastive learning to assess the significance of different modalities. The system achieves efficient retrieval through our advanced navigation graph index, refined using computational pruning techniques.
arXiv Detail & Related papers (2024-07-05T02:01:49Z)
Leveraging Large Language Models for Multimodal Search [0.6249768559720121]
This paper introduces a novel multimodal search model that achieves a new performance milestone on the Fashion200K dataset. We also propose a novel search interface integrating Large Language Models (LLMs) to facilitate natural language interaction.
arXiv Detail & Related papers (2024-04-24T10:30:42Z)
DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval. Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP. To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z)
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers [76.06249845401975]
We introduce UniIR, a unified instruction-guided multimodal retriever capable of handling eight distinct retrieval tasks across modalities. UniIR, a single retrieval system jointly trained on ten diverse multimodal-IR datasets, interprets user instructions to execute various retrieval tasks. We construct the M-BEIR, a multimodal retrieval benchmark with comprehensive results, to standardize the evaluation of universal multimodal information retrieval.
arXiv Detail & Related papers (2023-11-28T18:55:52Z)
Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM) All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts. This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
arXiv Detail & Related papers (2023-10-23T05:52:09Z)
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection [72.36017150922504]
We propose a multi-modal contextual knowledge distillation framework, MMC-Det, to transfer the learned contextual knowledge from a teacher fusion transformer to a student detector. The diverse multi-modal masked language modeling is realized by an object divergence constraint upon traditional multi-modal masked language modeling (MLM)
arXiv Detail & Related papers (2023-08-30T08:33:13Z)
Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection. Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning. We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.