Multimodal Neural Databases
- URL: http://arxiv.org/abs/2305.01447v1
- Date: Tue, 2 May 2023 14:27:56 GMT
- Title: Multimodal Neural Databases
- Authors: Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, Alon Halevy, Fabrizio Silvestri
- Abstract summary: We propose a new framework, which we name Multimodal Neural Databases (MMNDBs).
MMNDBs can answer complex database-like queries involving reasoning over different input modalities, such as text and images, at scale.
We show the potential of these new techniques to process unstructured data coming from different modalities, paving the way for future research.
- Score: 4.321727213494619
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rise in loosely-structured data available through text, images, and other
modalities has called for new ways of querying them. Multimedia Information
Retrieval has filled this gap and has witnessed exciting progress in recent
years. Tasks such as search and retrieval of extensive multimedia archives have
undergone massive performance improvements, driven to a large extent by recent
developments in multimodal deep learning. However, methods in this field remain
limited in the kinds of queries they support and, in particular, in their
inability to answer database-like queries. For this reason, inspired by recent
work on neural databases, we propose a new framework, which we name Multimodal
Neural Databases (MMNDBs). MMNDBs can answer complex database-like queries that
involve reasoning over different input modalities, such as text and images, at
scale. In this paper, we present the first architecture able to fulfill this
set of requirements and test it with several baselines, showing the limitations
of currently available models. The results show the potential of these new
techniques to process unstructured data coming from different modalities,
paving the way for future research in the area. Code to replicate the
experiments will be released at
https://github.com/GiovanniTRA/MultimodalNeuralDatabases
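To make concrete what a database-like query over multimodal data looks like, here is a minimal sketch of an aggregate query (a COUNT over an image collection) built from an off-the-shelf CLIP-style scorer. This illustrates the task, not the paper's architecture; the model name, threshold, and function are assumptions.

```python
# Minimal sketch: answer "SELECT COUNT(*) FROM images WHERE <predicate>" by
# thresholding text-image similarity from a CLIP-style model.
# The model, threshold, and file names are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer("clip-ViT-B-32")  # joint text-image embedding space

def count_matches(image_paths, predicate_text, threshold=0.25):
    text_emb = model.encode(predicate_text, convert_to_tensor=True)
    count = 0
    for path in image_paths:
        img_emb = model.encode(Image.open(path), convert_to_tensor=True)
        if util.cos_sim(text_emb, img_emb).item() >= threshold:
            count += 1
    return count

print(count_matches(["img1.jpg", "img2.jpg"], "a photo of a dog"))
```

A full MMNDB would go beyond a single thresholded predicate, supporting joins and multi-step reasoning across modalities at scale.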
Related papers
- MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios.
We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z)
- Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries [8.779871128906787]
Multi-modal datasets often miss the detailed descriptions that properly capture the rich information encoded in each item.
This makes answering complex natural language queries a major challenge in this domain.
We introduce a Generative-based Monte Carlo method that utilizes foundation models to generate synthetic samples.
Our system is open-source and ready for deployment, designed to be easily adopted by researchers and developers.
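As a rough, hedged illustration of generating synthetic textual samples for sparsely described items (the captioning model and sampling settings below are assumptions, not details from the Needle paper):

```python
# Hedged sketch: enrich a multimodal item that lacks a description with several
# sampled captions before indexing it for natural language queries.
# Model choice and sampling parameters are assumptions.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def synthetic_descriptions(image_path, num_samples=5):
    """Draw several caption samples to cover more of an item's content."""
    samples = []
    for _ in range(num_samples):
        out = captioner(image_path, generate_kwargs={"do_sample": True, "top_p": 0.9})
        samples.append(out[0]["generated_text"])
    return samples
```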
arXiv Detail & Related papers (2024-12-01T01:36:41Z)
- MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs).
We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.
We propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers.
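As a rough illustration of what modality-aware hard negative mining could look like (the scoring function and candidate layout are assumptions, not MM-Embed's implementation), the sketch below caps how many negatives each modality contributes, so a text-biased retriever cannot satisfy the training loss with text-only negatives:

```python
# Hedged sketch of modality-aware hard negative mining: take the highest-scoring
# non-relevant candidates while balancing modalities.
# `score` and the candidate dict format are illustrative assumptions.
def mine_hard_negatives(query, candidates, positive_ids, score, per_modality=2):
    """candidates: dicts like {"id": ..., "modality": "text" | "image"}."""
    ranked = sorted((c for c in candidates if c["id"] not in positive_ids),
                    key=lambda c: score(query, c), reverse=True)
    picked, taken = [], {}
    for c in ranked:
        if taken.get(c["modality"], 0) < per_modality:
            picked.append(c)
            taken[c["modality"]] = taken.get(c["modality"], 0) + 1
    return picked
```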
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
- BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
- Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query Representation (DAQu).
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
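A minimal sketch of the general idea of augmenting a query with metadata drawn from a relational database before encoding it for retrieval (the schema and string format below are invented for illustration; DAQu's actual feature set and aggregation differ):

```python
# Hedged sketch: expand a user query with related rows from a relational
# database so the query encoder sees extra context. The "products" schema
# is invented for illustration.
import sqlite3

def augment_query(query, db_path="catalog.db"):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT name, category, brand FROM products WHERE name LIKE ?",
        (f"%{query}%",),
    ).fetchall()
    conn.close()
    metadata = "; ".join(f"{n} ({c}, {b})" for n, c, b in rows)
    return f"{query} [metadata: {metadata}]" if metadata else query
```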
arXiv Detail & Related papers (2024-06-23T05:02:21Z)
- ADMUS: A Progressive Question Answering Framework Adaptable to Multiple Knowledge Sources [9.484792817869671]
We present ADMUS, a progressive knowledge base question answering framework designed to accommodate a wide variety of datasets.
Our framework supports the seamless integration of new datasets with minimal effort, requiring only the creation of a dataset-related micro-service at negligible cost.
arXiv Detail & Related papers (2023-08-09T08:46:39Z)
- End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating content from both text and image queries.
We introduce a retriever model, ReViz, that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
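A crude approximation of a text-plus-image query with off-the-shelf encoders (ReViz itself is a single end-to-end model; this two-tower fusion is only an analogy, and the model name is an assumption):

```python
# Hedged analogy: embed the text and image parts of a query with a CLIP-style
# model, average them, and score a corpus of short knowledge snippets.
from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer("clip-ViT-B-32")

def retrieve(text_query, image_path, corpus, top_k=5):
    q = (model.encode(text_query, convert_to_tensor=True)
         + model.encode(Image.open(image_path), convert_to_tensor=True)) / 2
    docs = model.encode(corpus, convert_to_tensor=True)
    hits = util.semantic_search(q.unsqueeze(0), docs, top_k=top_k)[0]
    return [(corpus[h["corpus_id"]], h["score"]) for h in hits]
```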
arXiv Detail & Related papers (2023-06-01T08:04:12Z)
- Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases [63.96793270418793]
Complex logical query answering (CLQA) is a recently emerged task of graph machine learning.
We introduce the concept of Neural Graph Databases (NGDBs).
NGDB consists of a Neural Graph Storage and a Neural Graph Engine.
arXiv Detail & Related papers (2023-03-26T04:03:37Z)
- Semi-Structured Query Grounding for Document-Oriented Databases with Deep Retrieval and Its Application to Receipt and POI Matching [23.52046767195031]
We aim to address practical challenges when using embedding-based retrieval for the query grounding problem in semi-structured data.
We conduct extensive experiments to find the most effective combination of modules for the embedding and retrieval of both query and database entries.
The proposed model significantly outperforms the conventional manual pattern-based model while requiring far less development and maintenance effort.
arXiv Detail & Related papers (2022-02-23T05:32:34Z)
- Database Reasoning Over Text [11.074939080454412]
We show that state-of-the-art transformer models perform very well for small databases.
We propose a modular architecture to answer database-style queries over multiple spans from text.
Our architecture scales to databases containing thousands of facts, whereas contemporary models are limited by how many facts they can encode.
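To illustrate the retrieve-then-read-then-aggregate pattern that such database-style queries over text require, here is a toy sketch; the keyword retriever and regex "reader" are stand-ins for the paper's learned components:

```python
# Toy sketch of a database-style COUNT over textual facts:
# retrieve supporting facts, extract a span from each, then aggregate.
import re

FACTS = [
    "Alice lives in Rome.",
    "Bob lives in Paris.",
    "Carol lives in Rome.",
]

def count_residents(city):
    support = [f for f in FACTS if city in f]                     # retrieval
    matches = [re.match(r"(\w+) lives in", f) for f in support]   # per-fact reader
    people = {m.group(1) for m in matches if m}                   # dedup spans
    return len(people)                                            # aggregation

print(count_residents("Rome"))  # -> 2
```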
arXiv Detail & Related papers (2021-06-02T11:09:40Z)
- MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification [14.820951153262685]
We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification.
The dataset is collected in a fully automated distant supervision manner, where the labels are obtained from an existing curated database.
We benchmark various state-of-the-art NLP and computer vision models, including unimodal models which only take either caption texts or images as inputs.
arXiv Detail & Related papers (2020-12-16T19:11:36Z)
- VisualSem: A High-quality Knowledge Graph for Vision and Language [48.47370435793127]
We release VisualSem, a high-quality knowledge graph (KG).
VisualSem includes nodes with multilingual glosses, multiple illustrative images, and visually relevant relations.
We also release a neural multi-modal retrieval model that can use images or sentences as inputs and retrieves entities in the KG.
arXiv Detail & Related papers (2020-08-20T18:20:29Z)