ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata
- URL: http://arxiv.org/abs/2512.15365v1
- Date: Wed, 17 Dec 2025 12:11:14 GMT
- Title: ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata
- Authors: Gajendra Doniparthi, Shashank Balu Pandhare, Stefan Deßloch, Timo Mühlhaus,
- Abstract summary: ArcBERT understands natural language queries and relies on semantic matching, unlike traditional search applications. ArcBERT also understands the structure and hierarchies within the metadata, enabling it to handle diverse user querying patterns effectively.
- Score: 0.4077787659104315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional search applications within Research Data Management (RDM) ecosystems are crucial in helping users discover and explore the structured metadata from the research datasets. Typically, text search engines require users to submit keyword-based queries rather than using natural language. However, using Large Language Models (LLMs) trained on domain-specific content for specialized natural language processing (NLP) tasks is becoming increasingly common. We present ArcBERT, an LLM-based system designed for integrated metadata exploration. ArcBERT understands natural language queries and relies on semantic matching, unlike traditional search applications. Notably, ArcBERT also understands the structure and hierarchies within the metadata, enabling it to handle diverse user querying patterns effectively.
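The abstract does not describe ArcBERT's internals, so the following is only a minimal sketch of the general contrast it draws between keyword matching and semantic matching over structured metadata. Everything in it is hypothetical: `TOY_VECTORS` is a hand-written stand-in for a trained, domain-specific encoder, and `flatten` (which keeps the metadata hierarchy visible by prefixing each value with its key path) is one assumed way to expose structure to a text model, not the authors' method.

```python
import math

# Hypothetical toy word vectors standing in for a trained BERT-style encoder.
TOY_VECTORS = {
    "plant": [1.0, 0.0, 0.0],
    "arabidopsis": [0.9, 0.1, 0.0],
    "leaf": [0.8, 0.2, 0.0],
    "protein": [0.0, 1.0, 0.0],
    "proteomics": [0.0, 0.9, 0.1],
    "temperature": [0.0, 0.0, 1.0],
    "heat": [0.0, 0.1, 0.9],
}


def flatten(meta, prefix=""):
    """Turn nested metadata into 'key path + value' strings (assumed scheme)."""
    items = []
    for key, value in meta.items():
        path = f"{prefix} {key}".strip()
        if isinstance(value, dict):
            items.extend(flatten(value, path))
        else:
            items.append(f"{path} {value}")
    return items


def embed(text):
    """Average the toy vectors of known tokens; unknown tokens are ignored."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query, doc):
    """Count exact token overlaps, as a simple keyword engine would."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def semantic_rank(query, docs):
    """Rank documents by embedding similarity to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)


docs = flatten({
    "study": {"organism": "Arabidopsis leaf", "assay": {"type": "proteomics"}},
    "condition": "heat",
})
print(docs)
print([keyword_score("plant temperature", d) for d in docs])
print(semantic_rank("plant temperature", docs))
```

For the query "plant temperature", the keyword scorer finds no exact token in any record, while the embedding ranking places the "heat" and "Arabidopsis leaf" records above the proteomics record. A real system would replace the toy table with a learned encoder; the ranking step itself would stay the same.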
Related papers
- LLM-based Semantic Search for Conversational Queries in E-commerce [1.3645712130536118]
We present an LLM-based semantic search framework that captures user intent from conversational queries. Our framework achieves strong precision and recall across various settings compared to baseline approaches on a real-world dataset.
(2026-01-23T06:35:28Z)
- DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search [61.77858432092777]
We present DeepMMSearch-R1, the first multimodal large language model capable of performing on-demand, multi-turn web searches. DeepMMSearch-R1 can initiate web searches based on relevant crops of the input image, making the image search more effective. We conduct extensive experiments across a range of knowledge-intensive benchmarks to demonstrate the superiority of our approach.
(2025-10-14T17:59:58Z)
- Keywords are not always the key: A metadata field analysis for natural language search on open data portals [3.974422712382188]
We examine how individual metadata fields affect the success of conversational dataset retrieval. We compare existing content of the metadata field 'description' with LLM-generated content. Our findings suggest that dataset descriptions play a central role in aligning with user intent.
(2025-09-17T22:14:27Z)
- Large Language Models are Good Relational Learners [55.40941576497973]
We introduce Rel-LLM, a novel architecture that uses a graph neural network (GNN)-based encoder to generate structured relational prompts for large language models (LLMs). Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to process and reason over complex entity relationships.
(2025-06-06T04:07:55Z)
- Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility [0.0]
This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs). Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal. The study highlights significant metadata issues and advocates adherence to open vocabularies and standards to enhance metadata harmonization.
(2025-01-09T22:48:43Z)
- Leveraging LLMs to Enable Natural Language Search on Go-to-market Platforms [0.23301643766310368]
We implement and evaluate a solution for the Zoominfo product for sellers that prompts Large Language Models (LLMs) with natural language.
The intermediary search fields offer numerous advantages for each query, including the elimination of syntax errors.
Comprehensive experiments with closed-source, open-source, and fine-tuned LLMs were conducted to demonstrate the efficacy of our approach.
(2024-11-07T03:58:38Z)
- Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework that augments LLMs with structured document relations from a knowledge graph (KG). We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
(2024-10-17T17:03:23Z)
- UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
(2024-06-23T06:58:55Z)
- Redefining Information Retrieval of Structured Database via Large Language Models [10.117751707641416]
This paper introduces a novel retrieval augmentation framework called ChatLR.
It primarily employs the powerful semantic understanding ability of Large Language Models (LLMs) as retrievers to achieve precise and concise information retrieval.
Experimental results demonstrate the effectiveness of ChatLR in addressing user queries, achieving an overall information retrieval accuracy exceeding 98.8%.
(2024-05-09T02:37:53Z)
- STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STaRK, a large-scale semi-structured retrieval benchmark on Textual and Relational Knowledge Bases.
Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
(2024-04-19T22:54:54Z)
- Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows retrieval models (RMs) to expand the knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
(2023-05-12T11:58:15Z)
- Query Understanding for Natural Language Enterprise Search [0.7363840001905632]
Natural Language Search (NLS) extends the capabilities of keyword-based search engines by allowing users to issue queries in a more "natural" language.
We present an NLS system we implemented as part of the Search service of a major CRM platform.
(2020-12-11T10:57:25Z)