Related papers: HyST: LLM-Powered Hybrid Retrieval over Semi-Structured Tabular Data

URL: http://arxiv.org/abs/2508.18048v1
Date: Mon, 25 Aug 2025 14:06:27 GMT
Title: HyST: LLM-Powered Hybrid Retrieval over Semi-Structured Tabular Data
Authors: Jiyoon Myung, Jihyeon Park, Joohyung Han
Abstract summary: HyST (Hybrid retrieval over Semi-structured Tabular data) is a hybrid retrieval framework that combines structured filtering with semantic embedding search. We show that HyST consistently outperforms traditional baselines on a semi-structured benchmark.
Score: 0.4779196219827507
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: User queries in real-world recommendation systems often combine structured constraints (e.g., category, attributes) with unstructured preferences (e.g., product descriptions or reviews). We introduce HyST (Hybrid retrieval over Semi-structured Tabular data), a hybrid retrieval framework that combines LLM-powered structured filtering with semantic embedding search to support complex information needs over semi-structured tabular data. HyST extracts attribute-level constraints from natural language using large language models (LLMs) and applies them as metadata filters, while processing the remaining unstructured query components via embedding-based retrieval. Experiments on a semi-structured benchmark show that HyST consistently outperforms traditional baselines. These results highlight the importance of structured filtering in improving retrieval precision and point to a scalable and accurate solution for real-world user queries.
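The abstract describes a two-stage pipeline. Attribute constraints extracted from the query act as metadata filters, and the remaining free-text part of the query is matched by embedding similarity. The Python sketch below illustrates that idea on a hypothetical four-item catalogue. It is not the authors' implementation. `extract_constraints` is a rule-based stand-in for the LLM extraction step, `embed` is a toy bag-of-words vector in place of a real embedding model, and all names and data are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical catalogue: structured attributes plus a free-text description.
ITEMS = [
    {"id": 1, "category": "shoes", "color": "black", "text": "lightweight running shoe with breathable mesh"},
    {"id": 2, "category": "shoes", "color": "white", "text": "leather dress shoe for formal events"},
    {"id": 3, "category": "jackets", "color": "black", "text": "breathable running jacket, lightweight and packable"},
    {"id": 4, "category": "shoes", "color": "black", "text": "waterproof hiking boot with ankle support"},
]

# Assumed attribute vocabulary (the schema an LLM would be prompted with).
SCHEMA = {"category": {"shoes", "jackets"}, "color": {"black", "white"}}


def extract_constraints(query):
    """Hypothetical stand-in for the LLM step: split the query into
    attribute=value filters and the leftover unstructured text."""
    filters, rest = {}, []
    for tok in re.findall(r"[a-z]+", query.lower()):
        attr = next((a for a, vals in SCHEMA.items() if tok in vals), None)
        if attr:
            filters[attr] = tok
        else:
            rest.append(tok)
    return filters, " ".join(rest)


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query, items, k=2):
    filters, residual = extract_constraints(query)
    # 1) Structured filtering: keep only items matching every extracted attribute.
    candidates = [it for it in items if all(it.get(a) == v for a, v in filters.items())]
    # 2) Semantic ranking: score the survivors against the residual query text.
    q = embed(residual)
    ranked = sorted(candidates, key=lambda it: cosine(q, embed(it["text"])), reverse=True)
    return [it["id"] for it in ranked[:k]]


print(hybrid_search("black shoes that are lightweight and breathable for running", ITEMS))  # → [1, 4]
```

Item 3, a black jacket, never reaches the ranking stage even though its description is semantically close to the query, because the structured filter removes it first. This illustrates why the abstract credits structured filtering with improved precision.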

Related papers

LLM-based Semantic Search for Conversational Queries in E-commerce [1.3645712130536118]
We present an LLM-based semantic search framework that captures user intent from conversational queries. Our framework achieves strong precision and recall across various settings compared to baseline approaches on a real-world dataset.
Date: 2026-01-23T06:35:28Z
CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval [1.483000637348699]
We introduce CGPT, a training framework that enhances table retrieval through LLM-generated supervision. CGPT consistently outperforms retrieval baselines, including QGpT, with an average R@1 improvement of 16.54 percent. Results indicate that semantically guided partial-table construction, combined with contrastive training from LLM-generated supervision, provides an effective and scalable paradigm for large-scale table retrieval.
Date: 2026-01-22T10:58:56Z
REaR: Retrieve, Expand and Refine for Effective Multitable Retrieval [46.38349148493421]
REaR (Retrieve, Expand and Refine) is a three-stage framework for efficient, high-fidelity multi-table retrieval. REaR retrieves query-aligned tables, expands these with structurally joinable tables, and refines them by pruning noisy or weakly related candidates. REaR is retriever-agnostic and consistently improves dense and sparse retrievers on complex table QA datasets.
Date: 2025-11-02T05:01:04Z
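The REaR summary names three stages: retrieve, expand, and refine. The following Python sketch mirrors them on a hypothetical four-table corpus. It makes several simplifying assumptions. Lexical token overlap stands in for the dense or sparse retriever, since the summary says REaR is retriever-agnostic. Shared column names serve as a crude proxy for structural joinability. A fixed relevance threshold plays the role of pruning. The scoring function, the threshold, and the data are assumptions for illustration, not details taken from the paper.

```python
import re

# Hypothetical mini-corpus: each table has column names and a short caption.
TABLES = {
    "orders":    {"cols": {"order_id", "customer_id", "amount"}, "caption": "customer orders and amounts"},
    "customers": {"cols": {"customer_id", "name", "country"},    "caption": "customer names and countries"},
    "products":  {"cols": {"product_id", "price"},               "caption": "product price list"},
    "logs":      {"cols": {"ts", "customer_id", "event"},        "caption": "raw web server events"},
}


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(query, name):
    # Toy lexical score: overlap between query tokens and the table's caption/column words.
    t = TABLES[name]
    vocab = tokens(t["caption"]) | {w for c in t["cols"] for w in c.split("_")}
    return len(tokens(query) & vocab)


def retrieve(query, k=1):
    # Stage 1: keep the k tables most aligned with the query.
    return sorted(TABLES, key=lambda n: relevance(query, n), reverse=True)[:k]


def expand(seeds):
    # Stage 2: add tables sharing at least one column (a possible join key) with a seed.
    joinable = {n for n in TABLES for s in seeds if n != s and TABLES[n]["cols"] & TABLES[s]["cols"]}
    return list(seeds) + sorted(joinable - set(seeds))


def refine(query, candidates, min_score=2):
    # Stage 3: prune candidates whose relevance falls below an assumed threshold.
    return [n for n in candidates if relevance(query, n) >= min_score]


q = "total order amount per customer country"
print(refine(q, expand(retrieve(q))))  # → ['orders', 'customers']
```

Here expansion pulls in both `customers` and `logs` because both share `customer_id` with `orders`. Refinement then drops `logs` as weakly related, which is the noise-pruning role the summary assigns to the third stage.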
Reasoning-enhanced Query Understanding through Decomposition and Interpretation [87.56450566014625]
ReDI is a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation. We compiled a large-scale dataset of real-world complex queries from a major search engine. Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms.
Date: 2025-09-08T10:58:42Z
LLM-Symbolic Integration for Robust Temporal Tabular Reasoning [69.27153114778748]
We introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations. This structured approach allows Large Language Models (LLMs) to generate and execute SQL queries, enhancing generalization and mitigating biases.
Date: 2025-06-06T05:14:04Z
Large Language Models are Good Relational Learners [55.40941576497973]
We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)-based encoder to generate structured relational prompts for large language models (LLMs). Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to process and reason over complex entity relationships.
Date: 2025-06-06T04:07:55Z
Mixture-of-RAG: Integrating Text and Tables with Large Language Models [5.038576104344948]
Heterogeneous Document RAG requires joint retrieval and reasoning across textual and hierarchical data. We propose MixRAG, a novel three-stage framework that preserves hierarchical structure and heterogeneous relationships. Experiments show that MixRAG boosts top-1 retrieval by 46% over strong text-only, table-only, and naive-mixture baselines.
Date: 2025-04-13T13:02:33Z
RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering [11.214912072391108]
Real-world datasets often feature a vast array of attributes and complex values. Traditional methods cannot fully convey a dataset's size and complexity to Large Language Models. We propose a novel framework that leverages Full-Text Search (FTS) on the input table.
Date: 2024-08-22T13:13:06Z
UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics. We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
Date: 2024-06-23T06:58:55Z
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STaRK, a large-scale semi-structured retrieval benchmark on textual and relational knowledge bases. Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
Date: 2024-04-19T22:54:54Z
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as question answering (QA). We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on query complexity. We validate our model on a set of open-domain QA datasets covering multiple query complexities and show that it enhances the overall efficiency and accuracy of QA systems.
Date: 2024-03-21T13:52:30Z
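The Adaptive-RAG summary describes routing each question to a strategy according to its estimated complexity. The Python sketch below shows only that routing pattern. `classify_complexity` is a hypothetical keyword heuristic standing in for a learned complexity classifier. The three strategy functions are placeholders for no-retrieval, single-step, and multi-step retrieval-augmented pipelines, a tiering assumed here for illustration. None of the names come from the paper.

```python
import re
from typing import Callable, Dict


def classify_complexity(question: str) -> str:
    """Hypothetical heuristic stand-in for a trained classifier.
    Returns 'A' (simple), 'B' (moderate) or 'C' (complex)."""
    q = question.lower()
    if re.search(r"\b(and then|compare|both|before|after)\b", q):
        return "C"  # cues suggesting multi-hop reasoning
    if re.search(r"\b(who|when|where|which)\b", q):
        return "B"  # factoid lookup that likely needs one retrieval
    return "A"      # assume the LLM can answer from its own parameters


# Placeholder strategies; a real system would call an LLM and a retriever here.
def no_retrieval(q: str) -> str:
    return f"LLM-only answer to: {q}"


def single_step(q: str) -> str:
    return f"retrieve once, then answer: {q}"


def multi_step(q: str) -> str:
    return f"iterative retrieve-and-reason over: {q}"


ROUTES: Dict[str, Callable[[str], str]] = {"A": no_retrieval, "B": single_step, "C": multi_step}


def answer(question: str) -> str:
    # Route the question to the strategy matching its predicted complexity.
    return ROUTES[classify_complexity(question)](question)


print(answer("Who wrote Hamlet?"))
```

The point of the design is cost control. Simple questions skip retrieval entirely, and the expensive iterative path is reserved for questions the classifier flags as complex.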
Beyond Extraction: Contextualising Tabular Data for Efficient Summarisation by Language Models [0.0]
The conventional use of the Retrieval-Augmented Generation architecture has proven effective for retrieving information from diverse documents. This research introduces an innovative approach to enhance the accuracy of complex table queries in RAG-based systems.
Date: 2024-01-04T16:16:14Z

This list is automatically generated from the titles and abstracts of the papers on this site.