Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective
- URL: http://arxiv.org/abs/2503.02251v1
- Date: Tue, 04 Mar 2025 03:57:10 GMT
- Title: Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective
- Authors: Da Li, Keping Bi, Jiafeng Guo, Xueqi Cheng
- Abstract summary: Table retrieval is less explored compared to text retrieval. Different table fields have varying matching preferences. We introduce a Table-tailored HYbrid Matching rEtriever (THYME).
- Score: 70.13748256886288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table retrieval, essential for accessing information through tabular data, is less explored compared to text retrieval. The row/column structure and distinct fields of tables (including titles, headers, and cells) present unique challenges. For example, different table fields have varying matching preferences: cells may favor finer-grained (word/phrase level) matching over broader (sentence/passage level) matching due to their fragmented and detailed nature, unlike titles. This necessitates a table-specific retriever to accommodate the various matching needs of each table field. Therefore, we introduce a Table-tailored HYbrid Matching rEtriever (THYME), which approaches table retrieval from a field-aware hybrid matching perspective. Empirical results on two table retrieval benchmarks, NQ-TABLES and OTT-QA, show that THYME significantly outperforms state-of-the-art baselines. Comprehensive analyses confirm the differing matching preferences across table fields and validate the design of THYME.
Related papers
- Bridging Queries and Tables through Entities in Table Retrieval [70.13748256886288]
Entities are well-studied in the context of text retrieval, but there is a noticeable lack of research on their applications in table retrieval.
We propose an entity-enhanced training framework and design an interaction paradigm based on entity representations.
Our proposed framework is plug-and-play and flexible, making it easy to integrate into existing table retriever training processes.
arXiv Detail & Related papers (2025-04-09T03:16:33Z) - Evaluation of Table Representations to Answer Questions from Tables in Documents : A Case Study using 3GPP Specifications [0.650923326742559]
The representation of a table in terms of what is a relevant chunk is not obvious.
Row-level representations that include the corresponding table header information in every cell improve retrieval performance.
arXiv Detail & Related papers (2024-08-30T04:40:35Z) - HYTREL: Hypergraph-enhanced Tabular Data Representation Learning [36.731257438472035]
HYTREL is a language model that captures the row/column permutation invariances and three more structural properties of tabular data.
We show that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining.
Our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.
arXiv Detail & Related papers (2023-07-14T05:41:22Z) - A large-scale dataset for end-to-end table recognition in the wild [13.717478398235055]
Table recognition (TR) is one of the research hotspots in pattern recognition.
Currently, end-to-end TR in real scenarios, which accomplishes the three sub-tasks simultaneously, is still an unexplored research area.
We propose a new large-scale dataset named Table Recognition Set (TabRecSet) with diverse table forms sourced from multiple scenarios in the wild.
arXiv Detail & Related papers (2023-03-27T02:48:51Z) - SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate the SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
arXiv Detail & Related papers (2023-03-08T05:15:01Z) - Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z) - UniRE: A Unified Label Space for Entity Relation Extraction [67.53850477281058]
Joint entity relation extraction models set up two separate label spaces for the two sub-tasks.
We argue that this setting may hinder the information interaction between entities and relations.
In this work, we propose to eliminate the different treatment of the two sub-tasks' label spaces.
arXiv Detail & Related papers (2021-07-09T08:09:37Z) - Retrieving Complex Tables with Multi-Granular Graph Representation Learning [20.72341939868327]
The task of natural language table retrieval seeks to retrieve semantically relevant tables based on natural language queries.
Existing learning systems treat tables as plain text based on the assumption that tables are structured as dataframes.
We propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning.
arXiv Detail & Related papers (2021-05-04T20:19:03Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)