TableQnA: Answering List Intent Queries With Web Tables
- URL: http://arxiv.org/abs/2001.04828v1
- Date: Fri, 10 Jan 2020 01:43:54 GMT
- Title: TableQnA: Answering List Intent Queries With Web Tables
- Authors: Kaushik Chakrabarti, Zhimin Chen, Siamak Shakeri, Guihong Cao, Surajit
Chaudhuri
- Abstract summary: We focus on answering two classes of queries with HTML tables: those seeking lists of entities and those seeking superlative entities.
Existing approaches train machine learning models to select the answer from the candidates.
We develop novel features to compute structure-aware match and train a machine learning model.
- Score: 12.941073798838167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The web contains a vast corpus of HTML tables. They can be used to provide
direct answers to many web queries. We focus on answering two classes of
queries with those tables: those seeking lists of entities (e.g., `cities in
california') and those seeking superlative entities (e.g., `largest city in
california'). The main challenge is to achieve high precision with significant
coverage. Existing approaches train machine learning models to select the
answer from the candidates; they rely on textual match features between the
query and the content of the table along with features capturing table
quality/importance. These features alone are inadequate for achieving the above
goals. Our main insight is that we can improve precision by (i) first
extracting intent (structured information) from the query for the above query
classes and (ii) then performing structure-aware matching (instead of just
textual matching) between the extracted intent and the candidates to select the
answer. We model (i) as a sequence tagging task. We leverage state-of-the-art
deep neural network models with word embeddings. The model requires large scale
training data which is expensive to obtain via manual labeling; we therefore
develop a novel method to automatically generate the training data. For (ii),
we develop novel features to compute structure-aware match and train a machine
learning model. Our experiments on real-life web search queries show that (i)
our intent extractor for list and superlative intent queries has significantly
higher precision and coverage compared with baseline approaches and (ii) our
table answer selector significantly outperforms the state-of-the-art baseline
approach. This technology has been used in production by Microsoft's Bing
search engine since 2016.
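The two-stage idea in the abstract can be illustrated with a toy sketch. The paper trains a neural sequence tagger for step (i) and a learned model over novel match features for step (ii); the rule-based stand-ins below (`extract_intent`, `structure_aware_score`, and the table fields they touch are all hypothetical names, not from the paper) only show how a structured intent, rather than raw query text, can be matched against table *structure*:

```python
# Toy sketch of the two-stage pipeline described in the abstract.
# The real system uses a deep sequence tagger and a trained ML selector;
# the rules and field names here are illustrative assumptions only.

SUPERLATIVES = {"largest", "smallest", "tallest", "oldest", "longest"}

def extract_intent(query: str) -> dict:
    """Stand-in for the sequence tagger: label each query token as a
    superlative modifier, the target entity type, or a filter condition."""
    tokens = query.lower().split()
    intent = {"superlative": None, "entity_type": None, "filter": None}
    rest = []
    for tok in tokens:
        if tok in SUPERLATIVES and intent["superlative"] is None:
            intent["superlative"] = tok
        else:
            rest.append(tok)
    if "in" in rest:  # crude split: "<entity type> in <filter>"
        cut = rest.index("in")
        intent["entity_type"] = " ".join(rest[:cut])
        intent["filter"] = " ".join(rest[cut + 1:])
    else:
        intent["entity_type"] = " ".join(rest)
    return intent

def structure_aware_score(intent: dict, table: dict) -> float:
    """Score a candidate table by matching intent slots against table
    structure (subject column header, caption, column types), instead of
    plain textual overlap between the whole query and the whole table."""
    score = 0.0
    if intent["entity_type"] and \
            intent["entity_type"].rstrip("s") in table["subject_column"].lower():
        score += 1.0   # entity type matches the subject column header
    if intent["filter"] and intent["filter"] in table["caption"].lower():
        score += 1.0   # filter condition matches the table context
    if intent["superlative"] and table["has_numeric_column"]:
        score += 0.5   # a numeric column can support a superlative
    return score

intent = extract_intent("largest city in california")
# {'superlative': 'largest', 'entity_type': 'city', 'filter': 'california'}
table = {"subject_column": "City",
         "caption": "Cities in California by population",
         "has_numeric_column": True}
print(structure_aware_score(intent, table))
```

A purely textual matcher would give similar scores to any table mentioning "city" and "california"; the structured version can require that the entity type sit in the subject column and that a numeric column exist to resolve the superlative, which is the precision gain the paper attributes to structure-aware matching.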
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- FeatAug: Automatic Feature Augmentation From One-to-Many Relationship Tables [4.058220332950672]
Feature augmentation from one-to-many relationship tables is a critical but challenging problem in ML model development.
We propose FEATAUG, a new feature augmentation framework that automatically extracts predicate-aware queries from one-to-many relationship tables.
Our experiments on four real-world datasets demonstrate that FeatAug extracts more effective features compared to Featuretools.
arXiv Detail & Related papers (2024-03-11T01:44:14Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- Improving Content Retrievability in Search with Controllable Query Generation [5.450798147045502]
Machine-learned search engines have a high retrievability bias, where the majority of the queries return the same entities.
We propose CtrlQGen, a method that generates queries for a chosen underlying intent (narrow or broad).
Our results on datasets from the domains of music, podcasts, and books reveal that we can significantly decrease the retrievability bias of a dense retrieval model.
arXiv Detail & Related papers (2023-03-21T07:46:57Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate their desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
- Deep Search Query Intent Understanding [17.79430887321982]
This paper aims to provide a comprehensive learning framework for modeling query intent under different stages of a search.
We focus on the design for 1) predicting users' intents as they type in queries on-the-fly in typeahead search using character-level models; and 2) accurate word-level intent prediction models for complete queries.
arXiv Detail & Related papers (2020-08-15T18:19:56Z)
- Efficient Neural Query Auto Completion [17.58784759652327]
Three major challenges are observed for a query auto completion system.
Traditional QAC systems rely on handcrafted features such as the query candidate frequency in search logs.
We propose an efficient neural QAC system with effective context modeling to overcome these challenges.
arXiv Detail & Related papers (2020-08-06T21:28:36Z)
- Open Domain Question Answering Using Web Tables [8.25461115955717]
We develop an open-domain QA approach using web tables that works for both factoid and non-factoid queries.
Our solution is used in production in a major commercial web search engine and serves direct answers for tens of millions of real user queries per month.
arXiv Detail & Related papers (2020-01-10T01:25:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.