TableQnA: Answering List Intent Queries With Web Tables
- URL: http://arxiv.org/abs/2001.04828v1
- Date: Fri, 10 Jan 2020 01:43:54 GMT
- Title: TableQnA: Answering List Intent Queries With Web Tables
- Authors: Kaushik Chakrabarti, Zhimin Chen, Siamak Shakeri, Guihong Cao, Surajit
Chaudhuri
- Abstract summary: We focus on answering two classes of queries with HTML tables: those seeking lists of entities and those seeking superlative entities.
Existing approaches train machine learning models to select the answer from the candidates.
We develop novel features to compute structure-aware match and train a machine learning model.
- Score: 12.941073798838167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The web contains a vast corpus of HTML tables. They can be used to provide
direct answers to many web queries. We focus on answering two classes of
queries with those tables: those seeking lists of entities (e.g., `cities in
california') and those seeking superlative entities (e.g., `largest city in
california'). The main challenge is to achieve high precision with significant
coverage. Existing approaches train machine learning models to select the
answer from the candidates; they rely on textual match features between the
query and the content of the table along with features capturing table
quality/importance. These features alone are inadequate for achieving the above
goals. Our main insight is that we can improve precision by (i) first
extracting intent (structured information) from the query for the above query
classes and (ii) then performing structure-aware matching (instead of just
textual matching) between the extracted intent and the candidates to select the
answer. We model (i) as a sequence tagging task. We leverage state-of-the-art
deep neural network models with word embeddings. The model requires large scale
training data which is expensive to obtain via manual labeling; we therefore
develop a novel method to automatically generate the training data. For (ii),
we develop novel features to compute structure-aware match and train a machine
learning model. Our experiments on real-life web search queries show that (i)
our intent extractor for list and superlative intent queries has significantly
higher precision and coverage compared with baseline approaches and (ii) our
table answer selector significantly outperforms the state-of-the-art baseline
approach. This technology has been used in production by Microsoft's Bing
search engine since 2016.
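The two-stage idea in the abstract can be illustrated with a toy sketch. The paper trains a neural sequence tagger for step (i) and a learned model over novel match features for step (ii); the rule-based stand-ins below (`extract_intent`, `structure_aware_score`, and the table fields they touch are all hypothetical names, not from the paper) only show how a structured intent, rather than raw query text, can be matched against table *structure*:

```python
# Toy sketch of the two-stage pipeline described in the abstract.
# The real system uses a deep sequence tagger and a trained ML selector;
# the rules and field names here are illustrative assumptions only.

SUPERLATIVES = {"largest", "smallest", "tallest", "oldest", "longest"}

def extract_intent(query: str) -> dict:
    """Stand-in for the sequence tagger: label each query token as a
    superlative modifier, the target entity type, or a filter condition."""
    tokens = query.lower().split()
    intent = {"superlative": None, "entity_type": None, "filter": None}
    rest = []
    for tok in tokens:
        if tok in SUPERLATIVES and intent["superlative"] is None:
            intent["superlative"] = tok
        else:
            rest.append(tok)
    if "in" in rest:  # crude split: "<entity type> in <filter>"
        cut = rest.index("in")
        intent["entity_type"] = " ".join(rest[:cut])
        intent["filter"] = " ".join(rest[cut + 1:])
    else:
        intent["entity_type"] = " ".join(rest)
    return intent

def structure_aware_score(intent: dict, table: dict) -> float:
    """Score a candidate table by matching intent slots against table
    structure (subject column header, caption, column types), instead of
    plain textual overlap between the whole query and the whole table."""
    score = 0.0
    if intent["entity_type"] and \
            intent["entity_type"].rstrip("s") in table["subject_column"].lower():
        score += 1.0   # entity type matches the subject column header
    if intent["filter"] and intent["filter"] in table["caption"].lower():
        score += 1.0   # filter condition matches the table context
    if intent["superlative"] and table["has_numeric_column"]:
        score += 0.5   # a numeric column can support a superlative
    return score

intent = extract_intent("largest city in california")
# {'superlative': 'largest', 'entity_type': 'city', 'filter': 'california'}
table = {"subject_column": "City",
         "caption": "Cities in California by population",
         "has_numeric_column": True}
print(structure_aware_score(intent, table))
```

A purely textual matcher would give similar scores to any table mentioning "city" and "california"; the structured version can require that the entity type sit in the subject column and that a numeric column exist to resolve the superlative, which is the precision gain the paper attributes to structure-aware matching.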
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- FeatAug: Automatic Feature Augmentation From One-to-Many Relationship Tables [4.058220332950672]
Feature augmentation from one-to-many relationship tables is a critical but challenging problem in ML model development.
We propose FEATAUG, a new feature augmentation framework that automatically extracts predicate-aware queries from one-to-many relationship tables.
Our experiments on four real-world datasets demonstrate that FeatAug extracts more effective features compared to Featuretools.
arXiv Detail & Related papers (2024-03-11T01:44:14Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- Improving Content Retrievability in Search with Controllable Query Generation [5.450798147045502]
Machine-learned search engines have a high retrievability bias, where the majority of the queries return the same entities.
We propose CtrlQGen, a method that generates queries for a chosen underlying intent (narrow or broad).
Our results on datasets from the domains of music, podcasts, and books reveal that we can significantly decrease the retrievability bias of a dense retrieval model.
arXiv Detail & Related papers (2023-03-21T07:46:57Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate their desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
- Deep Search Query Intent Understanding [17.79430887321982]
This paper aims to provide a comprehensive learning framework for modeling query intent under different stages of a search.
We focus on the design for 1) predicting users' intents as they type in queries on-the-fly in typeahead search using character-level models; and 2) accurate word-level intent prediction models for complete queries.
arXiv Detail & Related papers (2020-08-15T18:19:56Z)
- Efficient Neural Query Auto Completion [17.58784759652327]
Three major challenges are observed for a query auto completion system.
Traditional QAC systems rely on handcrafted features such as the query candidate frequency in search logs.
We propose an efficient neural QAC system with effective context modeling to overcome these challenges.
arXiv Detail & Related papers (2020-08-06T21:28:36Z)
- Open Domain Question Answering Using Web Tables [8.25461115955717]
We develop an open-domain QA approach using web tables that works for both factoid and non-factoid queries.
Our solution is used in production in a major commercial web search engine and serves direct answers for tens of millions of real user queries per month.
arXiv Detail & Related papers (2020-01-10T01:25:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.