Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity
- URL: http://arxiv.org/abs/2101.02351v1
- Date: Thu, 7 Jan 2021 03:27:32 GMT
- Authors: Ankush Chopra, Shruti Agrawal and Sohom Ghosh
- Abstract summary: We discuss a framework for calculating similarities between a given input query and a set of predefined questions to retrieve the question that best matches it.
We have used it for the financial domain, but the framework is general and can be used for domain-specific search engines in other domains as well.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Search is one of the most common platforms used to seek information. However,
users mostly get overloaded with results whenever they use such a platform to
resolve their queries. Nowadays, direct answers to queries are being provided
as a part of the search experience. The question-answer (QA) retrieval process
plays a significant role in enriching the search experience. Most off-the-shelf
Semantic Textual Similarity models work well for well-formed search queries,
but their performance degrades in domain-specific settings where incomplete or
grammatically ill-formed search queries are prevalent. In this paper, we
discuss a framework for calculating similarities between a given input query
and a set of predefined questions to retrieve the question that best matches
it. We have used it for the financial domain, but the framework is general and
can be used for domain-specific search engines in other domains as well. We use
a Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a
classifier which generates unnormalized and normalized similarity scores for a
given pair of questions. Moreover, for each
of these question pairs, we calculate three other similarity scores: cosine
similarity between their average word2vec embeddings [15], cosine similarity
between their sentence embeddings [7] generated using RoBERTa [17], and a
customized fuzzy-match score. Finally, we develop a metaclassifier using
Support Vector Machines [19] for combining these five scores to detect whether a
given pair of questions is similar. We benchmark our model's performance
against existing state-of-the-art (SOTA) models on the Quora Question Pairs
(QQP) dataset as well as a dataset specific to the financial domain.
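As a rough illustration of the pair-level scores mentioned in the abstract, the sketch below computes two of the five: the cosine similarity between averaged word embeddings and a fuzzy-match score. It is not the authors' implementation. The vectors in `TOY_VECTORS` are made-up stand-ins for trained word2vec embeddings, `difflib.SequenceMatcher` is an assumed substitute for the customized fuzzy-match score (which the abstract does not specify), and the unweighted mean in `best_match` is a placeholder for the SVM metaclassifier. The Siamese LSTM and RoBERTa scores are omitted.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "open": [0.9, 0.1, 0.0],
    "close": [-0.8, 0.2, 0.1],
    "account": [0.1, 0.9, 0.1],
    "savings": [0.0, 0.8, 0.3],
    "how": [0.2, 0.1, 0.1],
    "to": [0.1, 0.1, 0.1],
}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def average_embedding(text, vectors=TOY_VECTORS):
    """Mean vector of the in-vocabulary tokens; zero vector if none are known."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[tok] for tok in tokenize(text) if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuzzy_score(a, b):
    """Character-level similarity ratio in [0, 1] (assumed stand-in score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(query, question):
    """Two of the five scores: [embedding cosine, fuzzy match]."""
    embedding_cosine = cosine_similarity(
        average_embedding(query), average_embedding(question)
    )
    return [embedding_cosine, fuzzy_score(query, question)]


def best_match(query, questions):
    """Pick the question with the highest mean feature value.

    Placeholder combiner: the paper trains an SVM metaclassifier instead.
    """
    return max(questions, key=lambda q: sum(pair_features(query, q)) / 2)
```

In a fuller pipeline, the feature list returned by `pair_features` would be extended with the remaining three scores and passed to a trained classifier rather than averaged.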
Related papers
- Robust Knowledge Extraction from Large Language Models using Social Choice Theory [18.634845632109496]
Large language models (LLMs) can support a wide range of applications like conversational agents, creative writing or general query answering.
They are ill-suited for query answering in high-stakes domains like medicine because they are typically not robust.
We propose using ranking queries repeatedly and aggregating the queries using methods from social choice theory.
arXiv Detail & Related papers (2023-12-22T17:57:29Z)
- Semantic Equivalence of e-Commerce Queries [6.232692545488813]
This paper introduces a framework to recognize and leverage query equivalence to enhance searcher and business outcomes.
The proposed approach addresses three key problems: mapping queries to vector representations of search intent, identifying nearest neighbor queries expressing equivalent or similar intent, and optimizing for user or business objectives.
arXiv Detail & Related papers (2023-08-07T18:40:13Z)
- QUADRo: Dataset and Models for QUestion-Answer Database Retrieval [97.84448420852854]
Given a database (DB) of question/answer (q/a) pairs, it is possible to answer a target question by scanning the DB for similar questions.
We build a large-scale DB of 6.3M q/a pairs, using public questions, and design a new system based on neural IR and a q/a pair reranker.
We show that our DB-based approach is competitive with Web-based methods, i.e., a QA system built on top of the Bing search engine.
arXiv Detail & Related papers (2023-03-30T00:42:07Z)
- Automated Query Generation for Evidence Collection from Web Search Engines [2.642698101441705]
It is widely accepted that so-called facts can be checked by searching for information on the Internet.
This process requires a fact-checker to formulate a search query based on the fact and to present it to a search engine.
We ask the question as to whether it is possible to automate the first step, that of query generation.
arXiv Detail & Related papers (2023-03-15T14:32:00Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Session-Aware Query Auto-completion using Extreme Multi-label Ranking [61.753713147852125]
We take the novel approach of modeling session-aware query auto-completion as an eXtreme Multi-label Ranking (XMR) problem.
We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm.
Our approach meets the stringent latency requirements for auto-complete systems while leveraging session information in making suggestions.
arXiv Detail & Related papers (2020-12-09T17:56:22Z)
- Effective FAQ Retrieval and Question Matching With Unsupervised Knowledge Injection [10.82418428209551]
We propose a contextual language model for retrieving appropriate answers to frequently asked questions.
We also explore capitalizing on domain-specific, topically relevant relations between words in an unsupervised manner.
We evaluate variants of our approach on a publicly-available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task.
arXiv Detail & Related papers (2020-10-27T05:03:34Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Match$^2$: A Matching over Matching Model for Similar Question Identification [74.7142127303489]
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers.
Similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked.
It has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question or different questions sharing similar expressions.
Traditional methods typically take a one-side usage, which leverages the answer as some expanded representation of the
arXiv Detail & Related papers (2020-06-21T05:59:34Z)
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)