Keyword-Attentive Deep Semantic Matching
- URL: http://arxiv.org/abs/2003.11516v1
- Date: Wed, 11 Mar 2020 10:18:32 GMT
- Title: Keyword-Attentive Deep Semantic Matching
- Authors: Changyu Miao, Zhen Cao and Yik-Cheung Tam
- Abstract summary: We propose a keyword-attentive approach to improve deep semantic matching.
We first leverage domain tags from a large corpus to generate a domain-enhanced keyword dictionary.
During model training, we propose a new negative sampling approach based on keyword coverage between the input pair.
- Score: 1.8416014644193064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Semantic Matching is a crucial component in various natural language
processing applications such as question answering (QA), where an input
query is compared to each candidate question in a QA corpus in terms of
relevance. Measuring similarities between a query-question pair in an open
domain scenario can be challenging due to diverse word tokens in the
query-question pair. We propose a keyword-attentive approach to improve deep
semantic matching. We first leverage domain tags from a large corpus to
generate a domain-enhanced keyword dictionary. Built upon BERT, we stack a
keyword-attentive transformer layer to highlight the importance of keywords in
the query-question pair. During model training, we propose a new negative
sampling approach based on keyword coverage between the input pair. We evaluate
our approach on a Chinese QA corpus using various metrics, including precision
of retrieval candidates and accuracy of semantic matching. Experiments show
that our approach outperforms existing strong baselines. Our approach is
general and can be applied to other text matching tasks with little adaptation.
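As a rough, framework-free illustration of the keyword-attentive idea, the sketch below adds a fixed additive bonus to attention logits at keyword positions. The `bias` constant and the function name are assumptions for illustration only; the paper instead stacks a keyword-attentive transformer layer on top of BERT rather than using a hand-set boost.

```python
import math

def keyword_biased_attention(scores, keyword_mask, bias=2.0):
    """Softmax over attention logits with an additive bonus at keyword positions.

    scores: raw attention logits for one query position (list of floats).
    keyword_mask: 1 where the token appears in the keyword dictionary, else 0.
    bias: hypothetical constant boost; the paper learns keyword importance
    inside a transformer layer rather than fixing it.
    """
    boosted = [s + bias * m for s, m in zip(scores, keyword_mask)]
    mx = max(boosted)                      # subtract max for numerical stability
    exps = [math.exp(b - mx) for b in boosted]
    z = sum(exps)
    return [e / z for e in exps]
```

With equal logits, a marked keyword position receives strictly more attention mass than the unmarked positions, which is the qualitative effect the keyword-attentive layer aims for.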
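The keyword-coverage negative sampling can likewise be sketched in a few lines. The coverage measure below (Jaccard overlap of dictionary keywords) and the helper names are hypothetical stand-ins; the paper defines its own coverage formulation over the domain-enhanced keyword dictionary.

```python
def keyword_coverage(query_tokens, question_tokens, keyword_dict):
    """Jaccard overlap of dictionary keywords between a query and a question.

    keyword_dict is a stand-in for the paper's domain-enhanced keyword
    dictionary, which is built from domain tags over a large corpus.
    """
    q_kw = {t for t in query_tokens if t in keyword_dict}
    c_kw = {t for t in question_tokens if t in keyword_dict}
    if not q_kw and not c_kw:
        return 0.0
    return len(q_kw & c_kw) / len(q_kw | c_kw)


def sample_hard_negatives(query_tokens, candidates, keyword_dict, k=5):
    """Pick the k non-matching questions with the highest keyword coverage.

    High-coverage negatives share many keywords with the query, making them
    harder and thus more informative during training.
    """
    scored = [(keyword_coverage(query_tokens, c, keyword_dict), c)
              for c in candidates]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]
```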
Related papers
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multi concepts for multilingual semantic matching to liberate the model from the reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Enhancing BERT-Based Visual Question Answering through Keyword-Driven Sentence Selection [8.586466827855016]
The Document-based Visual Question Answering competition addresses the automatic detection of parent-child relationships in documents.
This paper describes PoliTo's approach to this task; in particular, our best solution explores a text-only approach.
Thanks to the effectiveness of this approach, we achieve high performance compared to the baselines.
arXiv Detail & Related papers (2023-10-13T22:43:55Z)
- Open-vocabulary Keyword-spotting with Adaptive Instance Normalization [18.250276540068047]
We propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters.
We show significant improvements over recent keyword spotting and ASR baselines.
arXiv Detail & Related papers (2023-09-13T13:49:42Z)
- Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
- Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses, and system answers correspond to their execution results.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
- Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents [19.035917264711664]
We propose a training strategy for text semantic matching by disentangling keywords from intents.
Our approach can be easily combined with pre-trained language models (PLM) without influencing their inference efficiency.
arXiv Detail & Related papers (2022-03-06T07:48:24Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
- Quotient Space-Based Keyword Retrieval in Sponsored Search [7.639289301435027]
Synonymous keyword retrieval has become an important problem for sponsored search.
We propose a novel quotient space-based retrieval framework to address this problem.
This method has been successfully implemented in Baidu's online sponsored search system.
arXiv Detail & Related papers (2021-05-26T07:27:54Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and to make low-confidence predictions when context is insufficient.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.