Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching
- URL: http://arxiv.org/abs/2101.06423v1
- Date: Sat, 16 Jan 2021 10:34:03 GMT
- Title: Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching
- Authors: Liang Pang, Yanyan Lan, Xueqi Cheng
- Abstract summary: We propose a novel hierarchical noise filtering model, namely Match-Ignition, to tackle the effectiveness and efficiency problems.
The basic idea is to plug the well-known PageRank algorithm into the Transformer to identify and filter both sentence- and word-level noisy information.
Noisy sentences are usually easy to detect because the sentence is the basic unit of a long-form text, so we directly use PageRank to filter such information.
- Score: 66.71886789848472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic text matching models have been widely used in community question
answering, information retrieval, and dialogue. However, these models cannot
well address the long-form text matching problem. That is because long-form
text typically contains a great deal of noise, and it is difficult for
existing semantic text matching models to capture the key matching signals
amid this noisy information. These models are also computationally expensive
because they use all textual data indiscriminately in the matching process.
To tackle both the effectiveness and the efficiency problems, we
propose a novel hierarchical noise filtering model in this paper, namely
Match-Ignition. The basic idea is to plug the well-known PageRank algorithm
into the Transformer to identify and filter both sentence- and word-level noisy
information in the matching process. Noisy sentences are usually easy to detect
because the sentence is the basic unit of a long-form text, so we directly use
PageRank to filter such information, based on a sentence similarity graph.
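A minimal sketch of this sentence-level filtering step, using a simple Jaccard word-overlap similarity as an illustrative stand-in for the paper's sentence similarity measure (the `pagerank` and `filter_sentences` helpers are hypothetical names, not the authors' code):

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over a weighted adjacency matrix,
    where adj[i, j] is the weight of the edge j -> i."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero on dangling nodes
    transition = adj / col_sums    # column-stochastic (up to dangling nodes)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * transition @ scores
    return scores

def filter_sentences(sentences, top_k):
    """Rank sentences by PageRank over a pairwise similarity graph
    and keep the top_k, preserving their original order."""
    word_sets = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and (word_sets[i] | word_sets[j]):
                sim[i, j] = (len(word_sets[i] & word_sets[j])
                             / len(word_sets[i] | word_sets[j]))
    scores = pagerank(sim)
    keep = sorted(np.argsort(scores)[::-1][:top_k])
    return [sentences[i] for i in keep]
```

An off-topic sentence that shares few words with the rest of the document is weakly connected in the graph, receives a low PageRank score, and is filtered out.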
Words, by contrast, rely on their contexts to express concrete meanings, so we
propose to jointly learn the filtering process and the matching process, to
reflect the contextual dependencies between words. Specifically, a word graph
is first built based on the attention scores in each self-attention block of
Transformer, and keywords are then selected by applying PageRank on this graph.
In this way, noisy words will be filtered out layer by layer in the matching
process. Experimental results show that Match-Ignition outperforms both
traditional text matching models for short text and recent long-form text
matching models. We also conduct a detailed analysis to show that Match-Ignition
can efficiently capture important sentences or words, which are helpful for
long-form text matching.
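The word-level step, in which each self-attention matrix is read as a word graph and PageRank selects the keywords, might be sketched as follows. This is an illustrative sketch assuming a row-stochastic attention matrix (as softmax attention is); `select_keywords` and `pagerank` are hypothetical helpers, not the authors' API:

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank; adj[i, j] weights the edge j -> i."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # guard against dangling nodes
    transition = adj / col_sums
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * transition @ scores
    return scores

def select_keywords(tokens, attention, top_k):
    """Keep the top_k tokens by PageRank over the attention graph.

    attention[q, k] is how much token q attends to token k, i.e. a
    'vote' cast by q for k; transposing gives the receiver-side
    adjacency the ranking needs. In Match-Ignition this filtering is
    applied in every self-attention block, so noisy words drop out
    layer by layer.
    """
    scores = pagerank(attention.T)
    keep = sorted(np.argsort(scores)[::-1][:top_k])  # preserve token order
    return [tokens[i] for i in keep]
```

A token that many other tokens attend to accumulates score and survives the filter; tokens that receive little attention are pruned before the next layer.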
Related papers
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
arXiv Detail & Related papers (2024-10-19T15:27:11Z)
- Description-Based Text Similarity [59.552704474862004]
We identify the need to search for texts based on abstract descriptions of their content.
We propose an alternative model that performs significantly better when used in standard nearest neighbor search.
arXiv Detail & Related papers (2023-05-21T17:14:31Z)
- Graph-based Semantical Extractive Text Analysis [0.0]
In this work, we improve the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text.
Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework.
arXiv Detail & Related papers (2022-12-19T18:30:26Z)
- Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents [19.035917264711664]
We propose a training strategy for text semantic matching by disentangling keywords from intents.
Our approach can be easily combined with pre-trained language models (PLM) without influencing their inference efficiency.
arXiv Detail & Related papers (2022-03-06T07:48:24Z)
- Unsupervised Matching of Data and Text [6.2520079463149205]
We introduce a framework that supports matching textual content and structured data in an unsupervised setting.
Our method builds a fine-grained graph over the content of the corpora and derives word embeddings to represent the objects to match in a low dimensional space.
Experiments on real use cases and public datasets show that our framework produces embeddings that outperform word embeddings and fine-tuned language models.
arXiv Detail & Related papers (2021-12-16T10:40:48Z)
- Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, the task is to decide whether any semantic discrepancies exist in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
- Sequential Sentence Matching Network for Multi-turn Response Selection in Retrieval-based Chatbots [45.920841134523286]
We propose a matching network, called sequential sentence matching network (S2M), to use the sentence-level semantic information to address the problem.
We find that by using sentence-level semantic information, the network successfully addresses the problem, yielding a significant improvement in matching and state-of-the-art performance.
arXiv Detail & Related papers (2020-05-16T09:47:19Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.