Research on Annotation Rules and Recognition Algorithm Based on Phrase
Window
- URL: http://arxiv.org/abs/2007.03140v1
- Date: Tue, 7 Jul 2020 00:19:47 GMT
- Title: Research on Annotation Rules and Recognition Algorithm Based on Phrase
Window
- Authors: Guang Liu, Gang Tu, Zheng Li, Yi-Jian Liu
- Abstract summary: We propose labeling rules based on phrase windows and design corresponding phrase recognition algorithms.
The labeling rule uses phrases as the minimum unit, divides sentences into 7 nestable phrase types, and marks the grammatical dependencies between phrases.
The corresponding algorithm, drawing on the idea of identifying the target area in the image field, can find the start and end positions of various phrases in the sentence.
- Score: 4.334276223622026
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: At present, most Natural Language Processing technology is based on the
results of Word Segmentation for Dependency Parsing, which mainly uses an
end-to-end method based on supervised learning. There are two main problems
with this method: first, the labeling rules are complex and the data is
difficult to label, so the workload is large; second, the algorithm
cannot recognize the multi-granularity and diversity of language components. To
solve these two problems, we propose labeling rules based on phrase
windows and design corresponding phrase recognition algorithms. The labeling
rule uses phrases as the minimum unit, divides sentences into 7 nestable
phrase types, and marks the grammatical dependencies between phrases.
The corresponding algorithm, drawing on the idea of identifying the target area
in the image field, can find the start and end positions of various phrases in
the sentence, and recognize nested phrases and grammatical dependencies
simultaneously. The experimental results show that the labeling
rule is convenient, easy to use, and unambiguous, and that the algorithm is
more grammatically multi-granular and diverse than the end-to-end algorithm.
Experiments on the CPWD dataset improve the accuracy of the end-to-end method
by about 1 point. The corresponding method was applied to the CCL2018
competition and won first place in the Chinese Metaphor Sentiment Analysis
Task.
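The abstract describes phrases as nestable windows with start and end positions plus dependencies between phrases. The following is a minimal sketch of that data structure only, not the authors' algorithm: the `PhraseWindow` class, the labels `CLAUSE`, `NP`, and `VP`, and the `head` field are hypothetical names chosen for illustration, since the abstract does not list the 7 phrase types or the annotation format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PhraseWindow:
    start: int                  # index of the first token (inclusive)
    end: int                    # index one past the last token (exclusive)
    label: str                  # hypothetical phrase-type tag, not from the paper
    head: Optional[int] = None  # list index of the window this one depends on


def crosses(a: PhraseWindow, b: PhraseWindow) -> bool:
    """True if the two windows partially overlap without one containing the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def is_nestable(windows: List[PhraseWindow]) -> bool:
    """A set of windows is nestable if no two windows cross."""
    return not any(
        crosses(a, b) for i, a in enumerate(windows) for b in windows[i + 1:]
    )


def nesting_depths(windows: List[PhraseWindow]) -> List[int]:
    """For each window, count the strictly larger windows that contain it."""
    depths = []
    for w in windows:
        depth = sum(
            1
            for o in windows
            if o.start <= w.start and w.end <= o.end
            and (o.end - o.start) > (w.end - w.start)
        )
        depths.append(depth)
    return depths


# Illustrative sentence and annotation (assumed, not taken from the CPWD dataset).
tokens = ["the", "old", "man", "read", "a", "book"]
windows = [
    PhraseWindow(0, 6, "CLAUSE"),      # whole sentence
    PhraseWindow(0, 3, "NP", head=2),  # "the old man", depends on the VP
    PhraseWindow(3, 6, "VP", head=0),  # "read a book", depends on the clause
    PhraseWindow(4, 6, "NP", head=2),  # "a book", nested inside the VP
]

print(is_nestable(windows))     # True
print(nesting_depths(windows))  # [0, 1, 1, 2]
```

Under these assumptions, a recognizer that outputs start and end positions per phrase only needs the crossing check to confirm that its predictions form a valid nested structure.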
Related papers
- RankCSE: Unsupervised Sentence Representations Learning via Learning to
Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
- Towards Unsupervised Recognition of Token-level Semantic Differences in
Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
- Multi-Instance Partial-Label Learning: Towards Exploiting Dual Inexact
Supervision [53.530957567507365]
In some real-world tasks, each training sample is associated with a candidate label set that contains one ground-truth label and some false positive labels.
In this paper, we formalize such problems as multi-instance partial-label learning (MIPL).
Existing multi-instance learning algorithms and partial-label learning algorithms are suboptimal for solving MIPL problems.
arXiv Detail & Related papers (2022-12-18T03:28:51Z)
- DEIM: An effective deep encoding and interaction model for sentence
matching [0.0]
We propose a sentence matching method based on deep encoding and interaction to extract deep semantic information.
In the encoder layer, we refer to the information of another sentence in the process of encoding a single sentence, and later use an algorithm to fuse the information.
In the interaction layer, we use a bidirectional attention mechanism and a self-attention mechanism to obtain deep semantic information.
arXiv Detail & Related papers (2022-03-20T07:59:42Z)
- Automatic Vocabulary and Graph Verification for Accurate Loop Closure
Detection [21.862978912891677]
Bag-of-Words (BoW) builds a visual vocabulary to associate features and then detect loops.
We propose a natural convergence criterion based on the comparison between the radii of nodes and the drifts of feature descriptors.
We present a novel topological graph verification method for validating candidate loops.
arXiv Detail & Related papers (2021-07-30T13:19:33Z)
- Extractive approach for text summarisation using graphs [0.0]
Our paper explores different graph-related algorithms that can be used in solving the text summarization problem using an extractive approach.
We consider two metrics: sentence overlap and edit distance for measuring sentence similarity.
arXiv Detail & Related papers (2021-06-21T10:03:34Z)
- Match-Ignition: Plugging PageRank into Transformer for Long-form Text
Matching [66.71886789848472]
We propose a novel hierarchical noise filtering model, namely Match-Ignition, to tackle the effectiveness and efficiency problem.
The basic idea is to plug the well-known PageRank algorithm into the Transformer, to identify and filter both sentence and word level noisy information.
Noisy sentences are usually easy to detect because the sentence is the basic unit of a long-form text, so we directly use PageRank to filter such information.
arXiv Detail & Related papers (2021-01-16T10:34:03Z)
- A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph [0.0]
This paper presents a knowledge-based word sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix multiplication (SCSMM).
The SCSMM algorithm combines semantic similarity, knowledge, and document context to respectively exploit the merits of local context.
The proposed algorithm outperformed all other algorithms when disambiguating nouns on the combined gold standard datasets.
arXiv Detail & Related papers (2021-01-08T06:47:32Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- Research on multi-dimensional end-to-end phrase recognition algorithm
based on background knowledge [4.020059842004492]
In experiments on the CPWD dataset, the new algorithm, by introducing background knowledge, improves the accuracy of the end-to-end method by more than one point.
The corresponding method was applied to the CCL 2018 competition and won first place in the task of Chinese humor type recognition.
arXiv Detail & Related papers (2020-07-08T02:30:00Z)
- TextScanner: Reading Characters in Order for Robust Scene Text
Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.