Related papers: Research on Annotation Rules and Recognition Algorithm Based on Phrase Window

URL: http://arxiv.org/abs/2007.03140v1
Date: Tue, 7 Jul 2020 00:19:47 GMT
Title: Research on Annotation Rules and Recognition Algorithm Based on Phrase Window
Authors: Guang Liu, Gang Tu, Zheng Li, Yi-Jian Liu
Abstract summary: We propose labeling rules based on phrase windows and design corresponding phrase recognition algorithms. The labeling rule uses phrases as the minimum unit, divides sentences into 7 nestable phrase types, and marks the grammatical dependencies between phrases. The corresponding algorithm, drawing on the idea of identifying the target area in the image field, can find the start and end positions of various phrases in the sentence.
Score: 4.334276223622026
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: At present, most Natural Language Processing technology is based on the results of Word Segmentation for Dependency Parsing, which mainly uses an end-to-end method based on supervised learning. There are two main problems with this method: first, the labeling rules are complex and the data is difficult to label, so the workload is large; second, the algorithm cannot recognize the multi-granularity and diversity of language components. To solve these two problems, we propose labeling rules based on phrase windows and design corresponding phrase recognition algorithms. The labeling rule uses phrases as the minimum unit, divides sentences into 7 nestable phrase types, and marks the grammatical dependencies between phrases. The corresponding algorithm, drawing on the idea of identifying the target area in the image field, can find the start and end positions of various phrases in the sentence and realize the synchronous recognition of nested phrases and grammatical dependencies. The experimental results show that the labeling rule is convenient and easy to use, with no ambiguity; the algorithm is more grammatically multi-granular and diverse than the end-to-end algorithm. Experiments on the CPWD dataset improve the accuracy of the end-to-end method by about 1 point. The corresponding method was applied to the CCL2018 competition and won first place in the Chinese Metaphor Sentiment Analysis Task.
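
Illustrative sketch (not from the paper): the abstract describes annotations in which each phrase is marked by a start and end position, phrases may nest, and dependencies link phrases. The Python below shows one possible way to represent and sanity-check such an annotation. It is not the authors' recognition algorithm. The class name PhraseWindow, the helper functions, the TYPE_* labels, the head indices, and the English example sentence are all hypothetical; the abstract does not name the 7 phrase types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PhraseWindow:
    start: int                  # first token index, inclusive
    end: int                    # last token index, exclusive
    label: str                  # placeholder type name, not taken from the paper
    head: Optional[int] = None  # index of the window this one depends on


def crosses(a: PhraseWindow, b: PhraseWindow) -> bool:
    """True if the two windows overlap without one containing the other."""
    return (a.start < b.start < a.end < b.end) or (b.start < a.start < b.end < a.end)


def contains(outer: PhraseWindow, inner: PhraseWindow) -> bool:
    """True if inner lies inside outer and is not the same span."""
    return (outer.start <= inner.start and inner.end <= outer.end
            and (outer.start, outer.end) != (inner.start, inner.end))


def check_annotation(windows: List[PhraseWindow], n_tokens: int) -> List[str]:
    """Return a list of problems; an empty list means the annotation is well formed."""
    problems = []
    for i, w in enumerate(windows):
        if not (0 <= w.start < w.end <= n_tokens):
            problems.append(f"window {i} is out of range")
        if w.head is not None and not (0 <= w.head < len(windows) and w.head != i):
            problems.append(f"window {i} has an invalid head")
    # Nestable phrases may contain each other but must not partially overlap.
    for i, a in enumerate(windows):
        for j in range(i + 1, len(windows)):
            if crosses(a, windows[j]):
                problems.append(f"windows {i} and {j} cross")
    return problems


def nesting_depth(windows: List[PhraseWindow]) -> List[int]:
    """Depth of each window: the number of other windows that contain it."""
    return [sum(contains(o, w) for o in windows) for w in windows]


# Hypothetical annotation of a made-up sentence.
tokens = ["the", "old", "man", "sat", "on", "the", "bench"]
windows = [
    PhraseWindow(0, 7, "TYPE_A"),          # whole sentence
    PhraseWindow(0, 3, "TYPE_B", head=2),  # "the old man"
    PhraseWindow(3, 7, "TYPE_C", head=0),  # "sat on the bench"
    PhraseWindow(4, 7, "TYPE_D", head=2),  # "on the bench"
]

print(check_annotation(windows, len(tokens)))
print(nesting_depth(windows))
```

Storing spans as half-open intervals keeps the containment and crossing checks to simple integer comparisons, and the quadratic pairwise check is adequate for sentence-length inputs.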

Related papers

Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition [56.972851337263755]
We propose a method which allows corrections of substitution errors to improve the recognition accuracy of challenging words. We show that with this method we get a relative improvement in biased word error rate of up to 11%, while maintaining a competitive overall word error rate.
arXiv Detail & Related papers (2025-06-23T14:42:03Z)
RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning. It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework. An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task. We study three unsupervised approaches that rely on a masked language model. Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
Multi-Instance Partial-Label Learning: Towards Exploiting Dual Inexact Supervision [53.530957567507365]
In some real-world tasks, each training sample is associated with a candidate label set that contains one ground-truth label and some false positive labels. In this paper, we formalize such problems as multi-instance partial-label learning (MIPL). Existing multi-instance learning algorithms and partial-label learning algorithms are suboptimal for solving MIPL problems.
arXiv Detail & Related papers (2022-12-18T03:28:51Z)
DEIM: An effective deep encoding and interaction model for sentence matching [0.0]
We propose a sentence matching method based on deep encoding and interaction to extract deep semantic information. In the encoder layer, we refer to the information of another sentence in the process of encoding a single sentence, and later use an algorithm to fuse the information. In the interaction layer, we use a bidirectional attention mechanism and a self-attention mechanism to obtain deep semantic information.
arXiv Detail & Related papers (2022-03-20T07:59:42Z)
Automatic Vocabulary and Graph Verification for Accurate Loop Closure Detection [21.862978912891677]
Bag-of-Words (BoW) builds a visual vocabulary to associate features and then detect loops. We propose a natural convergence criterion based on the comparison between the radii of nodes and the drifts of feature descriptors. We present a novel topological graph verification method for validating candidate loops.
arXiv Detail & Related papers (2021-07-30T13:19:33Z)
Extractive approach for text summarisation using graphs [0.0]
Our paper explores different graph-related algorithms that can be used in solving the text summarization problem using an extractive approach. We consider two metrics: sentence overlap and edit distance for measuring sentence similarity.
arXiv Detail & Related papers (2021-06-21T10:03:34Z)
Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching [66.71886789848472]
We propose a novel hierarchical noise filtering model, namely Match-Ignition, to tackle the effectiveness and efficiency problem. The basic idea is to plug the well-known PageRank algorithm into the Transformer, to identify and filter both sentence and word level noisy information. Noisy sentences are usually easy to detect because the sentence is the basic unit of a long-form text, so we directly use PageRank to filter such information.
arXiv Detail & Related papers (2021-01-16T10:34:03Z)
A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph [0.0]
This paper presents a knowledge-based word sense disambiguation algorithm, namely Sequential Contextual Similarity Matrix multiplication (SCSMM). The SCSMM algorithm combines semantic similarity, knowledge, and document context to respectively exploit the merits of local context. The proposed algorithm outperformed all other algorithms when disambiguating nouns on the combined gold standard datasets.
arXiv Detail & Related papers (2021-01-08T06:47:32Z)
Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach. The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features. Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
Research on multi-dimensional end-to-end phrase recognition algorithm based on background knowledge [4.020059842004492]
In experiments on the CPWD dataset, the new algorithm, by introducing background knowledge, improves the accuracy of the end-to-end method by more than one point. The corresponding method was applied to the CCL 2018 competition and won first place in the task of Chinese humor type recognition.
arXiv Detail & Related papers (2020-07-08T02:30:00Z)
TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition. It generates pixel-wise, multi-channel segmentation maps for character class, position and order. It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.