Method of the coherence evaluation of Ukrainian text
- URL: http://arxiv.org/abs/2011.00310v1
- Date: Sat, 31 Oct 2020 16:48:55 GMT
- Title: Method of the coherence evaluation of Ukrainian text
- Authors: S. D. Pogorilyy and A. A. Kramov
- Abstract summary: Methods for measuring text coherence in the Ukrainian language are analyzed.
Training and evaluation are performed on a corpus of Ukrainian texts.
Testing is carried out on two typical text coherence assessment tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the growing role of SEO technologies, automated analysis of
article quality has become necessary. Such an approach helps both to
return the most intelligible pages for a user's query and to raise web
sites' positions toward the top of the query results. Automated coherence
assessment is one part of the complex analysis of a text. In this article, the
main methods for measuring text coherence in the Ukrainian language are analyzed.
The expediency of using the semantic similarity graph method in comparison with
other methods is explained. An improvement of that method is suggested:
pre-training the neural network that produces vector representations of sentences.
The original method and its modifications are examined experimentally.
Training and evaluation are performed on a corpus of Ukrainian texts
previously retrieved from the abstracts and full texts of Ukrainian
scientific articles. Testing is carried out on two typical text coherence
assessment tasks: the document discrimination task and the insertion task.
Based on the analysis, the most effective combination of the method's
modification and its parameter for measuring text coherence is determined.
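The abstract names a semantic similarity graph method and two evaluation tasks but gives no implementation details, so the following is only a minimal sketch under stated assumptions. Sentences are graph vertices, each sentence is linked to the one preceding it, edge weights are cosine similarities, and coherence is the mean edge weight. The bag-of-words `embed` function is a toy stand-in for the pre-trained neural sentence encoder, and every function name here is hypothetical rather than taken from the paper.

```python
import math
import random
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coherence(sentences):
    """Mean similarity between each sentence and the sentence before it."""
    if len(sentences) < 2:
        return 0.0
    vectors = [embed(s) for s in sentences]
    edges = [cosine(vectors[i - 1], vectors[i]) for i in range(1, len(vectors))]
    return sum(edges) / len(edges)


def discrimination_accuracy(sentences, n_permutations=20, seed=0):
    """Document discrimination: share of shuffled copies scoring below the original."""
    rng = random.Random(seed)
    original = coherence(sentences)
    wins = 0
    trials = 0
    for _ in range(n_permutations):
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        if shuffled == sentences:
            continue  # an identical ordering is not a valid comparison
        trials += 1
        wins += coherence(shuffled) < original
    return wins / trials if trials else 0.0


def insertion_position(sentences, index):
    """Insertion task: remove one sentence, return the slot that maximizes coherence."""
    removed = sentences[index]
    rest = sentences[:index] + sentences[index + 1:]
    scores = [
        coherence(rest[:k] + [removed] + rest[k:]) for k in range(len(rest) + 1)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With the toy document `["a b", "b c", "c d", "d e"]`, `insertion_position(doc, 1)` returns 1, the original slot of the removed sentence. A reversed document gets the same score as the original under this adjacent-sentence variant, which is one reason a real evaluation would need a stronger encoder and graph construction than this sketch.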
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-06-28T15:46:00Z)
- An Adversarial Multi-Task Learning Method for Chinese Text Correction with Semantic Detection [0.0]
An adversarial multi-task learning method is proposed to enhance the modeling and detection of character polysemy in Chinese sentence context.
A Monte Carlo tree search strategy and a policy network are introduced to perform efficient Chinese text correction with semantic detection.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-03-01T12:39:46Z)
- Uzbek text summarization based on TF-IDF [0.0]
This article presents an experiment on a summarization task for the Uzbek language.
The methodology relies on text abstracting with the TF-IDF algorithm.
We summarize the given text by applying the n-gram method to important parts of the whole text.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the true value of the hypothesis follows the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2021-12-23T12:27:45Z)
- TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language [0.5801044612920816]
This study focuses on experimenting with some of the current approaches to Finnish, which is a morphologically rich language.
We propose a simple method, TFW2V, which shows high efficiency in handling both long text documents and limited amounts of data.
arXiv Detail & Related papers (2020-10-22T09:20:24Z)
- Method of noun phrase detection in Ukrainian texts [0.0]
Research on finding noun phrases in Ukrainian texts is still at an early stage.
A complex method of noun phrase detection in Ukrainian texts, utilizing Universal Dependencies means and a named-entity recognition model, has been suggested.
arXiv Detail & Related papers (2020-03-09T15:25:37Z)
- Matching Text with Deep Mutual Information Estimation [0.0]
We present a neural approach for general-purpose text matching with deep mutual information estimation incorporated.
Our approach, Text matching with Deep Info Max (TIM), is integrated with a procedure of unsupervised learning of representations.
We evaluate our text matching approach on several tasks including natural language inference, paraphrase identification, and answer selection.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.