A Proposal of Automatic Error Correction in Text
- URL: http://arxiv.org/abs/2112.01846v1
- Date: Fri, 24 Sep 2021 17:17:56 GMT
- Title: A Proposal of Automatic Error Correction in Text
- Authors: Wulfrano A. Luna-Ramírez and Carlos R. Jaimez-González
- Abstract summary: The paper presents an application for the automatic recognition and correction of orthographic errors in electronic texts.
The proposal is based on part-of-speech text categorization, word similarity, word dictionaries, statistical measures, morphological analysis, and an n-gram language model of Spanish.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The amount of information stored in electronic media grows daily. Much of it
is obtained by typing, such as the huge volume of information coming from Web 2.0 sites,
or by scanning and processing with Optical Character Recognition software, like the texts
of libraries and government offices. Both processes introduce errors into the texts, which
makes it difficult to use the data for purposes other than reading, i.e. processing those
texts with other applications such as e-learning, language learning, electronic tutorials,
data mining, information retrieval, and even more specialized systems such as typhlological
software (applications oriented to blind people), like automatic reading, where the text
should be as error-free as possible in order to ease the text-to-speech task. In this paper
we present an application for the automatic recognition and correction of orthographic
errors in electronic texts. This task is composed of three stages: a) error detection;
b) candidate correction generation; and c) correction, i.e. selection of the best candidate.
The proposal is based on part-of-speech text categorization, word similarity, word
dictionaries, statistical measures, morphological analysis, and an n-gram language model
of Spanish.
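The three stages named in the abstract can be illustrated with a minimal sketch. The toy
dictionary, corpus, bigram model, and edit-distance threshold below are stand-ins chosen for
illustration; they are not the paper's actual Spanish dictionaries, part-of-speech
categorization, or morphological analysis. The sketch only shows how detection, candidate
generation, and candidate selection can fit together.

```python
# Illustrative sketch of a three-stage spelling corrector:
# (a) error detection, (b) candidate generation, (c) candidate selection.
# The dictionary, corpus, and scoring are toy assumptions, not the paper's system.
from collections import Counter

DICTIONARY = {"el", "gato", "negro", "come", "pescado", "perro", "blanco"}
CORPUS = ["el gato negro come pescado", "el perro blanco come pescado"]

# Toy bigram language model with add-one smoothing, built from the corpus.
bigrams, unigrams = Counter(), Counter()
for sentence in CORPUS:
    tokens = ["<s>"] + sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(DICTIONARY))

def edit_distance(a, b):
    # Standard Levenshtein distance with a single rolling row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def correct(sentence):
    out, prev = [], "<s>"
    for word in sentence.split():
        if word not in DICTIONARY:                   # (a) error detection
            candidates = [w for w in DICTIONARY      # (b) candidate generation
                          if edit_distance(word, w) <= 2]
            if candidates:                           # (c) candidate selection
                word = max(candidates, key=lambda w: bigram_prob(prev, w))
        out.append(word)
        prev = word
    return " ".join(out)

print(correct("el gato negor come pescdo"))  # -> "el gato negro come pescado"
```

The full system described in the paper would additionally use part-of-speech and
morphological information when ranking candidates; this sketch ranks them with the toy
bigram model alone.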
Related papers
- EMTeC: A Corpus of Eye Movements on Machine-Generated Texts [2.17025619726098]
The Eye Movements on Machine-Generated Texts Corpus (EMTeC) is a naturalistic eye-movements-while-reading corpus of 107 native English speakers reading machine-generated texts.
EMTeC provides the eye movement data at all stages of pre-processing, i.e., the raw coordinate data sampled at 2000 Hz, the fixation sequences, and the reading measures.
arXiv Detail & Related papers (2024-08-08T08:00:45Z) - LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection [87.43727192273772]
It is often hard to tell whether a piece of text was human-written or machine-generated.
We present LLM-DetectAIve, designed for fine-grained detection.
It supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished.
arXiv Detail & Related papers (2024-08-08T07:43:17Z) - Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT [0.0]
This project aims to analyze the different kinds of errors that occur in text documents.
The work employs two advanced deep neural network-based language models, BART and MarianMT, to rectify the anomalies present in the text.
arXiv Detail & Related papers (2024-03-25T11:45:21Z) - Neural Automated Writing Evaluation with Corrective Feedback [4.0230668961961085]
We propose an integrated system for automated writing evaluation with corrective feedback.
This system enables language learners to simulate essay writing tests.
It would also alleviate the burden of manually correcting innumerable essays.
arXiv Detail & Related papers (2024-02-27T15:42:33Z) - A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages [39.75847219395984]
We present a methodology for generative spelling correction (SC), tested on English and Russian.
We study how natural spelling errors can be emulated in correct sentences to effectively enrich the pre-training procedure of generative models.
As a practical outcome of our work, we introduce SAGE (Spell checking via Augmentation and Generative distribution Emulation).
arXiv Detail & Related papers (2023-08-18T10:07:28Z) - Learning a Grammar Inducer from Massive Uncurated Instructional Videos [118.7279072358029]
Video-aided grammar induction aims to leverage video information for finding more accurate syntactic grammars for accompanying text.
We build a new model that can better learn video-span correlation without manually designed features.
Our model yields higher F1 scores than the previous state-of-the-art systems trained on in-domain data.
arXiv Detail & Related papers (2022-10-22T00:22:55Z) - Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work presents, to our knowledge, the first systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning approaches.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based models on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z) - Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z) - Scarecrow: A Framework for Scrutinizing Machine Text [69.26985439191151]
We introduce a new structured, crowdsourced error annotation schema called Scarecrow.
Scarecrow collects 13k annotations of 1.3k human- and machine-generated paragraphs of English-language news text.
These findings demonstrate the value of Scarecrow annotations in the assessment of current and future text generation systems.
arXiv Detail & Related papers (2021-07-02T22:37:03Z) - Misspelling Correction with Pre-trained Contextual Language Model [0.0]
We present two experiments, based on BERT and the edit distance algorithm, for ranking and selecting candidate corrections.
The results of our experiments demonstrated that, when combined properly, the contextual word embeddings of BERT and edit distance can effectively correct spelling errors (see the illustrative sketch at the end of this list).
arXiv Detail & Related papers (2021-01-08T20:11:01Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
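As referenced in the Misspelling Correction entry above, the idea of ranking candidates with a
contextual language model plus string similarity can be sketched as follows. The model name,
the top_k value, and the scoring rule are assumptions made for illustration, not the
configuration reported in that paper.

```python
# Hedged sketch: mask the suspect word, let a BERT fill-mask pipeline propose
# contextual candidates, then re-rank them by similarity to the original spelling.
# Requires: pip install transformers torch
import difflib
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def correct_word(sentence, misspelled):
    # Replace the first occurrence of the suspect word with the mask token.
    masked = sentence.replace(misspelled, fill_mask.tokenizer.mask_token, 1)
    candidates = fill_mask(masked, top_k=50)

    def score(c):
        # Combine the model's contextual probability with string similarity (0..1).
        similarity = difflib.SequenceMatcher(None, misspelled,
                                             c["token_str"]).ratio()
        return c["score"] * similarity

    return max(candidates, key=score)["token_str"]

print(correct_word("She went to the librari to borrow a book.", "librari"))
# Expected (not guaranteed): "library"
```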