Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT
- URL: http://arxiv.org/abs/2403.16655v1
- Date: Mon, 25 Mar 2024 11:45:21 GMT
- Title: Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT
- Authors: Rohit Raju, Peeta Basa Pati, SA Gandheesh, Gayatri Sanjana Sannala, Suriya KS
- Abstract summary: This project analyzes the different kinds of errors that occur in text documents.
The work employs two advanced deep neural network-based language models, BART and MarianMT, to rectify the anomalies present in the text.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text remains a relevant form of representing information. Text documents are created either natively on digital platforms or through the conversion of other media, such as images and speech. While digitally native text is invariably obtained through physical or virtual keyboards, technologies such as OCR and speech recognition are used to transform images and speech signals into text content. All of these text-generation mechanisms also introduce errors into the captured text. This project analyzes the different kinds of errors that occur in text documents. The work employs two advanced deep neural network-based language models, BART and MarianMT, to rectify the anomalies present in the text. Transfer learning of these models on available datasets is performed to fine-tune their capacity for error correction. A comparative study is conducted to investigate the effectiveness of these models in handling each of the defined error categories. It is observed that while both models reduce the number of erroneous sentences by more than 20%, BART handles spelling errors far better (24.6%) than grammatical errors (8.8%).
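The paper describes fine-tuning BART via transfer learning but the listing includes no training code. Below is a minimal sketch of how such sequence-to-sequence fine-tuning might look with the Hugging Face transformers library; the dataset file, the "noisy"/"clean" column names, and the hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch: fine-tuning BART for text error correction as a
# seq2seq task (erroneous sentence -> corrected sentence).
# Dataset fields, paths, and hyperparameters are illustrative assumptions,
# not the authors' actual setup.
from transformers import (BartForConditionalGeneration, BartTokenizerFast,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)
from datasets import load_dataset

model_name = "facebook/bart-base"
tokenizer = BartTokenizerFast.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Assumed CSV with "noisy" (erroneous) and "clean" (corrected) columns.
data = load_dataset("csv", data_files={"train": "train.csv"})

def preprocess(batch):
    inputs = tokenizer(batch["noisy"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["clean"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = data.map(preprocess, batched=True, remove_columns=["noisy", "clean"])

args = Seq2SeqTrainingArguments(
    output_dir="bart-error-correction",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=3e-5,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The same recipe would apply to MarianMT by swapping in a Marian checkpoint and tokenizer; the paper's comparative study then reduces to evaluating both fine-tuned models on held-out sentences per error category.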
Related papers
- Full-text Error Correction for Chinese Speech Recognition with Large Language Model [11.287933170894311]
Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR).
This paper investigates the effectiveness of LLMs for error correction in full-text generated by ASR systems from longer speech recordings.
arXiv Detail & Related papers (2024-09-12T06:50:45Z)
- A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance [1.7000578646860536]
Spelling mistakes, among the most prevalent writing errors, are frequently encountered due to various factors.
This research aims to identify and rectify diverse spelling errors in text using neural networks.
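As a rough sketch of the general BERT-plus-Levenshtein idea this entry names: mask the suspected typo, let a fill-mask model propose candidates, and re-rank them by edit distance to the original word. The model choice, top-k, and replacement logic are assumptions, not the paper's exact pipeline.

```python
# Sketch: correct a suspected misspelling by masking it, letting BERT
# propose candidates, and re-ranking them by Levenshtein distance.
# Model name and top_k are assumptions, not the paper's exact pipeline.
from transformers import pipeline
import Levenshtein  # pip install python-Levenshtein

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def correct_word(sentence: str, misspelled: str, top_k: int = 20) -> str:
    masked = sentence.replace(misspelled, fill_mask.tokenizer.mask_token, 1)
    candidates = fill_mask(masked, top_k=top_k)
    # Prefer the contextually plausible candidate closest to the typo.
    best = min(candidates,
               key=lambda c: Levenshtein.distance(c["token_str"], misspelled))
    return sentence.replace(misspelled, best["token_str"], 1)

print(correct_word("I recieved your letter yesterday.", "recieved"))
```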
arXiv Detail & Related papers (2024-07-24T16:07:11Z)
- Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing text in a source language into an image containing the translation in a target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of the parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z)
- Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning [90.13978453378768]
We introduce a comprehensive typology of factual errors in generated chart captions.
A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning models.
Our analysis reveals that even state-of-the-art models, including GPT-4V, frequently produce captions laced with factual inaccuracies.
arXiv Detail & Related papers (2023-12-15T19:16:21Z)
- A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages [39.75847219395984]
We present a methodology for generative spelling correction (SC), tested on English and Russian.
We study the ways those errors can be emulated in correct sentences to effectively enrich generative models' pre-training procedure.
As a practical outcome of our work, we introduce SAGE (Spell checking via Augmentation and Generative distribution Emulation).
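A minimal sketch of the kind of natural-error emulation this entry describes: inject keyboard-style typos into clean sentences to build synthetic (noisy, clean) pairs for pre-training a generative corrector. The error types, rates, and neighbor map are illustrative assumptions, not SAGE's actual procedure.

```python
# Sketch: emulating natural spelling errors in clean text to build
# synthetic (noisy, clean) training pairs for a generative corrector.
# Error types, rates, and the neighbor map are illustrative assumptions.
import random

KEY_NEIGHBORS = {"a": "qws", "e": "wrd", "i": "uok", "o": "ipl", "s": "adw"}

def corrupt(word: str, p: float = 0.3) -> str:
    if len(word) < 3 or random.random() > p:
        return word
    i = random.randrange(len(word))
    op = random.choice(["swap", "drop", "neighbor"])
    if op == "swap" and i < len(word) - 1:   # transpose adjacent letters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":                          # delete a letter
        return word[:i] + word[i + 1:]
    ch = word[i].lower()                      # keyboard-adjacent typo
    if ch in KEY_NEIGHBORS:
        return word[:i] + random.choice(KEY_NEIGHBORS[ch]) + word[i + 1:]
    return word

clean = "the quick brown fox jumps over the lazy dog"
noisy = " ".join(corrupt(w) for w in clean.split())
print(noisy, "->", clean)  # one synthetic training pair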
arXiv Detail & Related papers (2023-08-18T10:07:28Z)
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by an average of 5.3% across 11 benchmarks, with similar model size.
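For the discriminative (contrastive) branch mentioned above, a common objective is InfoNCE over two augmented views of the same image. The sketch below shows a simplified one-directional version; the shapes and temperature are illustrative assumptions, not the paper's exact loss.

```python
# Sketch: a simplified InfoNCE-style contrastive loss, as used in
# self-supervised representation learning. Shapes and temperature are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau               # pairwise cosine similarities
    targets = torch.arange(z1.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```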
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work proposes the first, to our knowledge, systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning approaches.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves the F-score by +2.5% and +4.8% when transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- A Proposal of Automatic Error Correction in Text [0.0]
An application of automatic recognition and correction of orthographic errors in electronic texts is shown.
The proposal is based on part-of-speech text categorization, word similarity, word dictionaries, statistical measures, morphological analysis, and an n-gram-based language model of Spanish.
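A minimal sketch of the dictionary-plus-n-gram idea this entry describes: score correction candidates by combining edit distance to the typo with a smoothed bigram language-model probability. The toy Spanish corpus, weighting, and smoothing are assumptions, not the paper's method.

```python
# Sketch: rank correction candidates by edit distance combined with a
# smoothed bigram LM score. Corpus, weighting, and smoothing are
# illustrative assumptions, not the paper's actual method.
from collections import Counter

corpus = "el gato come pescado el perro come carne".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev: str, word: str) -> float:
    # Add-one smoothing over the toy vocabulary.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))

def edit_distance(a: str, b: str) -> int:
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def best_correction(prev_word: str, typo: str, vocab) -> str:
    # Lower edit distance and higher bigram probability are both better.
    return min(vocab, key=lambda w: edit_distance(typo, w)
               - bigram_prob(prev_word, w))

print(best_correction("gato", "come?", unigrams))  # -> "come"
```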
arXiv Detail & Related papers (2021-09-24T17:17:56Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
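A minimal sketch of simulating one common grammatical error type (subject-verb agreement) on clean text, in the spirit of the error-simulation setup above; the toggle rule and hand-listed verbs are illustrative assumptions (a real pipeline would use a POS tagger and error statistics collected from learners).

```python
# Sketch: simulating subject-verb agreement errors on clean text.
# The toggle rule and hand-listed verbs are illustrative assumptions;
# a real pipeline would use a POS tagger and learner-error statistics.
def perturb_agreement(tokens: list[str], verbs: set[str]) -> list[str]:
    out = []
    for tok in tokens:
        if tok in verbs:
            # Toy rule: toggle 3rd-person-singular -s on present-tense verbs.
            out.append(tok[:-1] if tok.endswith("s") else tok + "s")
        else:
            out.append(tok)
    return out

sentence = "she walks to work and buys coffee".split()
known_verbs = {"walks", "buys"}  # would come from a POS tagger in practice
print(" ".join(perturb_agreement(sentence, known_verbs)))
# -> "she walk to work and buy coffee"
```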
arXiv Detail & Related papers (2020-05-12T11:01:44Z)