Related papers: Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts

URL: http://arxiv.org/abs/2410.05881v1
Date: Tue, 8 Oct 2024 10:21:22 GMT
Title: Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts
Authors: Félix do Carmo, Diptesh Kanojia
Abstract summary: The tutorial describes the concept of edit distances applied to research and commercial contexts. We use Translation Edit Rate (TER), Levenshtein, Damerau-Levenshtein, Longest Common Subsequence and $n$-gram distances to demonstrate the frailty of statistical metrics when comparing text sequences.
Score: 7.629053304626553
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The tutorial describes the concept of edit distances applied to research and commercial contexts. We use Translation Edit Rate (TER), Levenshtein, Damerau-Levenshtein, Longest Common Subsequence and $n$-gram distances to demonstrate the frailty of statistical metrics when comparing text sequences. Our discussion disassembles them into their essential components. We discuss the centrality of four editing actions: insert, delete, replace and move words, and show their implementations in openly available packages and toolkits. The application of edit distances in downstream tasks often assumes that these accurately represent work done by post-editors and real errors that need to be corrected in MT output. We discuss how imperfectly edit distances capture the details of this error correction work, and the implications of these uses of edit distances for researchers and for commercial applications. In terms of commercial applications, we discuss their integration in computer-assisted translation tools and how the perception of the connection between edit distances and post-editor effort affects the definition of translator rates.
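
The role of the four editing actions can be illustrated with a minimal sketch. It is not taken from the tutorial: the function name `levenshtein` and the example sentences are assumptions chosen for illustration. It computes a word-level Levenshtein distance, which only knows insert, delete and replace, and shows that moving a single word is charged as two edits. A metric with an explicit move (shift) action, such as TER, could count the same change as one edit.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and replacements
    needed to turn the sequence `hyp` into the sequence `ref`."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # r has no counterpart in hyp
                            curr[j - 1] + 1,      # h has no counterpart in ref
                            prev[j - 1] + cost))  # match or replace
        prev = curr
    return prev[-1]


# Hypothetical example: the only difference is the position of "today".
ref = "the cat sat on the mat today".split()
hyp = "today the cat sat on the mat".split()

edits = levenshtein(ref, hyp)
print(edits)             # 2: one deletion plus one insertion
print(edits / len(ref))  # edit rate normalised by reference length
```

Because the function works on any sequence, passing strings instead of word lists gives the character-level distance.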

Related papers

Concept Lancet: Image Editing with Compositional Representation Transplant [58.9421919837084]
Concept Lancet is a zero-shot plug-and-play framework for principled representation manipulation in image editing. We decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts. We perform a customized concept transplant process to impose the corresponding editing direction.
arXiv Detail & Related papers (2025-04-03T17:59:58Z)
Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications [9.795246551841586]
Large Language Models (LLMs) have transformed natural language processing, yet they still struggle with direct text editing tasks. In this work, we introduce a dual approach to enhance LLM editing performance. First, we present InstrEditBench, a high-quality benchmark dataset comprising over 20,000 structured editing tasks. Second, we propose FineEdit, a specialized model trained on this curated benchmark.
arXiv Detail & Related papers (2025-02-19T01:41:44Z)
Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance [2.1792283995628465]
Existing edit distance metrics, such as Levenshtein, BLEU, ROUGE, and TER, often fail to accurately measure the effort required for post-editing. We introduce a novel compression-based edit distance metric grounded in the Lempel-Ziv-77 algorithm.
arXiv Detail & Related papers (2024-12-23T06:29:25Z)
EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction [21.869368698234247]
This paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. We propose EXCGEC, a tailored benchmark for Chinese EXGEC consisting of 8,216 explanation-augmented samples.
arXiv Detail & Related papers (2024-07-01T03:06:41Z)
On the Robustness of Editing Large Language Models [57.477943944826904]
Large language models (LLMs) have played a pivotal role in building communicative AI, yet they encounter the challenge of efficient updates. This work seeks to understand the strengths and limitations of editing methods, facilitating practical applications of communicative AI.
arXiv Detail & Related papers (2024-02-08T17:06:45Z)
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs. This dataset aims to discover whether metrics can identify 68 translation accuracy errors. We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z)
DUnE: Dataset for Unified Editing [3.7346004746366384]
We introduce DUnE, an editing benchmark where edits are natural language sentences. We show that retrieval-augmented language modeling can outperform specialized editing techniques.
arXiv Detail & Related papers (2023-11-27T18:56:14Z)
Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks. We show that they have yet to attain state-of-the-art performance in Neural Machine Translation. We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z)
Beyond the Chat: Executable and Verifiable Text-Editing with LLMs [87.84199761550634]
Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing. We present InkSync, an editing interface that suggests executable edits directly within the document being edited.
arXiv Detail & Related papers (2023-09-27T00:56:17Z)
Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks. We apply instruction tuning for Large Language Models on the supervision data of edit spans. Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
Towards Counterfactual Image Manipulation via CLIP [106.94502632502194]
Existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images. We investigate this problem in a text-driven manner with Contrastive-Language-Image-Pretraining (CLIP). We design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives.
arXiv Detail & Related papers (2022-07-06T17:02:25Z)
Recurrent Inference in Text Editing [6.4689151804633775]
We propose a new inference method, Recurrence, that iteratively performs editing actions, significantly narrowing the problem space. In each iteration, Recurrence encodes the partially edited text, decodes the latent representation, generates a short, fixed-length action, and applies the action to complete a single edit. For a comprehensive comparison, we introduce three types of text editing tasks: Arithmetic Operators Restoration (AOR), Arithmetic Equation Simplification (AES), and Arithmetic Equation Correction (AEC).
arXiv Detail & Related papers (2020-09-26T17:06:29Z)
Towards Minimal Supervision BERT-based Grammar Error Correction [81.90356787324481]
We try to incorporate contextual information from a pre-trained language model to leverage annotation and benefit multilingual scenarios. Results show strong potential of Bidirectional Encoder Representations from Transformers (BERT) in the grammatical error correction task.
arXiv Detail & Related papers (2020-01-10T15:45:59Z)
