Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts
- URL: http://arxiv.org/abs/2410.05881v1
- Date: Tue, 8 Oct 2024 10:21:22 GMT
- Title: Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts
- Authors: Félix do Carmo, Diptesh Kanojia
- Abstract summary: The tutorial describes the concept of edit distances applied to research and commercial contexts.
We use Translation Edit Rate (TER), Levenshtein, Damerau-Levenshtein, Longest Common Subsequence and $n$-gram distances to demonstrate the frailty of statistical metrics when comparing text sequences.
- Score: 7.629053304626553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The tutorial describes the concept of edit distances applied to research and commercial contexts. We use Translation Edit Rate (TER), Levenshtein, Damerau-Levenshtein, Longest Common Subsequence and $n$-gram distances to demonstrate the frailty of statistical metrics when comparing text sequences. Our discussion disassembles them into their essential components. We discuss the centrality of four editing actions: insert, delete, replace and move words, and show their implementations in openly available packages and toolkits. The application of edit distances in downstream tasks often assumes that these accurately represent work done by post-editors and real errors that need to be corrected in MT output. We discuss how imperfectly edit distances capture the details of this error correction work, and what these uses of edit distances imply for researchers and for commercial applications. In terms of commercial applications, we discuss their integration in computer-assisted translation tools and how the perception of the connection between edit distances and post-editor effort affects the definition of translator rates.
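The abstract's point about the four editing actions can be illustrated with a minimal sketch. It is not the authors' code or any toolkit's API: the function names `levenshtein` and `lcs_length` and the example sentences are assumptions made for illustration. Plain Levenshtein distance knows only insert, delete and replace, so a single moved word is charged as two edits, whereas a metric with a shift operation, such as TER, can count it as one.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and replacements turning a into b."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # replace x with y, or keep if equal
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical MT output and its post-edited version: one word was moved.
mt_output = "the cat sat on mat the".split()
post_edit = "the cat sat on the mat".split()

print(levenshtein(mt_output, post_edit))  # 2: the move is charged as two edits
print(lcs_length(mt_output, post_edit))   # 5: five of six words stay in order
print(levenshtein("kitten", "sitting"))   # 3: the same code at character level
```

Because both functions accept any sequence, the same code runs on word lists and on raw strings; the choice of unit (words or characters) is one of the components that changes the resulting score.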
Related papers
- Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications [9.795246551841586]
Large Language Models (LLMs) have transformed natural language processing, yet they still struggle with direct text editing tasks.
In this work, we introduce a dual approach to enhance LLM editing performance.
First, we present InstrEditBench, a high-quality benchmark dataset comprising over 20,000 structured editing tasks.
Second, we propose FineEdit, a specialized model trained on this curated benchmark.
arXiv Detail & Related papers (2025-02-19T01:41:44Z) - Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance [2.1792283995628465]
Existing edit distance metrics, such as Levenshtein, BLEU, ROUGE, and TER, often fail to accurately measure the effort required for post-editing.
We introduce a novel compression-based edit distance metric grounded in the Lempel-Ziv-77 algorithm.
arXiv Detail & Related papers (2024-12-23T06:29:25Z) - On the Robustness of Editing Large Language Models [57.477943944826904]
Large language models (LLMs) have played a pivotal role in building communicative AI, yet they encounter the challenge of efficient updates.
This work seeks to understand the strengths and limitations of editing methods, facilitating practical applications of communicative AI.
arXiv Detail & Related papers (2024-02-08T17:06:45Z) - Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z) - DUnE: Dataset for Unified Editing [3.7346004746366384]
We introduce DUnE, an editing benchmark where edits are natural language sentences.
We show that retrieval-augmented language modeling can outperform specialized editing techniques.
arXiv Detail & Related papers (2023-11-27T18:56:14Z) - Beyond the Chat: Executable and Verifiable Text-Editing with LLMs [87.84199761550634]
Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing.
We present InkSync, an editing interface that suggests executable edits directly within the document being edited.
arXiv Detail & Related papers (2023-09-27T00:56:17Z) - Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks.
We apply instruction tuning for Large Language Models on the supervision data of edit spans.
Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
arXiv Detail & Related papers (2023-05-19T17:51:05Z) - Towards Counterfactual Image Manipulation via CLIP [106.94502632502194]
Existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images.
We investigate this problem in a text-driven manner with Contrastive-Language-Image-Pretraining (CLIP).
We design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives.
arXiv Detail & Related papers (2022-07-06T17:02:25Z) - Recurrent Inference in Text Editing [6.4689151804633775]
We propose a new inference method, Recurrence, that iteratively performs editing actions, significantly narrowing the problem space.
In each iteration, Recurrence encodes the partially edited text, decodes the latent representation, generates a short, fixed-length action, and applies the action to complete a single edit.
For a comprehensive comparison, we introduce three types of text editing tasks: Arithmetic Operators Restoration (AOR), Arithmetic Equation Simplification (AES), and Arithmetic Equation Correction (AEC).
arXiv Detail & Related papers (2020-09-26T17:06:29Z) - Towards Minimal Supervision BERT-based Grammar Error Correction [81.90356787324481]
We try to incorporate contextual information from a pre-trained language model to leverage annotation and benefit multilingual scenarios.
Results show the strong potential of Bidirectional Encoder Representations from Transformers (BERT) in the grammatical error correction task.
arXiv Detail & Related papers (2020-01-10T15:45:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.