Structured abbreviation expansion in context
- URL: http://arxiv.org/abs/2110.01140v1
- Date: Mon, 4 Oct 2021 01:22:43 GMT
- Title: Structured abbreviation expansion in context
- Authors: Kyle Gorman, Christo Kirov, Brian Roark, and Richard Sproat
- Abstract summary: We consider the task of reversing ad hoc abbreviations in context to recover normalized, expanded versions of abbreviated messages.
The problem is related to, but distinct from, spelling correction, in that ad hoc abbreviations are intentional and may involve substantial differences from the original words.
- Score: 12.000998471674649
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ad hoc abbreviations are commonly found in informal communication channels
that favor shorter messages. We consider the task of reversing these
abbreviations in context to recover normalized, expanded versions of
abbreviated messages. The problem is related to, but distinct from, spelling
correction, in that ad hoc abbreviations are intentional and may involve
substantial differences from the original words. Ad hoc abbreviations are
productively generated on-the-fly, so they cannot be resolved solely by
dictionary lookup. We generate a large, open-source data set of ad hoc
abbreviations. This data is used to study abbreviation strategies and to
develop two strong baselines for abbreviation expansion.
Related papers
- Automated Extraction of Acronym-Expansion Pairs from Scientific Papers [0.0]
This project addresses challenges posed by the widespread use of abbreviations and acronyms in digital texts.
We propose a novel method that combines document preprocessing, regular expressions, and a large language model to identify abbreviations and map them to their corresponding expansions.
arXiv Detail & Related papers (2024-12-02T04:05:49Z)
- Evaluating and Improving ChatGPT-Based Expansion of Abbreviations [6.900119856872516]
We present the first empirical study on large language model (LLM)-based abbreviation expansion.
Our evaluation results suggest that ChatGPT is substantially less accurate than the state-of-the-art approach.
In response to the first cause, we investigated the effect of various contexts and found surrounding source code is the best selection.
arXiv Detail & Related papers (2024-10-31T12:20:24Z)
- CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models [59.8529196670565]
CRAT is a novel multi-agent translation framework that leverages RAG and causality-enhanced self-reflection to address translation challenges.
Our results show that CRAT significantly improves translation accuracy, particularly in handling context-sensitive terms and emerging vocabulary.
arXiv Detail & Related papers (2024-10-28T14:29:11Z)
- Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
arXiv Detail & Related papers (2022-12-19T11:26:23Z)
- Dealing with Abbreviations in the Slovenian Biographical Lexicon [2.0810096547938164]
Abbreviations present a significant challenge for NLP systems because they cause tokenization and out-of-vocabulary errors.
We propose a new method for addressing the problems caused by a high density of domain-specific abbreviations in a text.
arXiv Detail & Related papers (2022-11-04T13:09:02Z)
- Hierarchical Context Tagging for Utterance Rewriting [51.251400047377324]
Methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings.
We propose a hierarchical context tagger that mitigates this issue by predicting slotted rules.
Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by 2 BLEU points.
arXiv Detail & Related papers (2022-06-22T17:09:34Z)
- Atypical lexical abbreviations identification in Russian medical texts [0.0]
We propose an efficient ML-based algorithm that identifies abbreviations in Russian texts.
The method achieves a ROC AUC score of 0.926 and an F1 score of 0.706, which are confirmed as competitive.
arXiv Detail & Related papers (2022-06-04T13:16:08Z)
- Context-Aware Abbreviation Expansion Using Large Language Models [16.52516727224014]
We propose a paradigm in which phrases are abbreviated aggressively as primarily word-initial letters.
Our approach is to expand the abbreviations into full-phrase options by leveraging conversation context.
arXiv Detail & Related papers (2022-05-08T03:02:53Z)
- End-to-end contextual asr based on posterior distribution adaptation for hybrid ctc/attention system [61.148549738631814]
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
Although this simplifies the ASR system, it introduces a drawback for contextual ASR: the E2E model performs worse on utterances containing infrequent proper nouns.
We propose adding a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases.
arXiv Detail & Related papers (2022-02-18T03:26:02Z)
- Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction [61.4913233397155]
We show that BERT uses information about RC spans during agreement prediction using the linguistically correct strategy.
We also found that counterfactual representations generated for a specific RC subtype influenced the number prediction in sentences with other RC subtypes, suggesting that information about RC boundaries was encoded abstractly in BERT's representation.
arXiv Detail & Related papers (2021-05-14T17:11:55Z)
- What Does This Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation [74.42107665213909]
Acronyms are the short forms of phrases that facilitate conveying lengthy sentences in documents and serve as one of the mainstays of writing.
Due to their importance, identifying acronyms and corresponding phrases (AI) and finding the correct meaning of each acronym (i.e., acronym disambiguation (AD)) are crucial for text understanding.
Despite the recent progress on this task, there are some limitations in the existing datasets which hinder further improvement.
arXiv Detail & Related papers (2020-10-28T00:12:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.