Hierarchical Context Tagging for Utterance Rewriting
- URL: http://arxiv.org/abs/2206.11218v1
- Date: Wed, 22 Jun 2022 17:09:34 GMT
- Title: Hierarchical Context Tagging for Utterance Rewriting
- Authors: Lisa Jin, Linfeng Song, Lifeng Jin, Dong Yu, Daniel Gildea
- Abstract summary: Methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings.
We propose a hierarchical context tagger (HCT) that mitigates the low coverage of such taggers by predicting slotted rules whose slots are later filled with context spans.
Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by about 2 BLEU points.
- Score: 51.251400047377324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Utterance rewriting aims to recover coreferences and omitted information from
the latest turn of a multi-turn dialogue. Recently, methods that tag rather
than linearly generate sequences have proven stronger in both in- and
out-of-domain rewriting settings. This is due to a tagger's smaller search
space as it can only copy tokens from the dialogue context. However, these
methods may suffer from low coverage when phrases that must be added to a
source utterance cannot be covered by a single context span. This can occur in
languages like English that introduce tokens such as prepositions into the
rewrite for grammaticality. We propose a hierarchical context tagger (HCT) that
mitigates this issue by predicting slotted rules (e.g., "besides_") whose slots
are later filled with context spans. HCT (i) tags the source string with
token-level edit actions and slotted rules and (ii) fills in the resulting rule
slots with spans from the dialogue context. This rule tagging allows HCT to add
out-of-context tokens and multiple spans at once; we further cluster the rules
to truncate the long tail of the rule distribution. Experiments on several
benchmarks show that HCT can outperform state-of-the-art rewriting systems by
~2 BLEU points.
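The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Tag` structure, the `rewrite` and `fill_rule` helpers, the choice to insert a rule before its source token, and writing the slot as a separate `_` token are all assumptions made for illustration. The tags are supplied by hand here, whereas HCT predicts them with a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SLOT = "_"  # assumed slot marker, loosely following the abstract's "besides_" example


@dataclass
class Tag:
    """Hypothetical per-token tag: an edit action plus an optional slotted rule."""
    keep: bool = True                 # token-level edit action (keep or delete)
    rule: Optional[str] = None        # slotted rule inserted before the token
    spans: List[Tuple[int, int]] = field(default_factory=list)  # [start, end) context spans


def fill_rule(rule: str, spans: List[Tuple[int, int]], context: List[str]) -> List[str]:
    """Step (ii): replace each slot in the rule with a span copied from the context."""
    pieces = rule.split()
    if pieces.count(SLOT) != len(spans):
        raise ValueError("number of slots and number of spans differ")
    remaining = iter(spans)
    filled: List[str] = []
    for piece in pieces:
        if piece == SLOT:
            start, end = next(remaining)
            filled.extend(context[start:end])
        else:
            filled.append(piece)      # out-of-context token contributed by the rule
    return filled


def rewrite(source: List[str], tags: List[Tag], context: List[str]) -> List[str]:
    """Step (i) output applied: walk the tagged source and build the rewritten utterance."""
    if len(source) != len(tags):
        raise ValueError("need exactly one tag per source token")
    out: List[str] = []
    for token, tag in zip(source, tags):
        if tag.rule is not None:
            out.extend(fill_rule(tag.rule, tag.spans, context))
        if tag.keep:
            out.append(token)
    return out


context = "do you like jazz ?".split()
source = "what else ?".split()
tags = [
    Tag(rule="besides _", spans=[(3, 4)]),  # adds "besides" plus the span "jazz"
    Tag(),                                  # plain keep
    Tag(rule="_", spans=[(0, 3)]),          # adds the span "do you like"
]
print(" ".join(rewrite(source, tags, context)))
# prints: besides jazz what else do you like ?
```

In this toy example the word "besides" never appears in the dialogue context, so a tagger restricted to copying a single context span could not produce the rewrite; the slotted rule supplies the extra token and the slot supplies the span.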
Related papers
- Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition.
We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR).
SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z)
- Contextualized Automatic Speech Recognition with Dynamic Vocabulary [41.892863381787684]
This paper proposes a dynamic vocabulary where bias tokens can be added during inference.
Each entry in a bias list is represented as a single token, unlike a sequence of existing subword tokens.
Experimental results demonstrate that the proposed method improves the bias phrase WER on English and Japanese datasets by 3.1 to 4.9 points.
arXiv Detail & Related papers (2024-05-22T05:03:39Z)
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multiple concepts for multilingual semantic matching, freeing the model from its reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
- Improving Cross-Lingual Transfer through Subtree-Aware Word Reordering [17.166996956587155]
One obstacle for effective cross-lingual transfer is variability in word-order patterns.
We present a new powerful reordering method, defined in terms of Universal Dependencies.
We show that our method consistently outperforms strong baselines over different language pairs and model architectures.
arXiv Detail & Related papers (2023-10-20T15:25:53Z)
- Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution [2.3429306644730854]
Manifold word-based stylistic markers have been successfully used in deep learning methods to deal with the intrinsic problem of authorship attribution.
The proposed method was experimentally evaluated against numerous state-of-the-art methods across the public corpora CCAT50, IMDb62, Blog50, and Twitter50.
arXiv Detail & Related papers (2023-06-26T11:35:47Z)
- Lexicon-injected Semantic Parsing for Task-Oriented Dialog [31.42253032456493]
We present a novel lexicon-injected semantic parser, which collects slot labels of the tree representation as a lexicon and injects lexical features into the span representation of the tree nodes.
Our best result produces a new state-of-the-art result (87.62%) on the TOP dataset, and demonstrates its adaptability to frequently updated slot lexicon entries in real task-oriented dialog.
arXiv Detail & Related papers (2022-11-26T07:59:20Z)
- Tracing Text Provenance via Context-Aware Lexical Substitution [81.49359106648735]
We propose a natural language watermarking scheme based on context-aware lexical substitution.
Under both objective and subjective metrics, our watermarking scheme preserves the semantic integrity of the original sentences well.
arXiv Detail & Related papers (2021-12-15T04:27:33Z)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
- Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring [41.77270308094212]
We propose an alternative mapping approach for word embeddings in languages other than English.
Rather than aligning two fixed embedding spaces, our method works by fixing the target language embeddings, and learning a new set of embeddings for the source language that are aligned with them.
Our approach outperforms conventional mapping methods on bilingual lexicon induction, and obtains competitive results in the downstream XNLI task.
arXiv Detail & Related papers (2020-12-31T17:10:14Z)
- Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system.
It generates a ranked list of quotable paragraphs and spans of tokens from a given source document.
We conduct experiments on a collection of speech transcripts and associated news articles.
arXiv Detail & Related papers (2020-05-17T17:49:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.