Related papers: Hierarchical Context Tagging for Utterance Rewriting

URL: http://arxiv.org/abs/2206.11218v1
Date: Wed, 22 Jun 2022 17:09:34 GMT
Title: Hierarchical Context Tagging for Utterance Rewriting
Authors: Lisa Jin, Linfeng Song, Lifeng Jin, Dong Yu, Daniel Gildea
Abstract summary: Methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings. We propose a hierarchical context tagger (HCT) that mitigates the low coverage of such taggers by predicting slotted rules. Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by about 2 BLEU points.
Score: 51.251400047377324
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Utterance rewriting aims to recover coreferences and omitted information from the latest turn of a multi-turn dialogue. Recently, methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings. This is due to a tagger's smaller search space as it can only copy tokens from the dialogue context. However, these methods may suffer from low coverage when phrases that must be added to a source utterance cannot be covered by a single context span. This can occur in languages like English that introduce tokens such as prepositions into the rewrite for grammaticality. We propose a hierarchical context tagger (HCT) that mitigates this issue by predicting slotted rules (e.g., "besides_") whose slots are later filled with context spans. HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills in the resulting rule slots with spans from the dialogue context. This rule tagging allows HCT to add out-of-context tokens and multiple spans at once; we further cluster the rules to truncate the long tail of the rule distribution. Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by ~2 BLEU points.
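The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag layout (a keep/delete flag, an optional rule inserted before the token, and context span indices), the `_` slot token, and the names `fill_rule` and `rewrite` are all assumptions made for illustration, and the tags are given by hand rather than predicted by a model.

```python
from typing import List, Optional, Tuple

SLOT = "_"  # assumed slot marker, following the "besides_" example in the abstract

# Hypothetical tag layout: (keep source token?, rule to insert before it, context spans)
Tag = Tuple[bool, Optional[List[str]], List[Tuple[int, int]]]


def fill_rule(rule: List[str], spans: List[List[str]]) -> List[str]:
    """Fill each slot in a slotted rule with the next context span, left to right."""
    if rule.count(SLOT) != len(spans):
        raise ValueError("number of slots and number of spans differ")
    remaining = iter(spans)
    filled: List[str] = []
    for tok in rule:
        if tok == SLOT:
            filled.extend(next(remaining))  # slot -> tokens copied from the context
        else:
            filled.append(tok)  # out-of-context token supplied by the rule itself
    return filled


def rewrite(source: List[str], context: List[str], tags: List[Tag]) -> List[str]:
    """Apply token-level edit actions and slotted rules to the source utterance."""
    out: List[str] = []
    for tok, (keep, rule, span_idx) in zip(source, tags):
        if rule is not None:
            # Step (ii): resolve span indices against the dialogue context.
            spans = [context[start:end] for start, end in span_idx]
            out.extend(fill_rule(rule, spans))
        if keep:
            out.append(tok)
    return out


if __name__ == "__main__":
    context = "I love pizza".split()
    source = "what else do you like ?".split()
    tags: List[Tag] = [(True, None, [])] * 5 + [(True, ["besides", SLOT], [(2, 3)])]
    print(" ".join(rewrite(source, context, tags)))
```

Here the rule `["besides", "_"]` contributes the out-of-context token "besides", while its slot is filled with the context span "pizza"; a rule with two slots would add two spans at once in the same way.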

Related papers

Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech [61.00008468914252]
We recast paragraph segmentation as the missing structuring step and fill three gaps at the intersection of speech processing and text segmentation. The benchmarks focus on the underexplored speech domain, where paragraph segmentation has traditionally not been part of post-processing. Second, we propose a constrained-decoding formulation that lets large language models insert paragraph breaks while preserving the original transcript. Third, we show that a compact model (MiniSeg) attains state-of-the-art accuracy and, when extended hierarchically, jointly predicts chapters and paragraphs with minimal computational cost.
arXiv Detail & Related papers (2025-12-30T23:29:51Z)
Levée d'ambiguïtés par grammaires locales (Disambiguation by local grammars) [0.0]
This article concerns a lexical disambiguation method adapted to the objective of a zero silence rate and implemented in Silberztein's INTEX system (1993). We show that to verify a local disambiguation grammar in this framework, it is not sufficient to consider the transducer paths separately.
arXiv Detail & Related papers (2025-10-28T15:38:22Z)
Think Before You Attribute: Improving the Performance of LLMs Attribution Systems [2.527698260421756]
We propose a sentence-level pre-attribution step for Retrieval-Augmented Generation (RAG) systems. By separating sentences before attribution, a proper attribution method can be selected for the type of sentence, or the attribution can be skipped altogether.
arXiv Detail & Related papers (2025-05-19T02:08:20Z)
Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition [77.28814034644287]
We propose SVTRv2, a CTC model endowed with the ability to handle text irregularities and model linguistic context. We extensively evaluate SVTRv2 in both standard and recent challenging benchmarks. SVTRv2 surpasses most EDTRs across the scenarios in terms of accuracy and inference speed.
arXiv Detail & Related papers (2024-11-24T14:21:35Z)
Partial Scene Text Retrieval [56.14891109413448]
The task of partial scene text retrieval involves localizing and searching for text instances that are the same or similar to a given query text from an image gallery. Existing methods can only handle text-line instances, leaving the problem of searching for partial patches unsolved. We propose a network that can simultaneously retrieve both text-line instances and their partial patches.
arXiv Detail & Related papers (2024-11-15T15:08:04Z)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary [41.892863381787684]
This paper proposes a dynamic vocabulary where bias tokens can be added during inference. Each entry in a bias list is represented as a single token rather than as a sequence of existing subword tokens. Experimental results demonstrate that the proposed method improves the bias phrase WER on English and Japanese datasets by 3.1 to 4.9 points.
arXiv Detail & Related papers (2024-05-22T05:03:39Z)
A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose resolving the text into multiple concepts for multilingual semantic matching, freeing the model from its reliance on NER models. We conduct comprehensive experiments on the English datasets QQP and MRPC and the Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
Improving Cross-Lingual Transfer through Subtree-Aware Word Reordering [17.166996956587155]
One obstacle for effective cross-lingual transfer is variability in word-order patterns. We present a new powerful reordering method, defined in terms of Universal Dependencies. We show that our method consistently outperforms strong baselines over different language pairs and model architectures.
arXiv Detail & Related papers (2023-10-20T15:25:53Z)
Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution [2.3429306644730854]
Manifold word-based stylistic markers have been successfully used in deep learning methods to deal with the intrinsic problem of authorship attribution. The proposed method was experimentally evaluated against numerous state-of-the-art methods across the public corpora CCAT50, IMDb62, Blog50, and Twitter50.
arXiv Detail & Related papers (2023-06-26T11:35:47Z)
Lexicon-injected Semantic Parsing for Task-Oriented Dialog [31.42253032456493]
We present a novel lexicon-injected semantic parser, which collects the slot labels of the tree representation as a lexicon and injects lexical features into the span representation of the tree nodes. Our best result sets a new state of the art (87.62%) on the TOP dataset and demonstrates adaptability to frequently updated slot lexicon entries in real task-oriented dialog.
arXiv Detail & Related papers (2022-11-26T07:59:20Z)
Tracing Text Provenance via Context-Aware Lexical Substitution [81.49359106648735]
We propose a natural language watermarking scheme based on context-aware lexical substitution. Under both objective and subjective metrics, our watermarking scheme preserves the semantic integrity of the original sentences well.
arXiv Detail & Related papers (2021-12-15T04:27:33Z)
UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger. We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences. We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring [41.77270308094212]
We propose an alternative mapping approach for word embeddings in languages other than English. Rather than aligning two fixed embedding spaces, our method works by fixing the target language embeddings, and learning a new set of embeddings for the source language that are aligned with them. Our approach outperforms conventional mapping methods on bilingual lexicon induction, and obtains competitive results in the downstream XNLI task.
arXiv Detail & Related papers (2020-12-31T17:10:14Z)
Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system. It generates a ranked list of quotable paragraphs and spans of tokens from a given source document. We conduct experiments on a collection of speech transcripts and associated news articles.
arXiv Detail & Related papers (2020-05-17T17:49:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.