A Decoding Algorithm for Length-Control Summarization Based on Directed Acyclic Transformers
- URL: http://arxiv.org/abs/2502.04535v1
- Date: Thu, 06 Feb 2025 22:12:55 GMT
- Title: A Decoding Algorithm for Length-Control Summarization Based on Directed Acyclic Transformers
- Authors: Chenyang Huang, Hao Zhou, Cameron Jen, Kangjie Zheng, Osmar R. Zaïane, Lili Mou
- Abstract summary: Length-control summarization aims to condense a long text into a short one within a certain length limit.
Previous approaches often use autoregressive (AR) models and treat the length requirement as a soft constraint.
Our approach allows for multiple plausible sequence fragments and predicts a path to connect them.
- Score: 32.53051395472311
- Abstract: Length-control summarization aims to condense a long text into a short one within a certain length limit. Previous approaches often use autoregressive (AR) models and treat the length requirement as a soft constraint, which may not always be satisfied. In this study, we propose a novel length-control decoding algorithm based on the Directed Acyclic Transformer (DAT). Our approach allows for multiple plausible sequence fragments and predicts a \emph{path} to connect them. In addition, we propose a Sequence Maximum a Posteriori (SeqMAP) decoding algorithm that marginalizes different possible paths and finds the most probable summary satisfying the length budget. Our algorithm is based on beam search, which further facilitates a reranker for performance improvement. Experimental results on the Gigaword and DUC2004 datasets demonstrate our state-of-the-art performance for length-control summarization.
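The following is a minimal sketch of the length-controlled decoding idea described in the abstract, not the authors' implementation. It assumes a DAT that exposes per-vertex token log-probabilities and vertex-to-vertex transition log-probabilities (the tensor layout and function names are illustrative). Hypotheses that yield the same token sequence are merged with log-sum-exp, loosely mirroring the SeqMAP marginalization over paths.

```python
# Illustrative length-controlled beam search over a DAT output graph.
# Assumed inputs (hypothetical names):
#   token_logp[i, w]: log-prob of emitting token w at decoder vertex i
#   trans_logp[i, j]: log-prob of transitioning from vertex i to a later vertex j
import numpy as np

def length_control_beam_search(token_logp, trans_logp, target_len, beam_size=8):
    L, _ = token_logp.shape

    def seq_score(vertex_scores):
        # total (path-marginalized) log-prob of one token sequence
        return np.logaddexp.reduce(np.array(list(vertex_scores.values())))

    # beams: token sequence (tuple) -> {current vertex: accumulated log-prob}
    beams = {(): {0: 0.0}}                              # start at vertex 0
    for step in range(target_len):
        candidates = {}
        for tokens, vertex_scores in beams.items():
            for i, score in vertex_scores.items():
                top_w = np.argsort(token_logp[i])[-beam_size:]   # prune tokens
                for w in top_w:
                    new_tokens = tokens + (int(w),)
                    emitted = score + token_logp[i, w]
                    if step == target_len - 1:
                        # last token: no further hop in this simplified sketch
                        slot = candidates.setdefault(new_tokens, {})
                        slot[i] = np.logaddexp(slot.get(i, -np.inf), emitted)
                        continue
                    for j in range(i + 1, L):            # hop to a later vertex
                        s = emitted + trans_logp[i, j]
                        slot = candidates.setdefault(new_tokens, {})
                        # marginalize (logsumexp) over paths reaching (tokens, j)
                        slot[j] = np.logaddexp(slot.get(j, -np.inf), s)
        if not candidates:
            raise ValueError("length budget not reachable on this graph")
        ranked = sorted(candidates.items(), key=lambda kv: seq_score(kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    # highest-scoring summary with exactly target_len tokens
    best = max(beams, key=lambda t: seq_score(beams[t]))
    return list(best)
```

The paper additionally reranks the resulting beam candidates; that step, like exact end-of-path handling, is omitted from this sketch.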
Related papers
- Efficient Long Context Language Model Retrieval with Compression [57.09163579304332]
Long Context Language Models (LCLMs) have emerged as a new paradigm to perform Information Retrieval (IR).
We propose a new compression approach tailored for LCLM retrieval, which is trained to maximize the retrieval performance while minimizing the length of the compressed passages.
We show that CoLoR improves the retrieval performance by 6% while compressing the in-context size by a factor of 1.91.
arXiv Detail & Related papers (2024-12-24T07:30:55Z) - Length Controlled Generation for Black-box LLMs [70.57649832433451]
Large language models (LLMs) have demonstrated impressive instruction following capabilities, but struggle to accurately manage the length of generated text.
We propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy.
Our framework achieves almost 100% success rates of length control on Llama3.1 for tasks such as length-controlled abstractive summarization.
arXiv Detail & Related papers (2024-12-19T09:07:38Z) - Universality and Limitations of Prompt Tuning [65.8354898840308]
We take one of the first steps to understand the role of soft-prompt tuning for transformer-based architectures.
We analyze prompt tuning from the lens of universality and limitations with finite-depth pretrained transformers for continuous-valued functions.
Our result guarantees the existence of a strong transformer with a prompt to approximate any sequence-to-sequence function in the set of Lipschitz functions.
arXiv Detail & Related papers (2023-05-30T06:47:07Z) - Summarization with Precise Length Control [23.688834410051]
We present a framework to generate summaries with precisely the specified number of tokens or sentences.
We jointly train the model to predict summary length, so it can generate summaries of optimal length.
arXiv Detail & Related papers (2023-05-09T04:45:24Z) - Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes [59.55193427277134]
Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels.
However, RM codes only admit limited sets of rates.
Efficient decoders are available for RM codes at finite lengths.
arXiv Detail & Related papers (2023-01-16T04:11:14Z) - A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization [23.495225374478295]
Sentence summarization aims at compressing a long sentence into a short one that keeps the main gist, and has extensive real-world applications such as headline generation.
In our work, we address a new problem of explicit character-level length control for summarization, and propose a dynamic programming algorithm based on the Connectionist Temporal Classification (CTC) model (a simplified sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-05-28T21:09:53Z) - LenAtten: An Effective Length Controlling Unit For Text Summarization [5.554982420311913]
Fixed length summarization aims at generating summaries with a preset number of words or characters.
Most recent studies incorporate length information with word embeddings as the input to the recurrent decoding unit.
We present an effective length controlling unit, Length Attention (LenAtten), to break the trade-off between length controllability and summary quality.
arXiv Detail & Related papers (2021-06-01T08:45:41Z) - Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search [84.94597821711808]
We extend PoWER-BERT (Goyal et al., 2020) and propose Length-Adaptive Transformer that can be used for various inference scenarios after one-shot training.
We conduct a multi-objective evolutionary search to find a length configuration that maximizes the accuracy and minimizes the efficiency metric under any given computational budget.
We empirically verify the utility of the proposed approach by demonstrating the superior accuracy-efficiency trade-off under various setups.
arXiv Detail & Related papers (2020-10-14T12:28:08Z) - Conformer-Kernel with Query Term Independence for Document Retrieval [32.36908635150144]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
We extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption.
We show that the Conformer's GPU memory requirement scales linearly with input sequence length, making it a more viable option when ranking long documents.
arXiv Detail & Related papers (2020-07-20T19:47:28Z) - SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization [39.85688193525843]
We study a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens.
We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment.
The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text.
arXiv Detail & Related papers (2020-06-18T00:13:21Z) - Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)
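The character-level length-control idea from the CTC-based entry above ("A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization") can be illustrated with a small dynamic program. This is a simplified sketch, not that paper's algorithm: it assumes per-position token log-probabilities with a blank symbol, uses illustrative names, and ignores CTC's collapsing of repeated tokens.

```python
# Simplified character-level length-control DP in the spirit of CTC decoding.
# Assumed inputs (hypothetical names):
#   log_probs[t, w]: log-prob of token w at generation position t (w = 0 is blank)
#   tok_chars[w]   : number of characters token w contributes to the summary
import numpy as np

def char_budget_decode(log_probs, tok_chars, budget, blank_id=0):
    T, V = log_probs.shape
    NEG = -np.inf
    # dp[t, c]: best log-prob after t positions having emitted exactly c characters
    dp = np.full((T + 1, budget + 1), NEG)
    choice = np.full((T + 1, budget + 1), -1, dtype=int)   # token chosen at step t
    dp[0, 0] = 0.0
    for t in range(T):
        for c in range(budget + 1):
            if dp[t, c] == NEG:
                continue
            for w in range(V):
                nc = c if w == blank_id else c + tok_chars[w]
                if nc > budget:
                    continue
                s = dp[t, c] + log_probs[t, w]
                if s > dp[t + 1, nc]:
                    dp[t + 1, nc] = s
                    choice[t + 1, nc] = w
    if dp[T, budget] == NEG:
        return None            # no sequence meets the exact character budget
    # backtrack from (T, budget) to recover the emitted tokens
    tokens, c = [], budget
    for t in range(T, 0, -1):
        w = choice[t, c]
        if w != blank_id:
            tokens.append(w)
            c -= tok_chars[w]
    return list(reversed(tokens))
```

The DP runs in O(T × budget × |V|) time; relaxing the exact budget to "at most `budget` characters" only requires taking the best cell in the final DP row instead of dp[T, budget].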
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.