Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text
Generation via Concentrating Attention
- URL: http://arxiv.org/abs/2211.07164v1
- Date: Mon, 14 Nov 2022 07:53:16 GMT
- Title: Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text
Generation via Concentrating Attention
- Authors: Wenhao Li, Xiaoyuan Yi, Jinyi Hu, Maosong Sun, Xing Xie
- Abstract summary: Powerful Transformer architectures have proven superior in generating high-quality sentences.
In this work, we find that sparser attention values in the Transformer could improve diversity.
We introduce a novel attention regularization loss to control the sharpness of the attention distribution.
- Score: 85.5379146125199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, powerful Transformer architectures have proven superior in
generating high-quality sentences. Nevertheless, these models tend to produce
dull high-frequency phrases, severely hurting the diversity and novelty of
generated text. In this work, we dig into the intrinsic mechanism of this
problem and find that sparser attention values in the Transformer could improve
diversity. To understand such a phenomenon, we first conduct both empirical and
theoretical analysis and then attribute it to representation degeneration
caused by the attentive mixture of the hidden states during training. We term
this process the Trap of Mediocrity. To escape from such a trap, we introduce a
novel attention regularization loss to control the sharpness of the attention
distribution, which is agnostic to the model structure and can be easily
implemented in fewer than 20 lines of Python code. We prove that this method can be
mathematically regarded as learning a Bayesian approximation of posterior
attention. Experiments show that our method improves the diversity and novelty
of the generated text while maintaining comparable quality on a variety of
conditional and unconditional generation tasks.
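As an illustration of the idea described above, the following is a minimal sketch, not the authors' released code: it assumes an entropy penalty on each attention row as one simple way to control attention sharpness. The weight `beta`, the tensor shapes, and the function name are illustrative assumptions; the paper's exact loss and its Bayesian interpretation are given in the abstract above.

```python
import torch

def attention_sharpness_loss(attn_probs, eps=1e-9):
    """attn_probs: (batch, heads, query_len, key_len); each row sums to 1.

    Returns the mean entropy of the attention rows. Adding beta * this value
    to the task loss pushes the entropy down, i.e. toward sharper (sparser)
    attention distributions.
    """
    entropy = -(attn_probs * (attn_probs + eps).log()).sum(dim=-1)  # (batch, heads, query_len)
    return entropy.mean()

# Usage sketch (hypothetical names):
#   total_loss = cross_entropy_loss + beta * attention_sharpness_loss(attn_probs)
```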
Related papers
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal
Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Stable Diffusion is Unstable [21.13934830556678]
We propose Auto-attack on Text-to-image Models (ATM) to efficiently generate small perturbations.
ATM has achieved a 91.1% success rate in short-text attacks and an 81.2% success rate in long-text attacks.
Further empirical analysis revealed four attack patterns based on: 1) the variability in generation speed, 2) the similarity of coarse-grained characteristics, 3) the polysemy of words, and 4) the positioning of words.
arXiv Detail & Related papers (2023-06-05T04:21:43Z)
- Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL).
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z)
- Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer [0.7832189413179361]
We analyze the hidden state and channel wave dynamics in a small Generative Pretrained Transformer (GPT).
Our findings suggest that wave dynamics offer consistent and repeatable intrinsic oscillation modes, along with context-aware plasticity and expressiveness in language generation.
In addition, we investigate the Poisson statistics of spelling errors in text sequence generation across various levels of model training.
arXiv Detail & Related papers (2023-05-08T21:35:12Z)
- A Contrastive Framework for Neural Text Generation [46.845997620234265]
We show that an underlying reason for model degeneration is the anisotropic distribution of token representations.
We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method -- contrastive search -- to encourage diversity while maintaining coherence in the generated text (a minimal sketch of this decoding idea follows the list below).
arXiv Detail & Related papers (2022-02-13T21:46:14Z)
- Exploring Transferable and Robust Adversarial Perturbation Generation from the Perspective of Network Hierarchy [52.153866313879924]
The transferability and robustness of adversarial examples are two practical yet important properties for black-box adversarial attacks.
We propose a transferable and robust adversarial generation (TRAP) method.
Our TRAP achieves impressive transferability and high robustness against certain interferences.
arXiv Detail & Related papers (2021-08-16T11:52:41Z)
- A Novel Estimator of Mutual Information for Learning to Disentangle Textual Representations [27.129551973093008]
This paper introduces a novel variational upper bound on the mutual information between an attribute and the latent code of an encoder.
It controls the approximation error via the Rényi divergence, leading to both better-disentangled representations and precise control of the desired degree of disentanglement.
We show the superiority of this method on fair classification and on textual style transfer tasks.
arXiv Detail & Related papers (2021-05-06T14:05:06Z)
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
- Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce.
Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
arXiv Detail & Related papers (2020-04-29T17:43:24Z)
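Returning to the contrastive-search entry above (A Contrastive Framework for Neural Text Generation), the following is a minimal sketch of that decoding idea, assuming a Hugging Face-style causal language model that accepts `output_hidden_states`. The variable names, the batching strategy, and the default values of `k` and `alpha` are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def contrastive_search_step(model, input_ids, k=5, alpha=0.6):
    """Pick the next token by trading off model confidence against the maximum
    cosine similarity between a candidate's hidden state and the hidden states
    of the tokens already generated (the degeneration penalty)."""
    out = model(input_ids, output_hidden_states=True)
    probs = F.softmax(out.logits[:, -1, :], dim=-1)          # (1, vocab)
    topk_probs, topk_ids = probs.topk(k, dim=-1)              # (1, k)

    # Evaluate the k candidate continuations in a single batch.
    expanded = input_ids.expand(k, -1)                         # (k, seq_len)
    cand_input = torch.cat([expanded, topk_ids.reshape(k, 1)], dim=-1)
    cand_out = model(cand_input, output_hidden_states=True)
    h = F.normalize(cand_out.hidden_states[-1], dim=-1)       # (k, seq_len+1, dim)
    h_cand, h_ctx = h[:, -1, :], h[:, :-1, :]

    # Degeneration penalty: max similarity to any previous token representation.
    max_sim = torch.bmm(h_ctx, h_cand.unsqueeze(-1)).squeeze(-1).max(dim=-1).values  # (k,)
    score = (1 - alpha) * topk_probs.reshape(-1) - alpha * max_sim
    best = score.argmax()
    return torch.cat([input_ids, topk_ids[:, best].reshape(1, 1)], dim=-1)
```

For readability this sketch re-encodes the full prefix at every step and per candidate; a practical implementation would keep the model in eval mode and reuse cached key-values instead.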
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.