Coherence and Diversity through Noise: Self-Supervised Paraphrase
Generation via Structure-Aware Denoising
- URL: http://arxiv.org/abs/2302.02780v1
- Date: Mon, 6 Feb 2023 13:50:57 GMT
- Title: Coherence and Diversity through Noise: Self-Supervised Paraphrase
Generation via Structure-Aware Denoising
- Authors: Rishabh Gupta, Venktesh V., Mukesh Mohania, Vikram Goyal
- Abstract summary: We propose SCANING, an unsupervised framework for paraphrasing via controlled noise injection.
We focus on the novel task of paraphrasing algebraic word problems, which has practical applications in online pedagogy.
We demonstrate that SCANING considerably improves performance in terms of both semantic preservation and paraphrase diversity.
- Score: 5.682665111938764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose SCANING, an unsupervised framework for paraphrasing
via controlled noise injection. We focus on the novel task of paraphrasing
algebraic word problems, which has practical applications in online pedagogy as
a means to reduce plagiarism and to ensure understanding on the part of the
student rather than rote memorization. This task is more complex than
paraphrasing general-domain corpora due to the difficulty in preserving
critical information for solution consistency of the paraphrased word problem,
managing the increased length of the text and ensuring diversity in the
generated paraphrase. Existing approaches fail to demonstrate adequate
performance on at least one, if not all, of these facets, necessitating a
more comprehensive solution. To this end, we model the noising
search space as a composition of contextual and syntactic aspects and sample
noising functions consisting of either one or both aspects. This allows for
learning a denoising function that operates over both aspects and produces
semantically equivalent and syntactically diverse outputs through grounded
noise injection. The denoising function serves as a foundation for learning a
paraphrasing function which operates solely in the input-paraphrase space
without carrying any direct dependency on noise. We demonstrate that SCANING
considerably improves performance in terms of both semantic preservation and
paraphrase diversity through extensive automated and manual evaluation across
4 datasets.
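The noising composition described in the abstract — sampling noising functions built from contextual and/or syntactic aspects, then forming (noisy, clean) pairs for a denoiser — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the specific mask/shuffle operations, and the sampling scheme are assumptions for illustration only.

```python
import random

def contextual_noise(tokens, p=0.15, rng=None):
    """Mask individual tokens (a stand-in for contextual noising)."""
    rng = rng or random
    return [t if rng.random() > p else "<mask>" for t in tokens]

def syntactic_noise(tokens, span=3, rng=None):
    """Locally shuffle small spans to perturb word order (a stand-in for syntactic noising)."""
    rng = rng or random
    out = []
    for i in range(0, len(tokens), span):
        chunk = tokens[i:i + span]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

def sample_noising_fn(rng=None):
    """Sample a noising function composed of one or both aspects."""
    rng = rng or random
    choice = rng.choice(["contextual", "syntactic", "both"])
    if choice == "contextual":
        return lambda toks: contextual_noise(toks, rng=rng)
    if choice == "syntactic":
        return lambda toks: syntactic_noise(toks, rng=rng)
    return lambda toks: syntactic_noise(contextual_noise(toks, rng=rng), rng=rng)

def make_denoising_pair(sentence, rng=None):
    """Return a (noisy input, clean target) pair for training a denoising function."""
    tokens = sentence.split()
    noisy = sample_noising_fn(rng)(tokens)
    return " ".join(noisy), sentence
```

A denoiser trained on such pairs would learn to map perturbed word problems back to well-formed text; the paraphrasing function is then learned in the input-paraphrase space without direct dependency on the noise.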
Related papers
- DenoSent: A Denoising Objective for Self-Supervised Sentence
Representation Learning [59.4644086610381]
We propose a novel denoising objective that operates from another perspective, i.e., the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
- Learning Disentangled Speech Representations [0.412484724941528]
SynSpeech is a novel large-scale synthetic speech dataset designed to enable research on disentangled speech representations.
We present a framework to evaluate disentangled representation learning techniques, applying both linear probing and established supervised disentanglement metrics.
We find that SynSpeech facilitates benchmarking across a range of factors, achieving promising disentanglement of simpler features like gender and speaking style, while highlighting challenges in isolating complex attributes like speaker identity.
arXiv Detail & Related papers (2023-11-04T04:54:17Z)
- Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
- Learning Semantic Correspondence with Sparse Annotations [66.37298464505261]
Finding dense semantic correspondence is a fundamental problem in computer vision.
We propose a teacher-student learning paradigm for generating dense pseudo-labels.
We also develop two novel strategies for denoising pseudo-labels.
arXiv Detail & Related papers (2022-08-15T02:24:18Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify hybrid granularities semantic meaning in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- Analysis of Joint Speech-Text Embeddings for Semantic Matching [3.6423306784901235]
We study a joint speech-text embedding space trained for semantic matching by minimizing the distance between paired utterance and transcription inputs.
We extend our method to incorporate automatic speech recognition through both pretraining and multitask scenarios.
arXiv Detail & Related papers (2022-04-04T04:50:32Z)
- Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos [96.85840365678649]
We tackle the problem of referring expression comprehension in videos with an elegant one-stage framework.
We enhance the single-frame grounding accuracy by semantic attention learning and improve the cross-frame grounding consistency.
Our model is also applicable to referring expression comprehension in images, illustrated by the improved performance on the RefCOCO dataset.
arXiv Detail & Related papers (2021-03-23T06:42:49Z)
- Pareto Probing: Trading Off Accuracy for Complexity [87.09294772742737]
We argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance.
Our experiments with dependency parsing reveal a wide gap in syntactic knowledge between contextual and non-contextual representations.
arXiv Detail & Related papers (2020-10-05T17:27:31Z)
- Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning [15.476161876559074]
This paper presents a reinforcement learning approach to extract noise in long clinical documents for the task of readmission prediction after kidney transplant.
We first experiment with four types of encoders to empirically decide the best document representation, and then apply reinforcement learning to remove noisy text from the long documents.
arXiv Detail & Related papers (2020-05-04T04:06:53Z)
- Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise [18.135965605011105]
Attention-based sequence-to-sequence (seq2seq) speech synthesis has achieved extraordinary performance.
A studio-quality corpus with manual transcription is necessary to train such seq2seq systems.
We propose an approach to build high-quality and stable seq2seq based speech synthesis system using challenging found data.
arXiv Detail & Related papers (2020-04-28T15:32:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.