'John ate 5 apples' != 'John ate some apples': Self-Supervised
Paraphrase Quality Detection for Algebraic Word Problems
- URL: http://arxiv.org/abs/2206.08263v1
- Date: Thu, 16 Jun 2022 16:01:59 GMT
- Title: 'John ate 5 apples' != 'John ate some apples': Self-Supervised
Paraphrase Quality Detection for Algebraic Word Problems
- Authors: Rishabh Gupta, Venktesh V, Mukesh Mohania, Vikram Goyal
- Abstract summary: This paper introduces the novel task of scoring paraphrases for Algebraic Word Problems (AWP).
We propose ParaQD, a self-supervised paraphrase quality detection method using novel data augmentations.
Our method outperforms existing state-of-the-art self-supervised methods by up to 32% while also demonstrating impressive zero-shot performance.
- Score: 5.682665111938764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the novel task of scoring paraphrases for Algebraic
Word Problems (AWP) and presents a self-supervised method for doing so. In the
current online pedagogical setting, paraphrasing these problems is helpful for
academicians to generate multiple syntactically diverse questions for
assessments. It also helps induce variation to ensure that the student has
understood the problem instead of just memorizing it or using unfair means to
solve it. The current state-of-the-art paraphrase generation models often
cannot effectively paraphrase word problems, losing a critical piece of
information (such as numbers or units) which renders the question unsolvable.
There is a need for paraphrase scoring methods in the context of AWP to enable
the training of good paraphrasers. Thus, we propose ParaQD, a self-supervised
paraphrase quality detection method using novel data augmentations that can
learn latent representations to separate a high-quality paraphrase of an
algebraic question from a poor one by a wide margin. Through extensive
experimentation, we demonstrate that our method outperforms existing
state-of-the-art self-supervised methods by up to 32% while also demonstrating
impressive zero-shot performance.
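The abstract notes that a paraphrase which drops a number or unit renders an algebraic word problem unsolvable, and that ParaQD learns to separate such poor paraphrases from good ones using data augmentations. As a purely illustrative sketch (the function names, the regex, and the "some"-substitution rule are assumptions, not the paper's actual augmentation operators), one family of negative augmentations can be mimicked like this:

```python
import re

# Toy negative augmentation: replace every numeral with a vague quantifier,
# destroying the information needed to solve the problem (a low-quality
# paraphrase by construction). This is an illustration, not ParaQD itself.

def drop_numbers(question: str) -> str:
    """Replace each numeral with 'some', losing the key quantities."""
    return re.sub(r"\b\d+(\.\d+)?\b", "some", question)

def has_numbers(question: str) -> bool:
    """A crude solvability proxy: an algebraic word problem needs its numbers."""
    return bool(re.search(r"\b\d+(\.\d+)?\b", question))

original = "John ate 5 apples and then bought 3 more. How many does he have?"
negative = drop_numbers(original)

print(negative)
print(has_numbers(original), has_numbers(negative))  # -> True False
```

A quality-detection model can then be trained contrastively, pushing representations of such synthetic negatives away from the original question while keeping valid rewordings close.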
Related papers
- Large Language Models as Analogical Reasoners [155.9617224350088]
Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks.
We introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models.
arXiv Detail & Related papers (2023-10-03T00:57:26Z)
- Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement [50.62461749446111]
Self-Polish (SP) is a novel method that facilitates the model's reasoning by guiding it to progressively refine the given problems to be more comprehensible and solvable.
SP is orthogonal to all other answer/reasoning-side prompting methods such as CoT, allowing for seamless integration with state-of-the-art techniques for further improvement.
arXiv Detail & Related papers (2023-05-23T19:58:30Z)
- Coherence and Diversity through Noise: Self-Supervised Paraphrase Generation via Structure-Aware Denoising [5.682665111938764]
We propose SCANING, an unsupervised framework for paraphrasing via controlled noise injection.
We focus on the novel task of paraphrasing algebraic word problems having practical applications in online pedagogy.
We demonstrate SCANING considerably improves performance in terms of both semantic preservation and producing diverse paraphrases.
arXiv Detail & Related papers (2023-02-06T13:50:57Z)
- Automatic Generation of Socratic Subquestions for Teaching Math Word Problems [16.97827669744673]
We explore the ability of large language models (LMs) in generating sequential questions for guiding math word problem-solving.
On both automatic and human quality evaluations, we find that LMs constrained with desirable question properties generate superior questions.
Results suggest that the difficulty level of problems plays an important role in determining whether questioning improves or hinders human performance.
arXiv Detail & Related papers (2022-11-23T10:40:22Z)
- Learning to Selectively Learn for Weakly-supervised Paraphrase Generation [81.65399115750054]
We propose a novel approach to generate high-quality paraphrases with weak supervision data.
Specifically, we tackle the weakly-supervised paraphrase generation problem by obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion.
We demonstrate that our approach achieves significant improvements over existing unsupervised approaches and is even comparable in performance with supervised state-of-the-art methods.
arXiv Detail & Related papers (2021-09-25T23:31:13Z)
- Phrase Retrieval Learns Passage Retrieval, Too [77.57208968326422]
We study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents.
We show that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy.
We also show that phrase filtering and vector quantization can reduce the size of our index by 4-10x.
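The 4-10x figure above is consistent with simple vector-compression arithmetic. As a generic, hedged illustration (this is not the paper's filtering or quantization scheme), merely storing float32 phrase vectors as int8 codes already yields a 4x reduction:

```python
from array import array

# Toy scalar quantization: map floats in [-1, 1] to int8 codes, shrinking
# storage from 4 bytes to 1 byte per dimension. Coarser codebooks (e.g.
# product quantization) push the reduction further toward the 4-10x range.

def quantize_int8(vec, scale=127.0):
    """Clip to [-1, 1] and map each float to an int8 code."""
    return array("b", (int(round(max(-1.0, min(1.0, x)) * scale)) for x in vec))

def dequantize(codes, scale=127.0):
    """Approximate reconstruction of the original floats."""
    return [c / scale for c in codes]

vec = [0.5, -0.25, 1.0, 0.0]
codes = quantize_int8(vec)

float32_bytes = 4 * len(vec)               # 16 bytes as float32
int8_bytes = codes.itemsize * len(codes)   # 4 bytes as int8
print(float32_bytes // int8_bytes)         # -> 4 (a 4x smaller index)
```

The trade-off is a small reconstruction error per dimension, which retrieval systems tolerate because ranking depends on relative, not exact, similarities.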
arXiv Detail & Related papers (2021-09-16T17:42:45Z)
- Deep learning for sentence clustering in essay grading support [1.7259867886009057]
We introduce two datasets of undergraduate student essays in Finnish, manually annotated for salient arguments on the sentence level.
We evaluate several deep-learning embedding methods for their suitability to sentence clustering in support of essay grading.
arXiv Detail & Related papers (2021-04-23T12:32:51Z)
- Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation [75.1682163844354]
We address the issue of missing modalities that arises in the Visual Question Answer-Difference prediction task.
We introduce a model, the "Big" Teacher, that takes the image/question/answer triplet as its input and outperforms the baseline.
arXiv Detail & Related papers (2021-04-13T06:41:11Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are vulnerable even to common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded with world knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
- Automated Utterance Generation [5.220940151628735]
Using relevant utterances as features in question answering has been shown to improve both the precision and recall of a conversational assistant's answer retrieval.
We propose an utterance generation system which 1) uses extractive summarization to extract important sentences from the description, 2) uses multiple paraphrasing techniques to generate a diverse set of paraphrases of the title and summary sentences, and 3) selects good candidate paraphrases with the help of a novel candidate selection algorithm.
arXiv Detail & Related papers (2020-04-07T15:35:54Z)
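The three-stage pipeline in the last entry above can be mimicked end to end with toy stand-ins. Assuming a length-based extractor, a hypothetical synonym table, and Jaccard-overlap selection (none of which are the paper's actual components), a minimal sketch looks like:

```python
import re

# Illustrative-only sketch of the three stages: extract, paraphrase, select.
# Every heuristic here is a toy stand-in for the paper's actual algorithms.

SYNONYMS = {"reset": "recover", "password": "passphrase"}  # hypothetical table

def summarize(description):
    """Stage 1: crude extractive summarization (pick the longest sentence)."""
    sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", description) if s.strip()]
    return max(sentences, key=len)

def paraphrases(sentence):
    """Stage 2: generate variants by simple synonym substitution."""
    out = []
    for old, new in SYNONYMS.items():
        variant = re.sub(rf"\b{old}\b", new, sentence)
        if variant != sentence:
            out.append(variant)
    return out

def select_best(candidates, original):
    """Stage 3: prefer the candidate sharing the fewest surface words."""
    def jaccard(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb)
    return min(candidates, key=lambda c: jaccard(c, original))

desc = "Click the link. Enter your email to reset your password securely."
summary = summarize(desc)
print(select_best(paraphrases(summary), summary))  # one of the synonym variants
```

In the real system each stage would be a learned component; the point of the sketch is only the data flow: description -> key sentences -> diverse candidates -> filtered paraphrases.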
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.