Improving the Generalization Ability in Essay Coherence Evaluation
through Monotonic Constraints
- URL: http://arxiv.org/abs/2308.02506v1
- Date: Tue, 25 Jul 2023 08:26:46 GMT
- Title: Improving the Generalization Ability in Essay Coherence Evaluation
through Monotonic Constraints
- Authors: Chen Zheng, Huan Zhang, Yan Zhao, Yuxuan Lai
- Abstract summary: Coherence is a crucial aspect of evaluating text readability and can be assessed through two primary factors.
We propose a coherence scoring model consisting of a regression model with two feature extractors.
The model achieved third place in track 1 of NLPCC 2023 shared task 7.
- Score: 22.311428543432605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coherence is a crucial aspect of evaluating text readability and can be
assessed through two primary factors when evaluating an essay in a scoring
scenario. The first factor is logical coherence, characterized by the
appropriate use of discourse connectives and the establishment of logical
relationships between sentences. The second factor is the appropriateness of
punctuation, as inappropriate punctuation can lead to confused sentence
structure. To address these concerns, we propose a coherence scoring model
consisting of a regression model with two feature extractors: a local coherence
discriminative model and a punctuation correction model. We employ
gradient-boosting regression trees as the regression model and impose
monotonicity constraints on the input features. The results show that our
proposed model generalizes better to unseen data. The model achieved third place
in track 1 of NLPCC 2023 shared task 7. Additionally, we briefly introduce our
solution for the remaining tracks, which achieves second place for track 2 and
first place for both track 3 and track 4.
Related papers
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - RDR: the Recap, Deliberate, and Respond Method for Enhanced Language
Understanding [6.738409533239947]
The Recap, Deliberate, and Respond (RDR) paradigm incorporates three distinct objectives within the neural network pipeline.
By cascading these three models, we mitigate the potential for gaming the benchmark and establish a robust method for capturing the underlying semantic patterns.
Our results demonstrate improved performance compared to competitive baselines, with an enhancement of up to 2% on standard metrics.
arXiv Detail & Related papers (2023-12-15T16:41:48Z) - Towards Improving Faithfulness in Abstractive Summarization [37.19777407790153]
We propose a Faithfulness Enhanced Summarization model (FES) to improve fidelity in abstractive summarization.
Our model outperforms strong baselines in experiments on CNN/DM and XSum.
arXiv Detail & Related papers (2022-10-04T19:52:09Z) - SNaC: Coherence Error Detection for Narrative Summarization [73.48220043216087]
We introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries.
Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators.
arXiv Detail & Related papers (2022-05-19T16:01:47Z) - Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework where the noisy level in the data used for finetuning decreases over different stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z) - Bayesian Topic Regression for Causal Inference [3.9082355007261427]
Causal inference using observational text data is becoming increasingly popular in many research areas.
This paper presents the Bayesian Topic Regression model that uses both text and numerical information to model an outcome variable.
arXiv Detail & Related papers (2021-09-11T16:40:43Z) - Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning [57.4036085386653]
We show that prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inferences based on lexical overlap.
We then show that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning.
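A regularizer that keeps finetuned weights near their pretrained values can be sketched as a generic L2 pull toward the pretrained parameters (a common form; the paper's actual regularizer may differ, and the linear model and data below are purely illustrative):

```python
# Hypothetical sketch: few-shot finetuning with an L2 penalty that
# discourages the weights from drifting away from their pretrained values.
import numpy as np

rng = np.random.default_rng(0)
w_pre = rng.normal(size=5)          # stand-in "pretrained" weights
X = rng.normal(size=(20, 5))        # a small few-shot dataset
y = X @ rng.normal(size=5)          # synthetic regression targets


def finetune(lam, steps=300, lr=0.01):
    """Gradient descent on MSE plus lam * ||w - w_pre||^2."""
    w = w_pre.copy()
    for _ in range(steps):
        grad_task = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        grad_reg = 2 * lam * (w - w_pre)            # pull back toward w_pre
        w -= lr * (grad_task + grad_reg)
    return w


w_plain = finetune(lam=0.0)   # unregularized finetuning
w_reg = finetune(lam=5.0)     # regularized finetuning

# The penalty keeps the finetuned weights closer to the pretrained ones
drift_plain = np.linalg.norm(w_plain - w_pre)
drift_reg = np.linalg.norm(w_reg - w_pre)
assert drift_reg < drift_plain
```

The trade-off is controlled by lam: larger values preserve more of the pretrained behavior at the cost of fitting the few-shot data less closely.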
arXiv Detail & Related papers (2021-09-09T10:10:29Z) - Realistic Evaluation Principles for Cross-document Coreference
Resolution [19.95214898312209]
We argue that models should not exploit the synthetic topic structure of the standard ECB+ dataset.
We demonstrate empirically the drastic impact of our more realistic evaluation principles on a competitive model.
arXiv Detail & Related papers (2021-06-08T09:05:21Z) - Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z) - Evaluating Text Coherence at Sentence and Paragraph Levels [17.99797111176988]
We investigate the adaptation of existing sentence ordering methods to a paragraph ordering task.
We also compare the learnability and robustness of existing models by artificially creating mini datasets and noisy datasets.
We conclude that the recurrent graph neural network-based model is an optimal choice for coherence modeling.
arXiv Detail & Related papers (2020-06-05T03:31:49Z) - Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence
Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.