Evaluating Text Coherence at Sentence and Paragraph Levels
- URL: http://arxiv.org/abs/2006.03221v1
- Date: Fri, 5 Jun 2020 03:31:49 GMT
- Title: Evaluating Text Coherence at Sentence and Paragraph Levels
- Authors: Sennan Liu, Shuang Zeng and Sujian Li
- Abstract summary: We investigate the adaptation of existing sentence ordering methods to a paragraph ordering task.
We also compare the learnability and robustness of existing models by artificially creating mini datasets and noisy datasets.
We conclude that the recurrent graph neural network-based model is an optimal choice for coherence modeling.
- Score: 17.99797111176988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, to evaluate text coherence, we propose the paragraph
ordering task in addition to the conventional sentence ordering task. We collect
four distinct corpora from different domains, on which we investigate how
existing sentence ordering methods adapt to paragraph ordering. We also compare
the learnability and robustness of existing models by artificially creating mini
datasets and noisy datasets, respectively, and verifying how well established
models hold up under these conditions. Furthermore, we carry out a human
evaluation on passages rearranged by two competitive models and confirm that
WLCS-l is the better metric, correlating significantly more strongly with human
ratings than tau (Kendall's tau), the most prevalent metric used previously.
These evaluations show that, except under certain extreme conditions, the
recurrent graph neural network-based model is an optimal choice for coherence
modeling.
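For illustration, both metrics referenced in the abstract can be computed from the predicted ordering alone. Below is a minimal Python sketch of Kendall's tau and a weighted longest-common-subsequence score in the style of ROUGE-W; the exact weighting behind WLCS-l is not given in this summary, so the `alpha` power weighting is an assumption.

```python
from itertools import combinations

def kendall_tau(order):
    """Kendall's tau for a predicted ordering, given as the list of
    gold indices in predicted order (e.g. [2, 0, 1])."""
    n = len(order)
    inversions = sum(1 for i, j in combinations(range(n), 2)
                     if order[i] > order[j])
    return 1.0 - 4.0 * inversions / (n * (n - 1))

def wlcs(pred, gold, alpha=2.0):
    """Weighted LCS (ROUGE-W style): a run of k consecutive matches
    scores k**alpha instead of k, rewarding locally coherent blocks."""
    f = lambda k: k ** alpha
    m, n = len(pred), len(gold)
    c = [[0.0] * (n + 1) for _ in range(m + 1)]  # weighted LCS score
    w = [[0] * (n + 1) for _ in range(m + 1)]    # current run length
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if pred[i - 1] == gold[j - 1]:
                k = w[i - 1][j - 1]
                c[i][j] = c[i - 1][j - 1] + f(k + 1) - f(k)
                w[i][j] = k + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])  # run broken
    return c[m][n]

def wlcs_l(pred, gold, alpha=2.0):
    """Length-normalised WLCS, mapped back to [0, 1] via f^{-1}."""
    return (wlcs(pred, gold, alpha) / len(gold) ** alpha) ** (1.0 / alpha)
```

For example, `kendall_tau([2, 0, 1])` is -1/3 (two of three pairs inverted), and `wlcs_l([2, 0, 1], [0, 1, 2])` ≈ 0.67 exceeds `wlcs_l([0, 2, 1], [0, 1, 2])` ≈ 0.47 even though both orderings share an LCS of length 2, since only the former preserves a consecutive block.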
Related papers
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation [14.064465097974836]
This paper proposes a novel approach to evaluate Counter Narrative (CN) generation using a Large Language Model (LLM) as an evaluator.
We show that traditional automatic metrics correlate poorly with human judgements and fail to capture the nuanced relationship between generated CNs and human perception.
arXiv Detail & Related papers (2024-06-21T15:11:33Z)
- Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models [85.96013373385057]
Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent.
However, excessive optimization with such reward models, which serve as mere proxy objectives, can compromise the performance of fine-tuned models.
We propose TextNorm, a method that enhances alignment based on a measure of reward model confidence estimated across a set of semantically contrastive text prompts.
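The summary leaves the confidence estimate abstract; one plausible reading, sketched below under that assumption, is to score an image against the true prompt and a set of semantically contrastive prompts and z-normalize, so the reward counts only when it separates the true prompt from its contrasts. Both `reward_fn` and the contrast set are hypothetical placeholders, not the paper's exact procedure.

```python
import numpy as np

def confidence_normalized_reward(reward_fn, image, prompt, contrast_prompts):
    """Hypothetical TextNorm-style normalization: a reward is trusted
    only to the extent that it ranks the true prompt above semantically
    contrastive prompts; a flat reward distribution maps to ~0."""
    scores = np.array([reward_fn(image, p)
                       for p in [prompt] + list(contrast_prompts)])
    return (scores[0] - scores.mean()) / (scores.std() + 1e-8)
```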
arXiv Detail & Related papers (2024-04-02T11:40:38Z)
- Calibrating Likelihoods towards Consistency in Summarization Models [22.023863165579602]
We argue that the main reason for such behavior is that summarization models trained with the maximum likelihood objective assign high probability to plausible sequences given the context.
In this work, we solve this problem by calibrating the likelihood of model generated sequences to better align with a consistency metric measured by natural language inference (NLI) models.
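As a concrete, assumed instantiation of the calibration step, the sketch below applies a margin ranking loss over candidate summaries: whenever an NLI model scores candidate i as more consistent than candidate j, the summarizer's log-likelihood of i is pushed above that of j. The loss form and margin are illustrative, not the paper's exact objective.

```python
import torch

def calibration_loss(log_probs, nli_scores, margin=1.0):
    """Rank-calibration sketch: for every candidate pair where the NLI
    model judges summary i more consistent than summary j, encourage
    log p(i) to exceed log p(j) by at least `margin`."""
    loss = log_probs.new_zeros(())
    n = 0
    for i in range(len(nli_scores)):
        for j in range(len(nli_scores)):
            if nli_scores[i] > nli_scores[j]:
                loss = loss + torch.relu(margin - (log_probs[i] - log_probs[j]))
                n += 1
    return loss / max(n, 1)
```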
arXiv Detail & Related papers (2023-10-12T23:17:56Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
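For reference, the canonical instance of "represent samples in a feature space, then compute a distance between the two sets" is the Fréchet distance between Gaussians fitted to encoder features (FID). A minimal sketch follows; the study analyzes such design choices rather than prescribing this particular one.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets
    (rows are samples, columns are encoder features)."""
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):      # numerical noise can introduce
        covmean = covmean.real        # tiny imaginary parts
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2 * covmean))
```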
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Evaluating Representations with Readout Model Switching [19.907607374144167]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
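A toy sketch of the online (prequential) MDL computation with a fixed-share switching mixture over readout models is given below; the readout interface (`predict_proba`, `update`) and the switching rule are simplifying assumptions, as the paper's hybrid model space is richer.

```python
import numpy as np

def prequential_mdl(models, xs, ys, switch_rate=0.01):
    """Toy prequential MDL with switching.  Each model is assumed to
    expose predict_proba(x) -> per-label probabilities and update(x, y).
    Returns the total code length in nats: sum_t -log p(y_t | past)."""
    w = np.full(len(models), 1.0 / len(models))  # mixture weights
    code_length = 0.0
    for x, y in zip(xs, ys):
        p = np.array([m.predict_proba(x)[y] for m in models])
        code_length += -np.log(w @ p + 1e-12)    # code under the mixture
        w = w * p                                 # Bayesian reweighting
        w = w / w.sum()
        # fixed-share step: keep some mass on every readout model so the
        # mixture can switch as more data arrives
        w = (1 - switch_rate) * w + switch_rate / len(models)
        for m in models:
            m.update(x, y)
    return code_length
```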
arXiv Detail & Related papers (2023-02-19T14:08:01Z)
- Few-shot Text Classification with Dual Contrastive Consistency [31.141350717029358]
In this paper, we explore how to use a pre-trained language model for few-shot text classification.
We adopt supervised contrastive learning on the few labeled examples and consistency regularization on the abundant unlabeled data.
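A compact sketch of the supervised contrastive term on a labeled batch is given below (batch shapes and temperature are assumptions); the consistency-regularization term on unlabeled data, e.g. agreement between two augmented views, would be added analogously.

```python
import torch
import torch.nn.functional as F

def sup_con_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (sketch): pull same-label embeddings
    together and push different-label ones apart.
    embeddings: (B, d) float tensor, labels: (B,) long tensor."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                 # (B, B) cosine similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float('-inf'))    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(mask_self, 0.0)    # avoid -inf * 0
    pos = ((labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self).float()
    # mean log-probability over each anchor's positive pairs
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```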
arXiv Detail & Related papers (2022-09-29T19:26:23Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
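Concretely, scoring relevance via "target words" can be sketched as follows: prompt the model with the query and document, then read off the probability of generating "true" versus "false" as the first target token. The `t5-base` checkpoint and prompt template below are generic stand-ins, not the paper's fine-tuned model.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# "t5-base" is a placeholder; the paper fine-tunes its own checkpoint.
tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

def relevance(query, doc):
    """Relevance as the probability of the target word "true" vs "false"
    at the first decoding step."""
    inp = tok(f"Query: {query} Document: {doc} Relevant:",
              return_tensors="pt", truncation=True)
    dec = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inp, decoder_input_ids=dec).logits[0, -1]
    true_id = tok("true").input_ids[0]
    false_id = tok("false").input_ids[0]
    return torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()
```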
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
- Preference Modeling with Context-Dependent Salient Features [12.403492796441434]
We consider the problem of estimating a ranking on a set of items from noisy pairwise comparisons given item features.
Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features.
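The idea can be sketched as a Bradley-Terry-style model in which each pairwise comparison is driven only by a salient subset of features. The top-k "most contrastive features" rule below is an illustrative assumption, not the paper's estimator.

```python
import numpy as np

def compare_prob(w, x_i, x_j, k=2):
    """Salient-feature pairwise model (sketch): the comparison between
    items i and j depends only on the k features on which the two items
    differ most, rather than on the full feature vectors."""
    diff = x_i - x_j
    salient = np.argsort(-np.abs(diff))[:k]      # most contrastive features
    score = w[salient] @ diff[salient]
    return 1.0 / (1.0 + np.exp(-score))          # P(i preferred over j)
```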
arXiv Detail & Related papers (2020-02-22T04:05:16Z)