Automatic Text Evaluation through the Lens of Wasserstein Barycenters
- URL: http://arxiv.org/abs/2108.12463v1
- Date: Fri, 27 Aug 2021 19:08:52 GMT
- Title: Automatic Text Evaluation through the Lens of Wasserstein Barycenters
- Authors: Pierre Colombo, Guillaume Staerman, Chloe Clavel, Pablo Piantanida
- Abstract summary: A new metric, BaryScore, is introduced to evaluate text generation based on deep contextualized embeddings.
Our results show that BaryScore outperforms other BERT-based metrics and exhibits more consistent behaviour, in particular for text summarization.
- Score: 24.71226781348407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new metric \texttt{BaryScore} to evaluate text generation based on deep
contextualized embeddings (\textit{e.g.}, BERT, RoBERTa, ELMo) is introduced.
This metric is motivated by a new framework relying on optimal transport tools,
\textit{i.e.}, the Wasserstein distance and barycenter. By modelling the layer
outputs of deep contextualized embeddings as probability distributions rather
than as vector embeddings, this framework provides a natural way to aggregate
the different outputs through the Wasserstein space topology. In addition, it
provides theoretical grounds for our metric and offers an alternative to
available solutions (\textit{e.g.}, MoverScore and BertScore). Numerical
evaluation is performed on four different tasks: machine translation,
summarization, data2text generation and image captioning. Our results show that
\texttt{BaryScore} outperforms other BERT-based metrics and exhibits more
consistent behaviour, in particular for text summarization.
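To make the optimal transport machinery concrete, the following is a minimal NumPy sketch that treats the token embeddings produced by one encoder layer as a uniform discrete distribution and compares candidate and reference through an entropic-regularized Wasserstein (Sinkhorn) distance. It is an illustration under stated assumptions, not the official implementation: BaryScore additionally aggregates all layers through a Wasserstein barycenter, and the cost function, regularisation strength and random stand-in embeddings below are choices of this sketch.

```python
import numpy as np

def sinkhorn_distance(X, Y, eps=0.1, n_iter=200):
    """Entropic-regularized Wasserstein distance between two sets of token
    embeddings X (n, d) and Y (m, d), each viewed as a uniform discrete
    distribution over its rows."""
    n, m = X.shape[0], Y.shape[0]
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)      # uniform weights
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)    # cosine cost in [0, 2]
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T
    K = np.exp(-C / eps)                                 # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                              # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                      # transport plan
    return float((P * C).sum())

# Toy usage: random vectors stand in for one BERT layer's contextual token
# embeddings of a candidate and a reference sentence.
rng = np.random.default_rng(0)
candidate = rng.normal(size=(7, 768))
reference = rng.normal(size=(9, 768))
print("layer-wise OT distance:", sinkhorn_distance(candidate, reference))
```

In the paper's framework, one such distribution per layer would first be summarised through a Wasserstein barycenter before the final comparison.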
Related papers
- StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation [8.251302684712773]
StructText is an end-to-end framework for automatically generating high-fidelity benchmarks for key-value extraction from text.
We evaluate the proposed method on 71,539 examples across 49 documents.
arXiv Detail & Related papers (2025-07-28T21:20:44Z)
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
- Statistical Depth for Ranking and Characterizing Transformer-Based Text Embeddings [1.321681963474017]
A statistical depth is a function for ranking k-dimensional objects by measuring centrality with respect to some observed k-dimensional distribution.
We adopt a statistical depth for measuring distributions of transformer-based text embeddings, which we call transformer-based text embedding (TTE) depth, and introduce its practical use for both modeling and distributional inference in NLP pipelines.
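The summary above defines a depth only abstractly, so as a hedged illustration the sketch below ranks embeddings by the classical spatial depth, one standard centrality measure for k-dimensional points; the TTE depth actually proposed in the paper may be defined differently, and the embeddings here are random stand-ins.

```python
import numpy as np

def spatial_depth(x, X, eps=1e-12):
    """Classical spatial depth of point x with respect to sample X (n, d):
    one minus the norm of the average unit vector from the sample points
    towards x. Values lie in [0, 1]; higher means more central."""
    diffs = x[None, :] - X
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    units = diffs / np.maximum(norms, eps)   # zero contribution for x itself
    return 1.0 - float(np.linalg.norm(units.mean(axis=0)))

# Rank a corpus of (toy) text embeddings from most to least central.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 384))     # stand-ins for encoder outputs
depths = np.array([spatial_depth(e, embeddings) for e in embeddings])
ranking = np.argsort(-depths)                # most central first
print("most central embedding index:", int(ranking[0]))
```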
arXiv Detail & Related papers (2023-10-23T15:02:44Z)
- TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks [44.801746603656504]
We present TIGERScore, a metric that follows Instruction Guidance to perform Explainable and Reference-free evaluation.
Our metric is based on LLaMA-2, trained on our meticulously curated instruction-tuning dataset MetricInstruct.
arXiv Detail & Related papers (2023-10-01T18:01:51Z)
- TOPFORMER: Topology-Aware Authorship Attribution of Deepfake Texts with Diverse Writing Styles [14.205559299967423]
Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts.
Users with malicious intent can easily use these open-sourced LLMs to generate harmful texts and dis/misinformation at scale.
To mitigate this problem, a computational method to determine whether a given text is a deepfake is desired.
We propose TopFormer to improve existing AA solutions by capturing more linguistic patterns in deepfake texts.
arXiv Detail & Related papers (2023-09-22T15:32:49Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Evaluating Factual Consistency of Texts with Semantic Role Labeling [3.1776833268555134]
We introduce SRLScore, a reference-free evaluation metric designed with text summarization in mind.
A final factuality score is computed by an adjustable scoring mechanism.
Correlation with human judgments on English summarization datasets shows that SRLScore is competitive with state-of-the-art methods.
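As a rough, hypothetical illustration of a role-based factuality score with an adjustable scoring mechanism (not the algorithm defined in the SRLScore paper), the sketch below compares (agent, verb, patient) triples from a summary against triples from its source; the triple extractor is assumed to exist upstream and the weights are arbitrary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (agent, verb, patient), assumed to come from an SRL system

def role_weighted_score(summary_triples: List[Triple],
                        source_triples: List[Triple],
                        weights: Dict[str, float] = None) -> float:
    """Toy adjustable scoring: each summary triple is credited with its best
    partial match in the source, where agreement on each role contributes a
    configurable weight; the final score averages over summary triples."""
    weights = weights or {"agent": 0.4, "verb": 0.2, "patient": 0.4}
    roles = ("agent", "verb", "patient")
    if not summary_triples:
        return 1.0  # nothing asserted, nothing to contradict
    total = 0.0
    for s in summary_triples:
        best = 0.0
        for t in source_triples:
            score = sum(w for r, w in weights.items()
                        if s[roles.index(r)] == t[roles.index(r)])
            best = max(best, score)
        total += best
    return total / len(summary_triples)

source = [("the court", "fined", "the company")]
summary = [("the company", "fined", "the court")]   # roles swapped by the summary
print(role_weighted_score(summary, source))          # only the verb matches -> 0.2
```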
arXiv Detail & Related papers (2023-05-22T17:59:42Z)
- SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate the limitations of token-level matching.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences.
Our results show that the system-level correlations of our proposed metric with a model-based matching function outperform all competing metrics.
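The sketch below is a simplified, self-contained take on the sentence-level soft matching idea: every candidate sentence is aligned to its best-matching reference sentence and vice versa, and the two directions are combined as an F1. The token-overlap matcher and the aggregation are stand-ins; SMART as defined in the paper also uses source sentences and stronger, model-based matching functions.

```python
def sent_sim(a: str, b: str) -> float:
    """Stand-in sentence matching function (token-overlap F1); the paper
    plugs in stronger, model-based matchers here."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    overlap = len(ta & tb)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(ta), overlap / len(tb)
    return 2 * p * r / (p + r)

def soft_sentence_f1(cand_sents, ref_sents):
    """Soft precision: best reference match per candidate sentence.
    Soft recall: best candidate match per reference sentence."""
    prec = sum(max(sent_sim(c, r) for r in ref_sents) for c in cand_sents) / len(cand_sents)
    rec = sum(max(sent_sim(r, c) for c in cand_sents) for r in ref_sents) / len(ref_sents)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

candidate = ["The cat sat on the mat.", "It looked pleased."]
reference = ["A cat was sitting on the mat.", "The animal seemed happy."]
print(soft_sentence_f1(candidate, reference))
```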
arXiv Detail & Related papers (2022-08-01T17:58:05Z)
- BARTScore: Evaluating Generated Text as Text Generation [89.50052670307434]
We conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models.
We operationalize this idea using BART, an encoder-decoder based pre-trained model.
We propose a metric BARTScore with a number of variants that can be flexibly applied to evaluation of text from different perspectives.
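A minimal sketch of this "evaluation as generation" idea using the Hugging Face transformers library: the hypothesis is scored by the average log-likelihood the model assigns to it given the source. The checkpoint choice is illustrative, and the official BARTScore adds directional variants, prompting and token weighting that are omitted here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "facebook/bart-large-cnn"   # illustrative checkpoint choice
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name).eval()

def seq2seq_loglik_score(source: str, hypothesis: str) -> float:
    """Average log-probability of the hypothesis tokens given the source."""
    src = tok(source, return_tensors="pt", truncation=True)
    tgt = tok(hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(input_ids=src.input_ids,
                    attention_mask=src.attention_mask,
                    labels=tgt.input_ids)
    # out.loss is the mean token-level cross-entropy, so its negation is the
    # average log-likelihood; higher (closer to zero) means a better hypothesis.
    return -out.loss.item()

print(seq2seq_loglik_score("The quick brown fox jumps over the lazy dog.",
                           "A fast fox leaps over a sleeping dog."))
```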
arXiv Detail & Related papers (2021-06-22T03:20:53Z)
- BOTD: Bold Outline Text Detector [85.33700624095181]
We propose a new one-stage text detector, termed the Bold Outline Text Detector (BOTD).
BOTD is able to process arbitrary-shaped text with low model complexity.
Experimental results on three real-world benchmarks show the state-of-the-art performance of BOTD.
arXiv Detail & Related papers (2020-11-30T11:54:14Z)
- Graph-based Topic Extraction from Vector Embeddings of Text Documents: Application to a Corpus of News Articles [0.0]
We present an unsupervised framework that brings together powerful vector embeddings from natural language processing with tools from multiscale graph partitioning.
We show the advantages of graph-based clustering through end-to-end comparisons with other popular clustering and topic modelling methods.
This work is showcased through an analysis of a corpus of US news coverage during the presidential election year of 2016.
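As a hedged sketch of the general pipeline (document vectors, a similarity graph, then graph-based clusters read as topics), the snippet below builds a k-nearest-neighbour cosine-similarity graph and partitions it with greedy modularity optimisation from networkx. The paper itself uses multiscale Markov Stability graph partitioning and real document embeddings; both are replaced by simpler stand-ins here.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Random unit vectors stand in for document embeddings (e.g. averaged word vectors).
rng = np.random.default_rng(0)
docs = rng.normal(size=(60, 50))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Build a k-nearest-neighbour cosine-similarity graph over documents.
k = 5
sims = docs @ docs.T
G = nx.Graph()
G.add_nodes_from(range(len(docs)))
for i, row in enumerate(sims):
    for j in np.argsort(-row)[1:k + 1]:       # skip the document itself
        G.add_edge(i, int(j), weight=float(row[j]))

# Community detection as a simple stand-in for multiscale graph partitioning;
# each community is interpreted as one topic cluster.
topics = greedy_modularity_communities(G, weight="weight")
print([sorted(c)[:5] for c in topics])        # peek at a few members per topic
```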
arXiv Detail & Related papers (2020-10-28T16:20:05Z)
- All you need is a second look: Towards Tighter Arbitrary shape text detection [80.85188469964346]
Long curved text instances tend to be fragmented because of the limited receptive field size of CNNs.
Simple representations using rectangle or quadrangle bounding boxes fall short when dealing with more challenging arbitrary-shaped texts.
NASK reconstructs text instances with a tighter representation using the predicted geometrical attributes.
arXiv Detail & Related papers (2020-04-26T17:03:41Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset, written in the same style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy or quality of the information presented and is not responsible for any consequences of its use.