Automatic Text Evaluation through the Lens of Wasserstein Barycenters
- URL: http://arxiv.org/abs/2108.12463v1
- Date: Fri, 27 Aug 2021 19:08:52 GMT
- Title: Automatic Text Evaluation through the Lens of Wasserstein Barycenters
- Authors: Pierre Colombo, Guillaume Staerman, Chloe Clavel, Pablo Piantanida
- Abstract summary: A new metric, BaryScore, is introduced to evaluate text generation based on deep contextualized embeddings.
Our results show that BaryScore outperforms other BERT-based metrics and exhibits more consistent behaviour, in particular for text summarization.
- Score: 24.71226781348407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new metric, BaryScore, to evaluate text generation based on deep
contextualized embeddings (e.g., BERT, Roberta, ELMo) is introduced.
This metric is motivated by a new framework relying on optimal transport tools,
i.e., Wasserstein distance and barycenter. By modelling the layer
output of deep contextualized embeddings as a probability distribution rather
than as a vector embedding, this framework provides a natural way to aggregate
the different outputs through the Wasserstein space topology. In addition, it
provides theoretical grounds for our metric and offers an alternative to
available solutions (e.g., MoverScore and BertScore). Numerical
evaluation is performed on four different tasks: machine translation,
summarization, data2text generation and image captioning. Our results show that
BaryScore outperforms other BERT-based metrics and exhibits more
consistent behaviour, in particular for text summarization.
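The two optimal transport tools named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes one-dimensional "embeddings", uniform weights, and the same number of points in every distribution, a setting in which the 2-Wasserstein distance and barycenter reduce to operations on sorted samples. The function names and the way the two steps are combined into a score are hypothetical, chosen only for illustration.

```python
import math


def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two uniform empirical measures on the
    real line with the same number of points (toy assumption)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    a, b = sorted(xs), sorted(ys)
    # In 1D the optimal coupling matches sorted samples (quantiles).
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def barycenter_1d(layers, weights=None):
    """2-Wasserstein barycenter of several equal-size uniform empirical
    measures on the real line: a weighted average of their sorted samples."""
    n = len(layers[0])
    if any(len(layer) != n for layer in layers):
        raise ValueError("this sketch assumes equal-size samples")
    if weights is None:
        weights = [1.0 / len(layers)] * len(layers)
    sorted_layers = [sorted(layer) for layer in layers]
    return [
        sum(w * layer[i] for w, layer in zip(weights, sorted_layers))
        for i in range(n)
    ]


def toy_score(reference_layers, candidate_layers):
    """Hypothetical toy score: aggregate each text's layers into a barycenter,
    then compare the two barycenters (lower means more similar)."""
    ref_bary = barycenter_1d(reference_layers)
    cand_bary = barycenter_1d(candidate_layers)
    return wasserstein2_1d(ref_bary, cand_bary)


if __name__ == "__main__":
    reference = [[0.0, 1.0, 2.0], [2.0, 4.0, 3.0]]  # two made-up "layers"
    candidate = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
    print(barycenter_1d(reference))
    print(toy_score(reference, candidate))
```

Real contextualized embeddings are high-dimensional, where neither quantity has this closed form and a numerical optimal transport solver would be needed; the sketch only shows the shape of the computation.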
Related papers
- TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks [44.801746603656504]
We present TIGERScore, a metric that follows Instruction Guidance to perform Explainable and Reference-free evaluation.
Our metric is based on LLaMA-2, trained on our meticulously curated instruction-tuning dataset MetricInstruct.
arXiv Detail & Related papers (2023-10-01T18:01:51Z)
- TOPFORMER: Topology-Aware Authorship Attribution of Deepfake Texts with Diverse Writing Styles [14.205559299967423]
Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts.
Users with malicious intent can easily use these open-sourced LLMs to generate harmful texts and dis/misinformation at scale.
To mitigate this problem, a computational method to determine if a given text is a deepfake text or not is desired.
We propose TopFormer to improve existing authorship attribution (AA) solutions by capturing more linguistic patterns in deepfake texts.
arXiv Detail & Related papers (2023-09-22T15:32:49Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Evaluating Factual Consistency of Texts with Semantic Role Labeling [3.1776833268555134]
We introduce SRLScore, a reference-free evaluation metric designed with text summarization in mind.
A final factuality score is computed by an adjustable scoring mechanism.
Correlation with human judgments on English summarization datasets shows that SRLScore is competitive with state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T17:59:42Z)
- BARTScore: Evaluating Generated Text as Text Generation [89.50052670307434]
We conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models.
We operationalize this idea using BART, an encoder-decoder based pre-trained model.
We propose a metric BARTScore with a number of variants that can be flexibly applied to evaluation of text from different perspectives.
arXiv Detail & Related papers (2021-06-22T03:20:53Z)
- BOTD: Bold Outline Text Detector [85.33700624095181]
We propose a new one-stage text detector, termed Bold Outline Text Detector (BOTD).
BOTD is able to process arbitrary-shaped text with low model complexity.
Experimental results on three real-world benchmarks show the state-of-the-art performance of BOTD.
arXiv Detail & Related papers (2020-11-30T11:54:14Z)
- All you need is a second look: Towards Tighter Arbitrary shape text detection [80.85188469964346]
Long curve text instances tend to be fragmented because of the limited receptive field size of CNN.
Simple representations using rectangle or quadrangle bounding boxes fall short when dealing with more challenging arbitrary-shaped texts.
NASK reconstructs text instances with a tighter representation using the predicted geometrical attributes.
arXiv Detail & Related papers (2020-04-26T17:03:41Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.