Automatic Discourse Segmentation: an evaluation in French
- URL: http://arxiv.org/abs/2002.04095v2
- Date: Thu, 11 Jun 2020 20:27:29 GMT
- Title: Automatic Discourse Segmentation: an evaluation in French
- Authors: Rémy Saksik, Alejandro Molina-Villegas, Andréa Carneiro Linhares, Juan-Manuel Torres-Moreno
- Abstract summary: We describe some discourse segmentation methods as well as a preliminary evaluation of the segmentation quality.
We have developed three models based solely on resources simultaneously available in several languages: marker lists and statistical POS labeling.
- Score: 65.00134288222509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we describe some discursive segmentation methods as well as
a preliminary evaluation of the segmentation quality. Although our experiments
were carried out on documents in French, we have developed three discourse
segmentation models based solely on resources simultaneously available in
several languages: marker lists and statistical POS labeling. We have also
carried out automatic evaluations of these systems against the Annodis corpus,
which is a manually annotated reference. The results obtained are very
encouraging.
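As a rough illustration of the marker-list approach described in the abstract (not the authors' actual systems), the sketch below splits a French sentence into candidate elementary discourse units at an assumed, illustrative list of discourse connectives; the paper's POS-based variants would additionally inspect the tag context around each candidate boundary.

```python
import re

# Illustrative marker list (an assumption, not the authors' resource): a few
# French discourse connectives that often open a new elementary discourse unit (EDU).
DISCOURSE_MARKERS = ["parce que", "mais", "cependant", "bien que", "alors que"]

def segment_edus(sentence: str, markers=DISCOURSE_MARKERS) -> list:
    """Split a sentence into candidate EDUs at discourse markers.

    A marker-list baseline in the spirit of the paper's lexical models only;
    the statistical POS-based variants would also check the tags around each
    candidate boundary before accepting it.
    """
    # Split on whitespace immediately followed by a known marker,
    # keeping the marker with the segment it introduces.
    pattern = r"\s+(?=(?:" + "|".join(re.escape(m) for m in markers) + r")\b)"
    return [seg.strip() for seg in re.split(pattern, sentence, flags=re.IGNORECASE) if seg.strip()]

print(segment_edus("Il est resté chez lui parce que la pluie tombait, mais il voulait sortir."))
# -> ['Il est resté chez lui', 'parce que la pluie tombait,', 'mais il voulait sortir.']
```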
Related papers
- Segmentation en phrases : ouvrez les guillemets sans perdre le fil (Sentence segmentation: open the quotation marks without losing the thread) [0.08192907805418582]
This paper presents a graph cascade for sentence segmentation of XML documents.
Our proposal produces nested sentences (sentences inside sentences) for cases introduced by quotation marks and hyphens, and also pays particular attention to parenthetical clauses introduced by parentheses and to lists introduced by colons.
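A toy Python sketch of the quote-protection idea (not the paper's graph cascade): quoted spans are masked before splitting, so sentence-final punctuation inside « ... » or "..." does not break the enclosing sentence.

```python
import re

def split_sentences(text: str) -> list:
    """Naive sentence splitter that avoids breaking inside quotation marks.

    Only a stand-in for the graph-cascade approach: quoted spans are replaced
    by placeholders, the text is split on sentence-final punctuation, and the
    quotes are restored afterwards.
    """
    protected = []

    def _mask(match):
        protected.append(match.group(0))
        return f"\x00{len(protected) - 1}\x00"   # opaque placeholder token

    masked = re.sub(r"«[^»]*»|\"[^\"]*\"", _mask, text)
    sentences = re.split(r"(?<=[.!?])\s+", masked)
    # Restore the quoted material inside each sentence.
    return [re.sub(r"\x00(\d+)\x00", lambda m: protected[int(m.group(1))], s) for s in sentences]

print(split_sentences("Il a dit : « Venez demain. » Elle a accepté. Tout était réglé."))
# -> ['Il a dit : « Venez demain. » Elle a accepté.', 'Tout était réglé.']
```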
arXiv Detail & Related papers (2024-07-29T09:02:38Z)
- Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation [50.60733773088296]
We conduct a comprehensive human evaluation of the results of several shared tasks from the last International Workshop on Spoken Language Translation (IWSLT 2023).
We propose an effective evaluation strategy based on automatic resegmentation and direct assessment with segment context.
Our analysis revealed that: 1) the proposed evaluation strategy is robust and its scores correlate well with other types of human judgement; 2) automatic metrics are usually, but not always, well correlated with direct assessment scores; and 3) COMET is a slightly stronger automatic metric than chrF.
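The correlation analysis can be pictured with a short sketch; the segment-level scores below are purely hypothetical and only show how Pearson and Spearman correlations against direct-assessment scores would be computed.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical segment-level scores: direct-assessment (human) judgments and
# two automatic metrics on the same segments (stand-ins for COMET and chrF).
human    = [78.0, 55.0, 91.0, 60.0, 83.0, 47.0]
metric_a = [0.82, 0.58, 0.90, 0.66, 0.85, 0.51]
metric_b = [61.0, 48.0, 70.0, 58.0, 63.0, 45.0]

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    r, _ = pearsonr(human, scores)      # linear correlation
    rho, _ = spearmanr(human, scores)   # rank correlation
    print(f"{name}: Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```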
arXiv Detail & Related papers (2024-06-06T09:18:42Z)
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity [3.3162484539136416]
We propose a simple but remarkably effective evaluation metric called SemScore.
We compare model outputs to gold target responses using semantic textual similarity (STS).
We find that our proposed SemScore metric outperforms all other evaluation metrics we tested, many of them more complex, in terms of correlation with human evaluation.
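A minimal sketch of the SemScore idea: score each model output by the cosine similarity of its sentence embedding to that of the gold reference. The embedding checkpoint used here is an assumption for illustration, not necessarily the one from the paper.

```python
from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model works for the sketch; this checkpoint is an
# assumption, not necessarily the paper's choice.
model = SentenceTransformer("all-mpnet-base-v2")

def semscore_like(outputs, references) -> float:
    """Mean cosine similarity between model outputs and aligned gold references."""
    emb_out = model.encode(outputs, convert_to_tensor=True)
    emb_ref = model.encode(references, convert_to_tensor=True)
    sims = util.cos_sim(emb_out, emb_ref).diagonal()   # i-th output vs i-th reference
    return sims.mean().item()

print(semscore_like(["The capital of France is Paris."],
                    ["Paris is the capital city of France."]))
```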
arXiv Detail & Related papers (2024-01-30T14:52:50Z)
- Rethinking Evaluation Metrics of Open-Vocabulary Segmentation [78.76867266561537]
The evaluation process still heavily relies on closed-set metrics without considering the similarity between predicted and ground-truth categories.
To tackle this issue, we first survey eleven similarity measurements between two categorical words.
We then design novel evaluation metrics, namely Open mIoU, Open AP, and Open PQ, tailored for three open-vocabulary segmentation tasks.
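The intuition can be sketched as a similarity-weighted IoU, where predicting a semantically close category earns partial credit. Both the toy similarity table and the formula below are illustrative simplifications, not the paper's exact Open mIoU definition.

```python
import numpy as np

# Toy similarity between category names (an assumption for the sketch; the
# paper surveys eleven such measures). 1.0 means identical categories.
SIM = {("cat", "cat"): 1.0, ("dog", "dog"): 1.0, ("sofa", "sofa"): 1.0,
       ("cat", "dog"): 0.8, ("dog", "cat"): 0.8,
       ("cat", "sofa"): 0.1, ("sofa", "cat"): 0.1,
       ("dog", "sofa"): 0.1, ("sofa", "dog"): 0.1}

def soft_iou(pred: np.ndarray, gold: np.ndarray, category: str) -> float:
    """IoU for one category where wrong labels inside the gold region earn
    partial credit proportional to their similarity to the target category."""
    gold_mask = gold == category
    pred_mask = pred == category
    # Exact matches contribute 1.0; semantically close predictions contribute less.
    credit = sum(SIM.get((p, category), 0.0) for p in pred[gold_mask])
    union = gold_mask.sum() + pred_mask.sum() - (gold_mask & pred_mask).sum()
    return float(credit / union) if union else 0.0

gold = np.array(["cat", "cat", "cat", "sofa"])
pred = np.array(["cat", "dog", "sofa", "sofa"])
print(soft_iou(pred, gold, "cat"))   # higher than the plain IoU of 1/3,
                                     # because "dog" is close to "cat"
```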
arXiv Detail & Related papers (2023-11-06T18:59:01Z)
- Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs).
We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.
Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
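A hedged sketch of the prompting step: the function below assembles an annotator's Likert rating and explanation together with a scoring rubric, in the spirit of the paper. The rubric wording is an illustrative assumption, and the actual call to an LLM is left abstract.

```python
RUBRIC = """Score 0-100 for factual consistency:
90-100: fully supported by the source.
60-89:  mostly supported, minor unsupported details.
30-59:  mix of supported and unsupported content.
0-29:   largely unsupported or contradictory."""

def build_rescaling_prompt(likert: int, explanation: str) -> str:
    """Assemble a prompt that maps a Likert rating plus its free-text
    explanation onto a numeric score anchored in a shared rubric.
    The rubric above is an assumption, not the paper's actual wording."""
    return (
        f"{RUBRIC}\n\n"
        f"An annotator gave this item a rating of {likert} on a 1-5 Likert scale "
        f"and explained: \"{explanation}\"\n"
        "Based on the explanation and the rubric above, output a single integer "
        "score between 0 and 100."
    )

prompt = build_rescaling_prompt(4, "Accurate overall, but one date is not in the source.")
# `prompt` would then be sent to an LLM of choice; the model's integer reply
# serves as the rescaled judgment.
print(prompt)
```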
arXiv Detail & Related papers (2023-05-24T06:19:14Z)
- FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation [64.9546787488337]
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation.
The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese.
arXiv Detail & Related papers (2022-10-01T05:02:04Z)
- Evaluating the Efficacy of Summarization Evaluation across Languages [33.46519116869276]
We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).
We find that using multilingual BERT within BERTScore performs well across all languages, at a level above that for English.
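A short sketch of that setup using the bert-score package with a multilingual BERT checkpoint; the example sentences are invented, and precision/recall map loosely onto the paper's focus and coverage annotations.

```python
from bert_score import score

# Invented candidate/reference pair in a non-English language, scored with a
# multilingual BERT checkpoint as the paper's finding suggests.
cands = ["Le gouvernement a annoncé une nouvelle réforme des retraites."]
refs  = ["Une réforme des retraites a été annoncée par le gouvernement."]

P, R, F1 = score(cands, refs, model_type="bert-base-multilingual-cased")
print(f"precision={P.mean().item():.3f} "
      f"recall={R.mean().item():.3f} "
      f"f1={F1.mean().item():.3f}")
```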
arXiv Detail & Related papers (2021-06-02T21:28:01Z)
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain the de facto standard for evaluating tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community to consider more carefully how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)