Overview of CTC 2021: Chinese Text Correction for Native Speakers
- URL: http://arxiv.org/abs/2208.05681v1
- Date: Thu, 11 Aug 2022 07:58:48 GMT
- Title: Overview of CTC 2021: Chinese Text Correction for Native Speakers
- Authors: Honghong Zhao, Baoxin Wang, Dayong Wu, Wanxiang Che, Zhigang Chen,
Shijin Wang
- Abstract summary: We present an overview of the CTC 2021, a Chinese text correction task for native speakers.
We give detailed descriptions of the task definition and the data for training as well as evaluation.
We hope the data sets collected and annotated for this task can facilitate and expedite future development in this research area.
- Score: 46.98707360111395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present an overview of the CTC 2021, a Chinese text
correction task for native speakers. We give detailed descriptions of the task
definition and the data for training as well as evaluation. We also summarize
the approaches investigated by the participants of this task. We hope the data
sets collected and annotated for this task can facilitate and expedite future
development in this research area. The pseudo training data, gold-standard
validation data, and the entire leaderboard are publicly available online
at https://destwang.github.io/CTC2021-explorer/.
Related papers
- Mavericks at NADI 2023 Shared Task: Unravelling Regional Nuances through
Dialect Identification using Transformer-based Approach [0.0]
We highlight our methodology for subtask 1 which deals with country-level dialect identification.
The task uses the Twitter dataset (TWT-2023) that encompasses 18 dialects for the multi-class classification problem.
We achieved an F1-score of 76.65 (11th rank on the leaderboard) on the test dataset.
arXiv Detail & Related papers (2023-11-30T17:37:56Z)
- KIT's Multilingual Speech Translation System for IWSLT 2023 [58.5152569458259]
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages of varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
arXiv Detail & Related papers (2023-06-08T16:13:20Z)
- Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- Handshakes AI Research at CASE 2021 Task 1: Exploring different
approaches for multilingual tasks [0.22940141855172036]
The aim of the CASE 2021 Shared Task 1 was to detect and classify socio-political and crisis event information in a multilingual setting.
Our submission contained entries in all of the subtasks, and the scores obtained validated our research finding.
arXiv Detail & Related papers (2021-10-29T07:58:49Z)
- Accenture at CheckThat! 2021: Interesting claim identification and
ranking with contextually sensitive lexical training data augmentation [0.0]
This paper discusses the approach used by the Accenture Team for CLEF2021 CheckThat! Lab, Task 1.
It identifies whether a claim made in social media would be interesting to a wide audience and should be fact-checked.
Twitter training and test data were provided in English, Arabic, Spanish, Turkish, and Bulgarian.
arXiv Detail & Related papers (2021-07-12T18:46:47Z)
- Generative Conversational Networks [67.13144697969501]
We propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data.
We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data.
arXiv Detail & Related papers (2021-06-15T23:19:37Z)
- WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets [21.41654078561586]
We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task.
We present a brief summary of results obtained from the final system evaluation submissions of 55 teams.
arXiv Detail & Related papers (2020-10-16T08:28:05Z)
- A Sentence Cloze Dataset for Chinese Machine Reading Comprehension [64.07894249743767]
We propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC).
The proposed task aims to fill the right candidate sentence into the passage that has several blanks.
We built a Chinese dataset called CMRC 2019 to evaluate the difficulty of the SC-MRC task.
arXiv Detail & Related papers (2020-04-07T04:09:00Z)