Traditional Machine Learning and Deep Learning Models for Argumentation
Mining in Russian Texts
- URL: http://arxiv.org/abs/2106.14438v1
- Date: Mon, 28 Jun 2021 07:44:43 GMT
- Title: Traditional Machine Learning and Deep Learning Models for Argumentation
Mining in Russian Texts
- Authors: Irina Fishcheva, Valeriya Goloviznina, Evgeny Kotelnikov
- Abstract summary: A significant obstacle to research in this area for the Russian language is the lack of annotated Russian-language text corpora.
This article explores the possibility of improving the quality of argumentation mining using the extension of the Russian-language version of the Argumentative Microtext Corpus (ArgMicro) based on the machine translation of the Persuasive Essays Corpus (PersEssays).
We solve the problem of classifying argumentative discourse units (ADUs) into two classes - "pro" ("for") and "opp" ("against") using traditional machine learning techniques (SVM, Bagging and XGBoost) and a deep neural network (BERT model).
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Argumentation mining is a field of computational linguistics devoted
to extracting arguments and the relations between them from texts, classifying
them, and constructing an argumentative structure. A significant obstacle to
research in this area for the Russian language is the lack of annotated
Russian-language text corpora. This article explores the possibility of
improving the quality of argumentation mining using the extension of the
Russian-language version of the Argumentative Microtext Corpus (ArgMicro) based
on the machine translation of the Persuasive Essays Corpus (PersEssays). To
make it possible to use these two corpora combined, we propose a Joint Argument
Annotation Scheme based on the schemes used in ArgMicro and PersEssays. We
solve the problem of classifying argumentative discourse units (ADUs) into two
classes - "pro" ("for") and "opp" ("against") using traditional machine
learning techniques (SVM, Bagging and XGBoost) and a deep neural network (BERT
model). An ensemble of XGBoost and BERT models was proposed, which showed the
highest performance of ADUs classification for both corpora.
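The abstract reports that an ensemble of the XGBoost and BERT models gave the best ADU classification results, but this excerpt does not state how the two models' outputs are combined. The sketch below is a minimal, illustrative soft-voting ensemble assuming each model emits per-class probabilities for "pro" and "opp"; the function name, weighting scheme, and example probabilities are all assumptions, not the authors' implementation.

```python
# Illustrative soft-voting ensemble: take a weighted average of each
# model's (p_pro, p_opp) probabilities per ADU and pick the higher class.
LABELS = ("pro", "opp")  # "for" vs. "against"

def ensemble_predict(xgb_probs, bert_probs, weight=0.5):
    """Combine two models' (p_pro, p_opp) outputs for each ADU.

    `weight` is the share given to the XGBoost probabilities;
    1 - weight goes to BERT. Returns one label from LABELS per ADU.
    """
    preds = []
    for (x_pro, x_opp), (b_pro, b_opp) in zip(xgb_probs, bert_probs):
        p_pro = weight * x_pro + (1 - weight) * b_pro
        p_opp = weight * x_opp + (1 - weight) * b_opp
        preds.append(LABELS[0] if p_pro >= p_opp else LABELS[1])
    return preds

# Two ADUs; the models disagree on the second one, and averaging
# lets the more confident BERT vote win.
xgb = [(0.9, 0.1), (0.4, 0.6)]
bert = [(0.8, 0.2), (0.7, 0.3)]
print(ensemble_predict(xgb, bert))  # -> ['pro', 'pro']
```

With `weight=1.0` the ensemble reduces to XGBoost alone and the second ADU flips to "opp", which is the usual motivation for tuning the weight on a validation split.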
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- Automatic Summarization of Russian Texts: Comparison of Extractive and Abstractive Methods
arXiv Detail & Related papers (2022-06-18T17:28:04Z)
- Argumentative Text Generation in Economic Domain
Key problem of the argument text generation for the Russian language is the lack of annotated argumentation corpora.
In this paper, we use translated versions of the Argumentative Microtext, Persuasive Essays and UKP Sentential corpora to fine-tune RuBERT model.
The results show that this approach improves the accuracy of the argument generation by more than 20 percentage points compared to the original ruGPT-3 model.
arXiv Detail & Related papers (2022-06-18T17:22:06Z)
- RuArg-2022: Argument Mining Evaluation
This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won the first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z)
- Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining?
We propose a novel transfer learning strategy that draws on unsupervised, argumentative discourse-aware knowledge.
We utilize argumentation-rich social discussions from the ChangeMyView subreddit as a source of unsupervised, argumentative discourse-aware knowledge.
We introduce a novel prompt-based strategy for inter-component relation prediction that complements our proposed fine-tuning method.
arXiv Detail & Related papers (2022-03-24T06:48:56Z)
- SG-Net: Syntax Guided Transformer for Language Representation
We propose using syntax to guide the text modeling by incorporating explicit syntactic constraints into attention mechanisms for better linguistically motivated word representations.
In detail, for self-attention network (SAN) sponsored Transformer-based encoder, we introduce syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention.
Experiments on popular benchmark tasks, including machine reading comprehension, natural language inference, and neural machine translation show the effectiveness of the proposed SG-Net design.
arXiv Detail & Related papers (2020-12-27T11:09:35Z)
- Bridging the Modality Gap for Speech-to-Text Translation
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z)
- Exemplar-Controllable Paraphrasing and Translation using Bitext
We adapt models from prior work to be able to learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)
- Automatically Ranked Russian Paraphrase Corpus for Text Generation
The article is focused on automatic development and ranking of a large corpus for Russian paraphrase generation.
Existing manually annotated paraphrase datasets for Russian are limited to the small-sized ParaPhraser corpus and ParaPlag.
arXiv Detail & Related papers (2020-06-17T08:40:52Z)
- Commonsense Evidence Generation and Injection in Reading Comprehension
We propose a Commonsense Evidence Generation and Injection framework in reading comprehension, named CEGI.
The framework injects two kinds of auxiliary commonsense evidence into comprehensive reading to equip the machine with the ability of rational thinking.
Experiments on the CosmosQA dataset demonstrate that the proposed CEGI model outperforms the current state-of-the-art approaches.
arXiv Detail & Related papers (2020-05-11T16:31:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.