Diversity Over Size: On the Effect of Sample and Topic Sizes for
Argument Mining Datasets
- URL: http://arxiv.org/abs/2205.11472v2
- Date: Sat, 15 Jul 2023 14:39:15 GMT
- Title: Diversity Over Size: On the Effect of Sample and Topic Sizes for
Argument Mining Datasets
- Authors: Benjamin Schiller, Johannes Daxenberger, Iryna Gurevych
- Abstract summary: Large Argument Mining datasets are rare and recognition of argumentative sentences requires expert knowledge.
Given the cost and complexity of creating large Argument Mining datasets, we ask whether ever-larger datasets are necessary for acceptable performance.
Our findings show that, when using carefully composed training samples and a model pretrained on related tasks, we can reach 95% of the maximum performance while reducing the training sample size by at least 85%.
- Score: 65.91772010586605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of Argument Mining, that is, extracting argumentative sentences for a specific topic from large document sources, is inherently difficult for machine learning models and humans alike, as large Argument Mining datasets are rare and recognizing argumentative sentences requires expert knowledge. The
task becomes even more difficult if it also involves stance detection of
retrieved arguments. Given the cost and complexity of creating suitably large
Argument Mining datasets, we ask whether ever-larger datasets are necessary for acceptable performance. Our findings show that, when
using carefully composed training samples and a model pretrained on related
tasks, we can reach 95% of the maximum performance while reducing the training
sample size by at least 85%. This gain is consistent across three Argument
Mining tasks on three different datasets. We also publish a new dataset for
future benchmarking.
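
The abstract credits the gains to "carefully composed training samples" combined with a pretrained model. One plausible reading, in line with the paper's title, is that a small training set should cover many topics rather than many sentences per topic. The sketch below illustrates such topic-diverse subsampling; it is an illustration under that assumption, not the authors' released code, and the record layout and function name are hypothetical.

```python
# Minimal sketch of topic-diverse subsampling: spend a fixed sentence budget
# round-robin across topics instead of drawing the same number of sentences
# uniformly at random. Illustrative only; not the authors' implementation.
import random
from collections import defaultdict

def topic_diverse_sample(examples, budget, seed=0):
    """examples: dicts with keys 'sentence', 'topic', 'label' (assumed layout).
    Returns at most `budget` examples, cycling over topics so that every topic
    contributes before any topic contributes twice."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for ex in examples:
        by_topic[ex["topic"]].append(ex)
    for pool in by_topic.values():
        rng.shuffle(pool)

    sample, exhausted = [], set()
    topics = list(by_topic)
    while len(sample) < budget and len(exhausted) < len(topics):
        for topic in topics:
            if len(sample) >= budget:
                break
            if by_topic[topic]:
                sample.append(by_topic[topic].pop())
            else:
                exhausted.add(topic)
    return sample

# Usage: keep ~15% of a toy corpus while still covering all of its topics.
corpus = [{"sentence": f"sentence {i}", "topic": f"topic {i % 8}", "label": i % 2}
          for i in range(1000)]
subset = topic_diverse_sample(corpus, budget=150)
print(len(subset), len({ex["topic"] for ex in subset}))  # 150 examples, 8 topics
```

The reduced subset would then replace the full training set when fine-tuning the pretrained model; whether uniform per-topic coverage is the exact composition strategy used in the paper cannot be read from the abstract alone.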
Related papers
- The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback [12.388205905012423]
Reinforcement learning from human feedback has contributed to performance improvements in large language models.
We formulate RLHF as the contextual dueling bandit problem and assume a common linear representation.
We prove that, to achieve $\varepsilon$-optimality, the sample complexity of the source tasks can be significantly reduced.
arXiv Detail & Related papers (2024-05-18T08:29:15Z) - Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks [66.87070857705994]
In low-resource settings, the amount of seed data samples to use for data augmentation is very small.
We propose a novel method that augments training data by incorporating a wealth of examples from other datasets.
This approach can ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone.
arXiv Detail & Related papers (2024-02-21T02:45:46Z) - Multi-Task Learning Improves Performance In Deep Argument Mining Models [2.2312474084968024]
By implementing a multi-task approach to argument mining, we show that different argument mining tasks share common semantic and logical structure.
Our results are important for argument mining as they show that different tasks share substantial similarities and suggest a holistic approach to the extraction of argumentative techniques from text.
arXiv Detail & Related papers (2023-07-03T23:42:29Z) - USB: A Unified Summarization Benchmark Across Tasks and Domains [68.82726887802856]
We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports 8 interrelated tasks.
We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models.
arXiv Detail & Related papers (2023-05-23T17:39:54Z) - Learning towards Selective Data Augmentation for Dialogue Generation [52.540330534137794]
We argue that not all cases are beneficial for the augmentation task, and that cases suitable for augmentation should satisfy two specific attributes.
We propose a Selective Data Augmentation framework (SDA) for the response generation task.
arXiv Detail & Related papers (2023-03-17T01:26:39Z) - IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument Mining Tasks [59.457948080207174]
In this work, we introduce a comprehensive and large dataset named IAM, which can be applied to a series of argument mining tasks.
Nearly 70k sentences in the dataset are fully annotated based on their argument properties.
We propose two new integrated argument mining tasks associated with the debate preparation process: (1) claim extraction with stance classification (CESC) and (2) claim-evidence pair extraction (CEPE).
arXiv Detail & Related papers (2022-03-23T08:07:32Z) - Multilingual Argument Mining: Datasets and Analysis [9.117984896907782]
We explore the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages.
We show that such methods are well suited for classifying the stance of arguments and detecting evidence, but less so for assessing the quality of arguments; see the sketch after this list for a minimal illustration of this transfer setup.
We provide a human-generated dataset with more than 10k arguments in multiple languages, as well as machine translation of the English datasets.
arXiv Detail & Related papers (2020-10-13T14:49:10Z) - From Arguments to Key Points: Towards Automatic Argument Summarization [17.875273745811775]
We show that a small number of key points per topic is typically sufficient for covering the vast majority of the arguments.
Furthermore, we found that a domain expert can often predict these key points in advance.
arXiv Detail & Related papers (2020-05-04T16:24:21Z) - Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
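
The Multilingual Argument Mining entry above reports transfer learning with multilingual BERT for stance classification and evidence detection. The sketch below shows what such a transfer setup can look like with the Hugging Face transformers API; the checkpoint, label set, and toy data are assumptions for illustration, not that paper's code.

```python
# Minimal sketch of cross-lingual stance classification with multilingual BERT:
# fine-tune on English topic/argument pairs, then classify a non-English argument.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumed checkpoint
LABELS = ["pro", "con"]                      # assumed stance label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))

# Toy English training example: topic and argument encoded as a sentence pair.
topic = "We should abandon fossil fuels"
argument = "Renewable energy has become cheaper than coal in most markets."
enc = tokenizer(topic, argument, truncation=True, return_tensors="pt")
label = torch.tensor([LABELS.index("pro")])

# One gradient step stands in for the full fine-tuning loop.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**enc, labels=label).loss
loss.backward()
optimizer.step()

# Zero-shot transfer: classify a German argument on the same topic.
model.eval()
with torch.no_grad():
    de = tokenizer(topic, "Erneuerbare Energien sind inzwischen günstiger als Kohle.",
                   truncation=True, return_tensors="pt")
    pred = model(**de).logits.argmax(dim=-1).item()
print("predicted stance:", LABELS[pred])
```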
This list is automatically generated from the titles and abstracts of the papers in this site.