Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets
- URL: http://arxiv.org/abs/2205.11472v3
- Date: Mon, 07 Oct 2024 15:11:12 GMT
- Title: Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets
- Authors: Benjamin Schiller, Johannes Daxenberger, Andreas Waldis, Iryna Gurevych,
- Abstract summary: We investigate the effect of Argument Mining dataset composition in few- and zero-shot settings.
Our findings show that, while fine-tuning is mandatory to achieve acceptable model performance, using carefully composed training samples and reducing the training sample size by up to almost 90% can still yield 95% of the maximum performance.
- Score: 49.65208986436848
- License:
- Abstract: The task of Argument Mining, that is extracting and classifying argument components for a specific topic from large document sources, is an inherently difficult task for machine learning models and humans alike, as large Argument Mining datasets are rare and recognition of argument components requires expert knowledge. The task becomes even more difficult if it also involves stance detection of retrieved arguments. In this work, we investigate the effect of Argument Mining dataset composition in few- and zero-shot settings. Our findings show that, while fine-tuning is mandatory to achieve acceptable model performance, using carefully composed training samples and reducing the training sample size by up to almost 90% can still yield 95% of the maximum performance. This gain is consistent across three Argument Mining tasks on three different datasets. We also publish a new dataset for future benchmarking.
Related papers
- Multi-Task Learning Improves Performance In Deep Argument Mining Models [2.2312474084968024]
We show that different argument mining tasks share common semantic and logical structure by implementing a multi-task approach to argument mining.
Our results are important for argument mining as they show that different tasks share substantial similarities and suggest a holistic approach to the extraction of argumentative techniques from text.
arXiv Detail & Related papers (2023-07-03T23:42:29Z) - DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z) - IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument
Mining Tasks [59.457948080207174]
In this work, we introduce a comprehensive and large dataset named IAM, which can be applied to a series of argument mining tasks.
Near 70k sentences in the dataset are fully annotated based on their argument properties.
We propose two new integrated argument mining tasks associated with the debate preparation process: (1) claim extraction with stance classification (CESC) and (2) claim-evidence pair extraction (CEPE)
arXiv Detail & Related papers (2022-03-23T08:07:32Z) - Instance-Level Task Parameters: A Robust Multi-task Weighting Framework [17.639472693362926]
Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks.
We let the training process dictate the optimal weighting of tasks for every instance in the dataset.
We conduct extensive experiments on SURREAL and CityScapes datasets, for human shape and pose estimation, depth estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-11T02:35:42Z) - Aspect-Based Argument Mining [2.3148470932285665]
We present the task of Aspect-Based Argument Mining (ABAM) with the essential subtasks of Aspect Term Extraction (ATE) and Nested Term Extraction (NS)
We consider aspects as the main point(s) argument units are addressing.
This information is important for further downstream tasks such as argument ranking, argument summarization and generation, as well as the search for counter-arguments on the aspect-level.
arXiv Detail & Related papers (2020-11-01T21:57:51Z) - Multilingual Argument Mining: Datasets and Analysis [9.117984896907782]
We explore the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages.
We show that such methods are well suited for classifying the stance of arguments and detecting evidence, but less so for assessing the quality of arguments.
We provide a human-generated dataset with more than 10k arguments in multiple languages, as well as machine translation of the English datasets.
arXiv Detail & Related papers (2020-10-13T14:49:10Z) - Learning to Match Jobs with Resumes from Sparse Interaction Data using
Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z) - Improving Multi-Turn Response Selection Models with Complementary
Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z) - Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.