RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour
- URL: http://arxiv.org/abs/2205.02684v1
- Date: Thu, 5 May 2022 14:43:31 GMT
- Title: RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour
- Authors: Erick Mendez Guzman, Viktor Schlegel and Riza Batista-Navarro
- Abstract summary: This paper presents the first openly accessible English corpus annotated for multi-class and multi-label forced labour detection.
The corpus consists of 989 news articles retrieved from specialised data sources and annotated according to risk indicators defined by the International Labour Organization (ILO).
- Score: 4.393754160527062
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Forced labour is the most common type of modern slavery, and it is
increasingly gaining the attention of the research and social community. Recent
studies suggest that artificial intelligence (AI) holds immense potential for
augmenting anti-slavery action. However, AI tools need to be developed
transparently in cooperation with different stakeholders. Such tools are
contingent on the availability of, and access to, domain-specific data, which are
scarce due to the near-invisible nature of forced labour. To the best of our
knowledge, this paper presents the first openly accessible English corpus
annotated for multi-class and multi-label forced labour detection. The corpus
consists of 989 news articles retrieved from specialised data sources and
annotated according to risk indicators defined by the International Labour
Organization (ILO). Each news article was annotated for two aspects: (1)
indicators of forced labour as classification labels and (2) snippets of the
text that justify labelling decisions. We hope that our data set can help
promote research on explainability for multi-class and multi-label text
classification. In this work, we explain our process for collecting the data
underpinning the proposed corpus, describe our annotation guidelines and
present some statistical analysis of its content. Finally, we summarise the
results of baseline experiments based on different variants of the
Bidirectional Encoder Representations from Transformers (BERT) model.
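The abstract describes two layers of annotation per article: a set of indicator labels (several may apply to one article) and text snippets that justify those labels. The sketch below shows one way such an instance could be represented and encoded for a multi-label classifier. It is illustrative only: the `AnnotatedArticle` schema, the helper functions, and the four label names (borrowed from ILO indicator names) are hypothetical and do not describe the corpus's actual release format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical label inventory for illustration; the corpus's real label set
# and naming may differ.
LABELS = ["deception", "restriction_of_movement", "withholding_of_wages", "debt_bondage"]


@dataclass
class AnnotatedArticle:
    """One news article with indicator labels and rationale spans (hypothetical schema)."""
    text: str
    labels: List[str]
    # Rationales stored as (start, end) character offsets into `text`, end exclusive.
    rationales: List[Tuple[int, int]] = field(default_factory=list)


def multi_hot(labels: List[str], inventory: List[str] = LABELS) -> List[int]:
    """Encode a multi-label annotation as a 0/1 vector over the label inventory."""
    unknown = set(labels) - set(inventory)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in inventory]


def rationale_mask(article: AnnotatedArticle) -> List[int]:
    """Return a per-character mask: 1 inside any rationale span, 0 elsewhere."""
    mask = [0] * len(article.text)
    for start, end in article.rationales:
        # Clip spans to the text bounds before marking characters.
        for i in range(max(0, start), min(end, len(article.text))):
            mask[i] = 1
    return mask


def snippets(article: AnnotatedArticle) -> List[str]:
    """Recover the rationale snippets as strings."""
    return [article.text[start:end] for start, end in article.rationales]


if __name__ == "__main__":
    art = AnnotatedArticle(
        text="Workers said their wages were withheld for months.",
        labels=["withholding_of_wages"],
        rationales=[(19, 38)],
    )
    print(multi_hot(art.labels))  # [0, 0, 1, 0]
    print(snippets(art))          # ['wages were withheld']
```

Storing rationales as character offsets rather than token indices keeps the annotation independent of any particular tokenizer; a BERT-style pipeline would then project the character mask onto subword tokens, for example using the per-token character offsets that Hugging Face fast tokenizers can return.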
Related papers
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
Date: 2023-07-05T20:16:20Z
- Automatic Detection of Industry Sectors in Legal Articles Using Machine Learning Approaches [0.0]
A dataset consisting of over 1,700 annotated legal articles was created for the identification of six industry sectors.
The system achieved promising results with area under the receiver operating characteristic curve scores above 0.90 and F-scores above 0.81 with respect to the six industry sectors.
Date: 2023-03-08T12:41:56Z
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure reflects two tasks: (1) segmenting each sentence in a document into its set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically aligned document.
Date: 2022-12-21T04:03:33Z
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
Date: 2021-06-03T03:00:12Z
- Generating Diversified Comments via Reader-Aware Topic Modeling and Saliency Detection [25.16392119801612]
We propose a reader-aware topic modeling and saliency information detection framework to enhance the quality of generated comments.
For reader-aware topic modeling, we design a variational generative clustering algorithm for latent semantic learning and topic mining from reader comments.
For saliency information detection, we estimate a Bernoulli distribution over the news content to select salient information.
Date: 2021-02-13T03:50:31Z
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
Date: 2020-11-06T02:23:01Z
- Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles [3.0504782036247438]
We propose a topic-centric unsupervised multi-document summarization framework to generate abstractive summaries.
The proposed algorithm generates an abstractive summary by developing salient language unit selection and text generation techniques.
Our approach matches the state-of-the-art when evaluated on automated extractive evaluation metrics and performs better for abstractive summarization on five human evaluation metrics.
Date: 2020-11-03T04:04:21Z
- Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis [37.20068256769269]
Document-level Sentiment Analysis (DSA) is more challenging due to vague semantic links and complicated sentiment information.
We study how to effectively generate a discriminative representation with explicit subject patterns and sentiment contexts for DSA.
We design a Sentiment-based Rethinking mechanism (SR) by refining the Hierarchical Interaction Network (HIN) with sentiment label information to learn a more sentiment-aware document representation.
Date: 2020-07-16T16:27:38Z
- XREF: Entity Linking for Chinese News Comments with Supplementary Article Reference [19.811371589597382]
We study the problem of entity linking for Chinese news comments given mentions' spans.
We propose a novel model, XREF, that leverages attention mechanisms to pinpoint relevant context.
We develop a weakly supervised training scheme to utilize the large-scale unlabeled corpus.
Date: 2020-06-24T19:42:54Z
- Commonsense Evidence Generation and Injection in Reading Comprehension [57.31927095547153]
We propose a Commonsense Evidence Generation and Injection framework in reading comprehension, named CEGI.
The framework injects two kinds of auxiliary commonsense evidence into reading comprehension to give the machine the ability to reason rationally.
Experiments on the CosmosQA dataset demonstrate that the proposed CEGI model outperforms the current state-of-the-art approaches.
Date: 2020-05-11T16:31:08Z
- Exploring Explainable Selection to Control Abstractive Summarization [51.74889133688111]
We develop a novel framework that focuses on explainability.
A novel pair-wise matrix captures the sentence interactions, centrality, and attribute scores.
A sentence-deployed attention mechanism in the abstractor ensures the final summary emphasizes the desired content.
Date: 2020-04-24T14:39:34Z