Predicting Themes within Complex Unstructured Texts: A Case Study on
Safeguarding Reports
- URL: http://arxiv.org/abs/2010.14584v3
- Date: Fri, 4 Jun 2021 17:33:10 GMT
- Title: Predicting Themes within Complex Unstructured Texts: A Case Study on
Safeguarding Reports
- Authors: Aleksandra Edwards, David Rogers, Jose Camacho-Collados, Hélène de
Ribaupierre, Alun Preece
- Abstract summary: We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
- Score: 66.39150945184683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of text and sentence classification is associated with the need for
large amounts of labelled training data. The acquisition of high volumes of
labelled datasets can be expensive or unfeasible, especially for
highly-specialised domains for which documents are hard to obtain. Research on
the application of supervised classification based on small amounts of training
data is limited. In this paper, we address the combination of state-of-the-art
deep learning and classification methods and provide an insight into which
combination of methods fits the needs of small, domain-specific, and
terminologically-rich corpora. We focus on a real-world scenario related to a
collection of safeguarding reports comprising learning experiences and
reflections on tackling serious incidents involving children and vulnerable
adults. The relatively small volume of available reports and their use of
highly domain-specific terminology make the application of automated
approaches difficult. We focus on the problem of automatically identifying the
main themes in a safeguarding report using supervised classification
approaches. Our results show the potential of deep learning models to simulate
subject-expert behaviour even for complex tasks with limited labelled data.
Related papers
- Harnessing the Power of Beta Scoring in Deep Active Learning for
Multi-Label Text Classification [6.662167018900634]
Our study introduces a novel deep active learning strategy, capitalizing on the Beta family of proper scoring rules within the Expected Loss Reduction framework.
It computes the expected increase in scores using the Beta Scoring Rules, which are then transformed into sample vector representations.
Comprehensive evaluations across both synthetic and real datasets reveal our method's capability to often outperform established acquisition techniques in multi-label text classification.
arXiv Detail & Related papers (2024-01-15T00:06:24Z)
- A Multi-label Continual Learning Framework to Scale Deep Learning
Approaches for Packaging Equipment Monitoring [57.5099555438223]
We study multi-label classification in the continual scenario for the first time.
We propose an efficient approach that has a logarithmic complexity with regard to the number of tasks.
We validate our approach on a real-world multi-label forecasting problem from the packaging industry.
arXiv Detail & Related papers (2022-08-08T15:58:39Z)
- Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised
Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have been shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z)
- Learning to Detect Instance-level Salient Objects Using Complementary
Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
- Classification of Consumer Belief Statements From Social Media [0.0]
We study how complex expert annotations can be leveraged successfully for classification.
We find that automated class abstraction approaches perform remarkably well against a domain expert baseline on text classification tasks.
arXiv Detail & Related papers (2021-06-29T15:25:33Z)
- Self-Training with Weak Supervision [32.68342091430266]
State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks.
Weak supervision in the form of domain-specific rules has been shown to be useful in such settings.
We develop a weak supervision framework (ASTRA) that leverages all the available data for a given task.
arXiv Detail & Related papers (2021-04-12T14:45:04Z)
- Streaming Self-Training via Domain-Agnostic Unlabeled Images [62.57647373581592]
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models.
Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates.
arXiv Detail & Related papers (2021-04-07T17:58:39Z)
- Weakly-supervised Salient Instance Detection [65.0408760733005]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2020-09-29T09:47:23Z)
- Unsupervised and Interpretable Domain Adaptation to Rapidly Filter
Tweets for Emergency Services [18.57009530004948]
We present a novel method to classify relevant tweets during an ongoing crisis using the publicly available dataset of TREC incident streams.
We use dedicated attention layers for each task to provide model interpretability, which is critical for real-world applications.
We show a practical implication of our work by providing a use-case for the COVID-19 pandemic.
arXiv Detail & Related papers (2020-03-04T06:40:14Z)
- Learning Cross-domain Generalizable Features by Representation
Disentanglement [11.74643883335152]
Deep learning models exhibit limited generalizability across different domains.
We propose Mutual-Information-based Disentangled Neural Networks (MIDNet) to extract generalizable features that enable transferring knowledge to unseen categorical features in target domains.
We demonstrate our method on handwritten digits datasets and a fetal ultrasound dataset for image classification tasks.
arXiv Detail & Related papers (2020-02-29T17:53:16Z)